MATLAB Statistics and Machine Learning Toolbox™ User's Guide

Table of Contents
Getting Started
Statistics and Machine Learning Toolbox Product Description
Key Features
Supported Data Types
Organizing Data
Other MATLAB Functions Supporting Nominal and Ordinal Arrays
Create Nominal and Ordinal Arrays
Create Nominal Arrays
Create Ordinal Arrays
Change Category Labels
Change Category Labels
Reorder Category Levels
Reorder Category Levels in Ordinal Arrays
Reorder Category Levels in Nominal Arrays
Categorize Numeric Data
Categorize Numeric Data
Merge Category Levels
Merge Category Levels
Add and Drop Category Levels
Plot Data Grouped by Category
Plot Data Grouped by Category
Test Differences Between Category Means
Summary Statistics Grouped by Category
Summary Statistics Grouped by Category
Sort Ordinal Arrays
Sort Ordinal Arrays
Nominal and Ordinal Arrays
What Are Nominal and Ordinal Arrays?
Nominal and Ordinal Array Conversion
Advantages of Using Nominal and Ordinal Arrays
Manipulate Category Levels
Analysis Using Nominal and Ordinal Arrays
Reduce Memory Requirements
Index and Search Using Nominal and Ordinal Arrays
Index By Category
Common Indexing and Searching Methods
Grouping Variables
What Are Grouping Variables?
Group Definition
Analysis Using Grouping Variables
Missing Group Values
Dummy Variables
What Are Dummy Variables?
Creating Dummy Variables
Linear Regression with Categorical Covariates
Create a Dataset Array from Workspace Variables
Create a Dataset Array from a Numeric Array
Create Dataset Array from Heterogeneous Workspace Variables
Create a Dataset Array from a File
Create a Dataset Array from a Tab-Delimited Text File
Create a Dataset Array from a Comma-Separated Text File
Create a Dataset Array from an Excel File
Add and Delete Observations
Add and Delete Variables
Access Data in Dataset Array Variables
Select Subsets of Observations
Sort Observations in Dataset Arrays
Merge Dataset Arrays
Stack or Unstack Dataset Arrays
Calculations on Dataset Arrays
Export Dataset Arrays
Clean Messy and Missing Data
Dataset Arrays in the Variables Editor
Open Dataset Arrays in the Variables Editor
Modify Variable and Observation Names
Reorder or Delete Variables
Add New Data
Sort Observations
Select a Subset of Data
Create Plots
Dataset Arrays
What Are Dataset Arrays?
Dataset Array Conversion
Dataset Array Properties
Index and Search Dataset Arrays
Ways To Index and Search
Examples
Descriptive Statistics
Measures of Central Tendency
Measures of Central Tendency
Measures of Dispersion
Compare Measures of Dispersion
Quantiles and Percentiles
Exploratory Analysis of Data
Resampling Statistics
Bootstrap Resampling
Jackknife Resampling
Parallel Computing Support for Resampling Methods
Data with Missing Values
Working with Data with Missing Values
Statistical Visualization
Create Scatter Plots Using Grouped Data
Box Plots
Compare Grouped Data Using Box Plots
Distribution Plots
Normal Probability Plots
Probability Plots
Quantile-Quantile Plots
Cumulative Distribution Plots
Probability Distributions
Working with Probability Distributions
Probability Distribution Objects
Probability Distribution Functions
Probability Distribution Apps and User Interfaces
Supported Distributions
Continuous Distributions (Data)
Continuous Distributions (Statistics)
Discrete Distributions
Multivariate Distributions
Nonparametric Distributions
Flexible Distribution Families
Maximum Likelihood Estimation
Negative Loglikelihood Functions
Find MLEs Using Negative Loglikelihood Function
Random Number Generation
Nonparametric and Empirical Probability Distributions
Overview
Kernel Distribution
Empirical Cumulative Distribution Function
Piecewise Linear Distribution
Pareto Tails
Triangular Distribution
Fit Kernel Distribution Object to Data
Fit Kernel Distribution Using ksdensity
Fit Distributions to Grouped Data Using ksdensity
Fit a Nonparametric Distribution with Pareto Tails
Generate Random Numbers Using the Triangular Distribution
Model Data Using the Distribution Fitter App
Explore Probability Distributions Interactively
Create and Manage Data Sets
Create a New Fit
Display Results
Manage Fits
Evaluate Fits
Exclude Data
Save and Load Sessions
Generate a File to Fit and Plot Distributions
Fit a Distribution Using the Distribution Fitter App
Step 1: Load Sample Data
Step 2: Import Data
Step 3: Create a New Fit
Step 4: Create and Manage Additional Fits
Define Custom Distributions Using the Distribution Fitter App
Open the Distribution Fitter App
Define Custom Distribution
Import Custom Distribution
Explore the Random Number Generation UI
Compare Multiple Distribution Fits
Fit Probability Distribution Objects to Grouped Data
Multinomial Probability Distribution Objects
Multinomial Probability Distribution Functions
Generate Random Numbers Using Uniform Distribution Inversion
Represent Cauchy Distribution Using t Location-Scale
Generate Cauchy Random Numbers Using Student's t
Generate Correlated Data Using Rank Correlation
Create Gaussian Mixture Model
Fit Gaussian Mixture Model to Data
Simulate Data from Gaussian Mixture Model
Copulas: Generate Correlated Samples
Determining Dependence Between Simulation Inputs
Constructing Dependent Bivariate Distributions
Using Rank Correlation Coefficients
Using Bivariate Copulas
Higher Dimension Copulas
Archimedean Copulas
Simulating Dependent Multivariate Data Using Copulas
Fitting Copulas to Data
Gaussian Processes
Gaussian Process Regression Models
Kernel (Covariance) Function Options
Exact GPR Method
Parameter Estimation
Prediction
Computational Complexity of Exact Parameter Estimation and Prediction
Subset of Data Approximation for GPR Models
Subset of Regressors Approximation for GPR Models
Approximating the Kernel Function
Parameter Estimation
Prediction
Predictive Variance Problem
Fully Independent Conditional Approximation for GPR Models
Approximating the Kernel Function
Parameter Estimation
Prediction
Block Coordinate Descent Approximation for GPR Models
Fit GPR Models Using BCD Approximation
Random Number Generation
Generating Pseudorandom Numbers
Common Pseudorandom Number Generation Methods
Representing Sampling Distributions Using Markov Chain Samplers
Using the Metropolis-Hastings Algorithm
Using Slice Sampling
Using Hamiltonian Monte Carlo
Generating Quasi-Random Numbers
Quasi-Random Sequences
Quasi-Random Point Sets
Quasi-Random Streams
Generating Data Using Flexible Families of Distributions
Bayesian Linear Regression Using Hamiltonian Monte Carlo
Hypothesis Tests
Hypothesis Test Terminology
Hypothesis Test Assumptions
Hypothesis Testing
Available Hypothesis Tests
Analysis of Variance
Introduction to Analysis of Variance
One-Way ANOVA
Introduction to One-Way ANOVA
Prepare Data for One-Way ANOVA
Perform One-Way ANOVA
Mathematical Details
Two-Way ANOVA
Introduction to Two-Way ANOVA
Prepare Data for Balanced Two-Way ANOVA
Perform Two-Way ANOVA
Mathematical Details
Multiple Comparisons
Introduction
Multiple Comparisons Using One-Way ANOVA
Multiple Comparisons for Three-Way ANOVA
Multiple Comparison Procedures
N-Way ANOVA
Introduction to N-Way ANOVA
Prepare Data for N-Way ANOVA
Perform N-Way ANOVA
ANOVA with Random Effects
Other ANOVA Models
Analysis of Covariance
Introduction to Analysis of Covariance
Analysis of Covariance Tool
Confidence Bounds
Multiple Comparisons
Nonparametric Methods
Introduction to Nonparametric Methods
Kruskal-Wallis Test
Friedman's Test
MANOVA
Introduction to MANOVA
ANOVA with Multiple Responses
Model Specification for Repeated Measures Models
Wilkinson Notation
Compound Symmetry Assumption and Epsilon Corrections
Mauchly’s Test of Sphericity
Multivariate Analysis of Variance for Repeated Measures
Bayesian Optimization
Bayesian Optimization Algorithm
Algorithm Outline
Gaussian Process Regression for Fitting the Model
Acquisition Function Types
Acquisition Function Maximization
Parallel Bayesian Optimization
Optimize in Parallel
Parallel Bayesian Algorithm
Settings for Best Parallel Performance
Differences in Parallel Bayesian Optimization Output
Bayesian Optimization Plot Functions
Built-In Plot Functions
Custom Plot Function Syntax
Create a Custom Plot Function
Bayesian Optimization Output Functions
What Is a Bayesian Optimization Output Function?
Built-In Output Functions
Custom Output Functions
Bayesian Optimization Output Function
Bayesian Optimization Workflow
What Is Bayesian Optimization?
Ways to Perform Bayesian Optimization
Bayesian Optimization Using a Fit Function
Bayesian Optimization Using bayesopt
Bayesian Optimization Characteristics
Parameters Available for Fit Functions
Hyperparameter Optimization Options for Fit Functions
Variables for a Bayesian Optimization
Syntax for Creating Optimization Variables
Variables for Optimization Examples
Bayesian Optimization Objective Functions
Objective Function Syntax
Objective Function Example
Objective Function Errors
Constraints in Bayesian Optimization
Bounds
Deterministic Constraints — XConstraintFcn
Conditional Constraints — ConditionalVariableFcn
Coupled Constraints
Bayesian Optimization with Coupled Constraints
Optimize a Cross-Validated SVM Classifier Using bayesopt
Optimize an SVM Classifier Fit Using Bayesian Optimization
Optimize a Boosted Regression Ensemble
Parametric Regression Analysis
Choose a Regression Function
Update Legacy Code with New Fitting Methods
What Is a Linear Regression Model?
Linear Regression
Prepare Data
Choose a Fitting Method
Choose a Model or Range of Models
Fit Model to Data
Examine Quality and Adjust Fitted Model
Predict or Simulate Responses to New Data
Share Fitted Models
Linear Regression Workflow
Regression Using Dataset Arrays
Linear Regression Using Tables
Linear Regression with Interaction Effects
Interpret Linear Regression Results
Cook’s Distance
Purpose
Definition
How To
Determine Outliers Using Cook's Distance
Coefficient Standard Errors and Confidence Intervals
Coefficient Covariance and Standard Errors
Coefficient Confidence Intervals
Coefficient of Determination (R-Squared)
Purpose
Definition
How To
Display Coefficient of Determination
Delete-1 Statistics
Delete-1 Change in Covariance (covratio)
Delete-1 Scaled Difference in Coefficient Estimates (Dfbetas)
Delete-1 Scaled Change in Fitted Values (Dffits)
Delete-1 Variance (S2_i)
Durbin-Watson Test
Purpose
Definition
How To
Test for Autocorrelation Among Residuals
F-statistic and t-statistic
F-statistic
Assess Fit of Model Using F-statistic
t-statistic
Assess Significance of Regression Coefficients Using t-statistic
Hat Matrix and Leverage
Hat Matrix
Leverage
Determine High Leverage Observations
Residuals
Purpose
Definition
How To
Assess Model Assumptions Using Residuals
Summary of Output and Diagnostic Statistics
Wilkinson Notation
Overview
Formula Specification
Linear Model Examples
Linear Mixed-Effects Model Examples
Generalized Linear Model Examples
Generalized Linear Mixed-Effects Model Examples
Repeated Measures Model Examples
Stepwise Regression
Stepwise Regression to Select Appropriate Models
Compare Large and Small Stepwise Models
Robust Regression — Reduce Outlier Effects
What Is Robust Regression?
Robust Regression versus Standard Least-Squares Fit
Ridge Regression
Introduction to Ridge Regression
Ridge Regression
Lasso and Elastic Net
What Are Lasso and Elastic Net?
Lasso and Elastic Net Details
References
Wide Data via Lasso and Parallel Computing
Lasso Regularization
Lasso and Elastic Net with Cross Validation
Partial Least Squares
Introduction to Partial Least Squares
Partial Least Squares
Linear Mixed-Effects Models
Prepare Data for Linear Mixed-Effects Models
Tables and Dataset Arrays
Design Matrices
Relation of Matrix Form to Tables and Dataset Arrays
Relationship Between Formula and Design Matrices
Formula
Design Matrices for Fixed and Random Effects
Grouping Variables
Estimating Parameters in Linear Mixed-Effects Models
Maximum Likelihood (ML)
Restricted Maximum Likelihood (REML)
Linear Mixed-Effects Model Workflow
Fit Mixed-Effects Spline Regression
Train Linear Regression Model
Generalized Linear Models
Multinomial Models for Nominal Responses
Multinomial Models for Ordinal Responses
Hierarchical Multinomial Models
Generalized Linear Models
What Are Generalized Linear Models?
Prepare Data
Choose Generalized Linear Model and Link Function
Choose Fitting Method and Model
Fit Model to Data
Examine Quality and Adjust the Fitted Model
Predict or Simulate Responses to New Data
Share Fitted Models
Generalized Linear Model Workflow
Lasso Regularization of Generalized Linear Models
What Is Generalized Linear Model Lasso Regularization?
Generalized Linear Model Lasso and Elastic Net
References
Regularize Poisson Regression
Regularize Logistic Regression
Regularize Wide Data in Parallel
Generalized Linear Mixed-Effects Models
What Are Generalized Linear Mixed-Effects Models?
GLME Model Equations
Prepare Data for Model Fitting
Choose a Distribution Type for the Model
Choose a Link Function for the Model
Specify the Model Formula
Display the Model
Work with the Model
Fit a Generalized Linear Mixed-Effects Model
Nonlinear Regression
Nonlinear Regression
What Are Parametric Nonlinear Regression Models?
Prepare Data
Represent the Nonlinear Model
Choose Initial Vector beta0
Fit Nonlinear Model to Data
Examine Quality and Adjust the Fitted Nonlinear Model
Predict or Simulate Responses Using a Nonlinear Model
Nonlinear Regression Workflow
Mixed-Effects Models
Introduction to Mixed-Effects Models
Mixed-Effects Model Hierarchy
Specifying Mixed-Effects Models
Specifying Covariate Models
Choosing nlmefit or nlmefitsa
Using Output Functions with Mixed-Effects Models
Examining Residuals for Model Verification
Mixed-Effects Models Using nlmefit and nlmefitsa
Survival Analysis
What Is Survival Analysis?
Introduction
Censoring
Data
Survivor Function
Hazard Function
Kaplan-Meier Method
Hazard and Survivor Functions for Different Groups
Survivor Functions for Two Groups
Cox Proportional Hazards Model
Introduction
Hazard Ratio
Extension of Cox Proportional Hazards Model
Partial Likelihood Function
Partial Likelihood Function for Tied Events
Frequency or Weights of Observations
Cox Proportional Hazards Model for Censored Data
Cox Proportional Hazards Model with Time-Dependent Covariates
Multivariate Methods
Multivariate Linear Regression
Introduction to Multivariate Methods
Multivariate Linear Regression Model
Solving Multivariate Regression Problems
Estimation of Multivariate Regression Models
Least Squares Estimation
Maximum Likelihood Estimation
Missing Response Data
Set Up Multivariate Regression Problems
Response Matrix
Design Matrices
Common Multivariate Regression Problems
Multivariate General Linear Model
Fixed Effects Panel Model with Concurrent Correlation
Longitudinal Analysis
Multidimensional Scaling
Nonclassical and Nonmetric Multidimensional Scaling
Nonclassical Multidimensional Scaling
Nonmetric Multidimensional Scaling
Classical Multidimensional Scaling
Procrustes Analysis
Compare Landmark Data
Data Input
Preprocess Data for Accurate Results
Compare Handwritten Shapes Using Procrustes Analysis
Introduction to Feature Selection
Feature Selection Algorithms
Feature Selection Functions
Sequential Feature Selection
Introduction to Sequential Feature Selection
Select Subset of Features with Comparative Predictive Power
Nonnegative Matrix Factorization
Perform Nonnegative Matrix Factorization
Principal Component Analysis (PCA)
Analyze Quality of Life in U.S. Cities Using PCA
Factor Analysis
Analyze Stock Prices Using Factor Analysis
Robust Feature Selection Using NCA for Regression
Neighborhood Component Analysis (NCA) Feature Selection
NCA Feature Selection for Classification
NCA Feature Selection for Regression
Impact of Standardization
Choosing the Regularization Parameter Value
t-SNE
What Is t-SNE?
t-SNE Algorithm
Barnes-Hut Variation of t-SNE
Characteristics of t-SNE
t-SNE Output Function
t-SNE Output Function Description
tsne optimValues Structure
t-SNE Custom Output Function
Visualize High-Dimensional Data Using t-SNE
tsne Settings
Feature Extraction
What Is Feature Extraction?
Sparse Filtering Algorithm
Reconstruction ICA Algorithm
Feature Extraction Workflow
Extract Mixed Signals
Cluster Analysis
Choose Cluster Analysis Method
Clustering Methods
Comparison of Clustering Methods
Hierarchical Clustering
Introduction to Hierarchical Clustering
Algorithm Description
Similarity Measures
Linkages
Dendrograms
Verify the Cluster Tree
Create Clusters
DBSCAN
Introduction to DBSCAN
Algorithm Description
Determine Values for DBSCAN Parameters
Partition Data Using Spectral Clustering
Introduction to Spectral Clustering
Algorithm Description
Estimate Number of Clusters and Perform Spectral Clustering
k-Means Clustering
Introduction to k-Means Clustering
Compare k-Means Clustering Solutions
Cluster Using Gaussian Mixture Model
How Gaussian Mixture Models Cluster Data
Fit GMM with Different Covariance Options and Initial Conditions
When to Regularize
Model Fit Statistics
Cluster Gaussian Mixture Data Using Hard Clustering
Cluster Gaussian Mixture Data Using Soft Clustering
Tune Gaussian Mixture Models
Cluster Evaluation
Parametric Classification
Parametric Classification
Performance Curves
Introduction to Performance Curves
What Are ROC Curves?
Evaluate Classifier Performance Using perfcurve
Nonparametric Supervised Learning
Supervised Learning Workflow and Algorithms
What Is Supervised Learning?
Steps in Supervised Learning
Characteristics of Classification Algorithms
Visualize Decision Surfaces of Different Classifiers
Classification Using Nearest Neighbors
Pairwise Distance Metrics
k-Nearest Neighbor Search and Radius Search
Classify Query Data
Find Nearest Neighbors Using a Custom Distance Metric
K-Nearest Neighbor Classification for Supervised Learning
Construct KNN Classifier
Examine Quality of KNN Classifier
Predict Classification Using KNN Classifier
Modify KNN Classifier
Framework for Ensemble Learning
Prepare the Predictor Data
Prepare the Response Data
Choose an Applicable Ensemble Aggregation Method
Set the Number of Ensemble Members
Prepare the Weak Learners
Call fitcensemble or fitrensemble
Ensemble Algorithms
Bootstrap Aggregation (Bagging) and Random Forest
Random Subspace
Boosting Algorithms
Train Classification Ensemble
Train Regression Ensemble
Select Predictors for Random Forests
Test Ensemble Quality
Ensemble Regularization
Regularize a Regression Ensemble
Classification with Imbalanced Data
Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles
Train Ensemble With Unequal Classification Costs
Surrogate Splits
LPBoost and TotalBoost for Small Ensembles
Tune RobustBoost
Random Subspace Classification
Bootstrap Aggregation (Bagging) of Regression Trees Using TreeBagger
Bootstrap Aggregation (Bagging) of Classification Trees Using TreeBagger
Detect Outliers Using Quantile Regression
Conditional Quantile Estimation Using Kernel Smoothing
Tune Random Forest Using Quantile Error and Bayesian Optimization
Support Vector Machines for Binary Classification
Understanding Support Vector Machines
Using Support Vector Machines
Train SVM Classifiers Using a Gaussian Kernel
Train SVM Classifier Using Custom Kernel
Optimize an SVM Classifier Fit Using Bayesian Optimization
Plot Posterior Probability Regions for SVM Classification Models
Analyze Images Using Linear Support Vector Machines
Moving Towards Automating Model Selection Using Bayesian Optimization
Automated Classifier Selection with Bayesian Optimization
Bibliography
Decision Trees
Decision Trees
Train Classification Tree
Train Regression Tree
View Decision Tree
Growing Decision Trees
Prediction Using Classification and Regression Trees
Predict Out-of-Sample Responses of Subtrees
Improving Classification Trees and Regression Trees
Examining Resubstitution Error
Cross Validation
Choose Split Predictor Selection Technique
Control Depth or “Leafiness”
Pruning
Splitting Categorical Predictors in Classification Trees
Challenges in Splitting Multilevel Predictors
Algorithms for Categorical Predictor Split
Inspect Data with Multilevel Categorical Predictors
Discriminant Analysis
Discriminant Analysis Classification
Create Discriminant Analysis Classifiers
Creating Discriminant Analysis Model
Weighted Observations
Prediction Using Discriminant Analysis Models
Posterior Probability
Prior Probability
Cost
Create and Visualize Discriminant Analysis Classifier
Improving Discriminant Analysis Models
Deal with Singular Data
Choose a Discriminant Type
Examine the Resubstitution Error and Confusion Matrix
Cross Validation
Change Costs and Priors
Regularize Discriminant Analysis Classifier
Examine the Gaussian Mixture Assumption
Bartlett Test of Equal Covariance Matrices for Linear Discriminant Analysis
Q-Q Plot
Mardia Kurtosis Test of Multivariate Normality
Naive Bayes
Naive Bayes Classification
Supported Distributions
Plot Posterior Classification Probabilities
Classification and Regression for High-Dimensional Data
Classification Learner
Machine Learning in MATLAB
What Is Machine Learning?
Selecting the Right Algorithm
Train Classification Models in Classification Learner App
Train Regression Models in Regression Learner App
Train Neural Networks for Deep Learning
Train Classification Models in Classification Learner App
Automated Classifier Training
Manual Classifier Training
Parallel Classifier Training
Compare and Improve Classification Models
Select Data and Validation for Classification Problem
Select Data from Workspace
Import Data from File
Example Data for Classification
Choose Validation Scheme
Choose Classifier Options
Choose a Classifier Type
Decision Trees
Discriminant Analysis
Logistic Regression
Support Vector Machines
Nearest Neighbor Classifiers
Ensemble Classifiers
Naive Bayes Classifiers
Feature Selection and Feature Transformation Using Classification Learner App
Investigate Features in the Scatter Plot
Select Features to Include
Transform Features with PCA in Classification Learner
Investigate Features in the Parallel Coordinates Plot
Misclassification Costs in Classification Learner App
Specify Misclassification Costs
Assess Model Performance
Misclassification Costs in Exported Model and Generated Code
Hyperparameter Optimization in Classification Learner App
Select Hyperparameters to Optimize
Optimization Options
Minimum Classification Error Plot
Optimization Results
Assess Classifier Performance in Classification Learner
Check Performance in the History List
Plot Classifier Results
Check Performance Per Class in the Confusion Matrix
Check the ROC Curve
Export Plots in Classification Learner App
Export Classification Model to Predict New Data
Export the Model to the Workspace to Make Predictions for New Data
Make Predictions for New Data
Generate MATLAB Code to Train the Model with New Data
Generate C Code for Prediction
Deploy Predictions Using MATLAB Compiler
Train Decision Trees Using Classification Learner App
Train Discriminant Analysis Classifiers Using Classification Learner App
Train Logistic Regression Classifiers Using Classification Learner App
Train Support Vector Machines Using Classification Learner App
Train Nearest Neighbor Classifiers Using Classification Learner App
Train Ensemble Classifiers Using Classification Learner App
Train Naive Bayes Classifiers Using Classification Learner App
Train and Compare Classifiers Using Misclassification Costs in Classification Learner App
Train Classifier Using Hyperparameter Optimization in Classification Learner App
Regression Learner
Train Regression Models in Regression Learner App
Automated Regression Model Training
Manual Regression Model Training
Parallel Regression Model Training
Compare and Improve Regression Models
Select Data and Validation for Regression Problem
Select Data from Workspace
Import Data from File
Example Data for Regression
Choose Validation Scheme
Choose Regression Model Options
Choose Regression Model Type
Linear Regression Models
Regression Trees
Support Vector Machines
Gaussian Process Regression Models
Ensembles of Trees
Feature Selection and Feature Transformation Using Regression Learner App
Investigate Features in the Response Plot
Select Features to Include
Transform Features with PCA in Regression Learner
Hyperparameter Optimization in Regression Learner App
Select Hyperparameters to Optimize
Optimization Options
Minimum MSE Plot
Optimization Results
Assess Model Performance in Regression Learner
Check Performance in History List
View Model Statistics in Current Model Window
Explore Data and Results in Response Plot
Plot Predicted vs. Actual Response
Evaluate Model Using Residuals Plot
Export Plots in Regression Learner App
Export Regression Model to Predict New Data
Export Model to Workspace
Make Predictions for New Data
Generate MATLAB Code to Train Model with New Data
Deploy Predictions Using MATLAB Compiler
Train Regression Trees Using Regression Learner App
Train Regression Model Using Hyperparameter Optimization in Regression Learner App
Support Vector Machines
Understanding Support Vector Machine Regression
Mathematical Formulation of SVM Regression
Solving the SVM Regression Optimization Problem
Markov Models
Markov Chains
Hidden Markov Models (HMM)
Introduction to Hidden Markov Models (HMM)
Analyzing Hidden Markov Models
Design of Experiments
Design of Experiments
Full Factorial Designs
Multilevel Designs
Two-Level Designs
Fractional Factorial Designs
Introduction to Fractional Factorial Designs
Plackett-Burman Designs
General Fractional Designs
Response Surface Designs
Introduction to Response Surface Designs
Central Composite Designs
Box-Behnken Designs
D-Optimal Designs
Introduction to D-Optimal Designs
Generate D-Optimal Designs
Augment D-Optimal Designs
Specify Fixed Covariate Factors
Specify Categorical Factors
Specify Candidate Sets
Improve an Engine Cooling Fan Using Design for Six Sigma Techniques
Statistical Process Control
Control Charts
Capability Studies
Tall Arrays
Logistic Regression with Tall Arrays
Bayesian Optimization with Tall Arrays
Parallel Statistics
Quick Start Parallel Computing for Statistics and Machine Learning Toolbox
What Is Parallel Statistics Functionality?
How To Compute in Parallel
Use Parallel Processing for Regression TreeBagger Workflow
Concepts of Parallel Computing in Statistics and Machine Learning Toolbox
Subtleties in Parallel Computing
Vocabulary for Parallel Computation
When to Run Statistical Functions in Parallel
Why Run in Parallel?
Factors Affecting Speed
Factors Affecting Results
Working with parfor
How Statistical Functions Use parfor
Characteristics of parfor
Reproducibility in Parallel Statistical Computations
Issues and Considerations in Reproducing Parallel Computations
Running Reproducible Parallel Computations
Parallel Statistical Computation Using Random Numbers
Implement Jackknife Using Parallel Computing
Implement Cross-Validation Using Parallel Computing
Simple Parallel Cross Validation
Reproducible Parallel Cross Validation
Implement Bootstrap Using Parallel Computing
Bootstrap in Serial and Parallel
Reproducible Parallel Bootstrap
Code Generation
Introduction to Code Generation
Code Generation Workflows
Code Generation Applications
General Code Generation Workflow
Define Entry-Point Function
Generate Code
Verify Generated Code
Code Generation for Prediction of Machine Learning Model at Command Line
Code Generation for Nearest Neighbor Searcher
Code Generation for Prediction of Machine Learning Model Using MATLAB Coder App
Code Generation and Classification Learner App
Load Sample Data
Enable PCA
Train Models
Export Model to Workspace
Generate C Code for Prediction
Predict Class Labels Using MATLAB Function Block
Specify Variable-Size Arguments for Code Generation
Train SVM Classifier with Categorical Predictors and Generate C/C++ Code
System Objects for Classification and Code Generation
Predict Class Labels Using Stateflow
Human Activity Recognition Simulink Model for Smartphone Deployment
Code Generation for Prediction and Update Using Coder Configurer
Code Generation for Probability Distribution Objects
Fixed-Point Code Generation for Prediction of SVM
Generate Code to Classify Numeric Data in Table
Functions
addedvarplot
clustering.evaluation.ClusterCriterion.addK
addlevels
qrandstream.addlistener
GeneralizedLinearMixedModel.anova
addTerms
addTerms
adtest
andrewsplot
anova
LinearMixedModel.anova
anova1
anova2
anovan
RepeatedMeasuresModel.anova
ansaribradley
aoctool
TreeBagger.append
barttest
BayesianOptimization
bayesopt
bbdesign
bestPoint
betacdf
betafit
betainv
betalike
betapdf
betarnd
betastat
binocdf
binofit
binoinv
binopdf
binornd
binostat
binScatterPlot
biplot
bootci
bootstrp
boxplot
boundary
CalinskiHarabaszEvaluation
candexch
candgen
canoncorr
capability
capaplot
caseread
casewrite
DaviesBouldinEvaluation
dataset.cat
cdf
ccdesign
cdf
cdfplot
cell2dataset
dataset.cellstr
chi2cdf
chi2gof
chi2inv
chi2pdf
chi2rnd
chi2stat
cholcov
ClassificationBaggedEnsemble
ClassificationECOC
ClassificationECOCCoderConfigurer
ClassificationDiscriminant
ClassificationEnsemble
ClassificationKNN
ClassificationLinear
ClassificationLinearCoderConfigurer
ClassificationNaiveBayes
ClassificationPartitionedECOC
ClassificationPartitionedEnsemble
ClassificationPartitionedKernel
ClassificationPartitionedKernelECOC
ClassificationPartitionedLinear
ClassificationPartitionedLinearECOC
ClassificationPartitionedModel
ClassificationSVM
ClassificationSVMCoderConfigurer
ClassificationTree
ClassificationTreeCoderConfigurer
classify
cluster
cluster
ClusterCriterion
clusterdata
cmdscale
coefCI
GeneralizedLinearMixedModel.coefCI
coefCI
LinearMixedModel.coefCI
NonLinearModel.coefCI
coefTest
GeneralizedLinearMixedModel.coefTest
coefTest
LinearMixedModel.coefTest
NonLinearModel.coefTest
RepeatedMeasuresModel.coeftest
CompactTreeBagger.combine
combnk
ClassificationDiscriminant.compact
compact
ClassificationEnsemble.compact
ClassificationNaiveBayes.compact
compact
ClassificationTree.compact
compact
compact
RegressionEnsemble.compact
RegressionGP.compact
RegressionSVM.compact
RegressionTree.compact
TreeBagger.compact
CompactClassificationDiscriminant
CompactClassificationECOC
CompactClassificationEnsemble
CompactClassificationNaiveBayes
CompactClassificationSVM
CompactClassificationTree
CompactLinearModel
CompactGeneralizedLinearModel
CompactRegressionEnsemble
CompactRegressionGP
CompactRegressionSVM
CompactRegressionTree
CompactTreeBagger
CompactTreeBagger
GeneralizedLinearMixedModel.compare
LinearMixedModel.compare
compareHoldout
confusionchart
ConfusionMatrixChart
confusionmat
controlchart
controlrules
cophenet
copulacdf
copulafit
copulaparam
copulapdf
copulastat
copularnd
cordexch
corr
corrcov
GeneralizedLinearMixedModel.covarianceParameters
LinearMixedModel.covarianceParameters
coxphfit
createns
crosstab
crossval
ClassificationDiscriminant.crossval
crossval
ClassificationEnsemble.crossval
crossval
ClassificationNaiveBayes.crossval
crossval
ClassificationTree.crossval
RegressionEnsemble.crossval
RegressionGP.crossval
RegressionSVM.crossval
RegressionTree.crossval
ClassificationTree.cvloss
RegressionTree.cvloss
cvpartition
cvpartition
ClassificationDiscriminant.cvshrink
RegressionEnsemble.cvshrink
datasample
dataset
dataset
dataset.dataset2cell
dataset.dataset2struct
dataset2table
dataset.datasetfun
daugment
dbscan
dcovary
qrandstream.delete
dendrogram
dataset.Description
designecoc
devianceTest
GeneralizedLinearMixedModel.designMatrix
LinearMixedModel.designMatrix
dfittool
dataset.DimNames
discardSupportVectors
discardSupportVectors
CompactRegressionSVM.discardSupportVectors
cvpartition.disp
dataset.disp
GeneralizedLinearMixedModel.disp
LinearMixedModel.disp
NonLinearModel.disp
qrandstream.disp
cvpartition.display
dataset.display
distributionFitter
Probability Distribution Function
dataset.double
droplevels
dummyvar
dwtest
dwtest
ecdf
ecdfhist
edge
ClassificationLinear.edge
CompactClassificationDiscriminant.edge
edge
CompactClassificationEnsemble.edge
CompactClassificationNaiveBayes.edge
edge
CompactClassificationTree.edge
dataset.end
RepeatedMeasuresModel.epsilon
evcdf
evfit
evinv
qrandstream.eq
CompactTreeBagger.error
TreeBagger.error
evalclusters
evlike
evpdf
evrnd
evstat
expcdf
expfit
ExhaustiveSearcher
expinv
explike
dataset.export
exppdf
exprnd
expstat
factoran
fcdf
feval
feval
NonLinearModel.feval
ff2n
TreeBagger.fillprox
qrandstream.findobj
qrandstream.findprop
finv
fishertest
ClassificationDiscriminant.fit
ClassificationKNN.fit
ClassificationTree.fit
GeneralizedLinearModel.fit
gmdistribution.fit
LinearModel.fit
LinearMixedModel.fit
NonLinearModel.fit
RegressionTree.fit
fitcauto
fitcdiscr
fitcecoc
fitcensemble
fitcknn
fitclinear
fitcnb
fitcsvm
fitctree
fitglm
fitglme
fitgmdist
fitlm
fitlme
fitlmematrix
fitrgp
fitrlinear
fitrm
fitdist
fitensemble
fitnlm
LinearMixedModel.fitmatrix
fitPosterior
fitPosterior
fitrensemble
fitrsvm
fitrtree
fitSVMPosterior
GeneralizedLinearMixedModel.fitted
LinearMixedModel.fitted
GeneralizedLinearMixedModel.fixedEffects
LinearMixedModel.fixedEffects
fpdf
fracfact
fracfactgen
friedman
frnd
fscchi2
fscmrmr
fscnca
fsrnca
fsrftest
fstat
fsulaplacian
fsurfht
fullfact
gagerr
gamcdf
gamfit
gaminv
gamlike
gampdf
gamrnd
gamstat
qrandstream.ge
GeneralizedLinearMixedModel
GeneralizedLinearModel
generateCode
generateFiles
generateLearnerDataTypeFcn
geocdf
geoinv
geomean
geopdf
geornd
geostat
GapEvaluation
dataset.get
getlabels
getlevels
gevcdf
gevfit
gevinv
gevlike
gevpdf
gevrnd
gevstat
gline
glmfit
glmval
glyphplot
gmdistribution
gname
gpcdf
gpfit
gpinv
gplike
gppdf
gplotmatrix
gprnd
gpstat
TreeBagger.growTrees
grp2idx
grpstats
RepeatedMeasuresModel.grpstats
gscatter
qrandstream.gt
haltonset
harmmean
hist3
histfit
hmmdecode
hmmestimate
hmmgenerate
hmmtrain
hmmviterbi
dataset.horzcat
hougen
hygecdf
hygeinv
hygepdf
hygernd
hygestat
hyperparameters
icdf
inconsistent
clustering.evaluation.GapEvaluation.increaseB
interactionplot
dataset.intersect
invpred
iqr
dataset.isempty
islevel
dataset.ismember
dataset.ismissing
qrandstream.isvalid
iwishrnd
jackknife
jbtest
johnsrnd
dataset.join
KDTreeSearcher
kfoldEdge
ClassificationPartitionedEnsemble.kfoldEdge
kfoldEdge
kfoldEdge
ClassificationPartitionedLinear.kfoldEdge
ClassificationPartitionedLinearECOC.kfoldEdge
ClassificationPartitionedModel.kfoldEdge
kfoldfun
ClassificationPartitionedModel.kfoldfun
RegressionPartitionedModel.kfoldfun
kfoldLoss
ClassificationPartitionedEnsemble.kfoldLoss
kfoldLoss
kfoldLoss
ClassificationPartitionedLinear.kfoldLoss
ClassificationPartitionedLinearECOC.kfoldLoss
ClassificationPartitionedModel.kfoldLoss
RegressionPartitionedEnsemble.kfoldLoss
RegressionPartitionedLinear.kfoldLoss
RegressionPartitionedModel.kfoldLoss
kfoldMargin
kfoldMargin
kfoldMargin
ClassificationPartitionedLinear.kfoldMargin
ClassificationPartitionedLinearECOC.kfoldMargin
ClassificationPartitionedModel.kfoldMargin
kfoldPredict
kfoldPredict
kfoldPredict
ClassificationPartitionedLinear.kfoldPredict
ClassificationPartitionedLinearECOC.kfoldPredict
ClassificationPartitionedModel.kfoldPredict
RegressionPartitionedLinear.kfoldPredict
RegressionPartitionedModel.kfoldPredict
kmeans
kmedoids
knnsearch
knnsearch
kruskalwallis
ksdensity
kstest
kstest2
kurtosis
lasso
lassoglm
lassoPlot
qrandstream.le
learnerCoderConfigurer
dataset.length
levelcounts
leverage
lhsdesign
lhsnorm
lillietest
LinearModel
LinearMixedModel
linhyptest
linkage
loadCompactModel
loadLearnerForCoder
logncdf
lognfit
logninv
lognlike
lognpdf
lognrnd
lognstat
CompactClassificationDiscriminant.logP
CompactClassificationNaiveBayes.logP
loss
ClassificationLinear.loss
CompactClassificationDiscriminant.loss
loss
CompactClassificationEnsemble.loss
CompactClassificationNaiveBayes.loss
loss
CompactClassificationTree.loss
CompactRegressionEnsemble.loss
CompactRegressionGP.loss
CompactRegressionSVM.loss
CompactRegressionTree.loss
FeatureSelectionNCAClassification.loss
FeatureSelectionNCARegression.loss
RegressionLinear.loss
lowerparams
qrandstream.lt
lsline
mad
mahal
CompactClassificationDiscriminant.mahal
mahal
maineffectsplot
ClassificationDiscriminant.make
makecdiscr
makedist
RepeatedMeasuresModel.manova
manova1
manovacluster
margin
ClassificationLinear.margin
CompactClassificationDiscriminant.margin
margin
CompactClassificationEnsemble.margin
CompactClassificationNaiveBayes.margin
margin
CompactClassificationTree.margin
CompactTreeBagger.margin
TreeBagger.margin
RepeatedMeasuresModel.margmean
RepeatedMeasuresModel.mauchly
mat2dataset
mdscale
CompactTreeBagger.mdsprox
TreeBagger.mdsprox
mean
CompactTreeBagger.meanMargin
TreeBagger.meanMargin
CompactClassificationTree.surrogateAssociation
CompactRegressionTree.surrogateAssociation
median
mergelevels
mhsample
mle
mlecov
mnpdf
mnrfit
mnrnd
mnrval
moment
multcompare
RepeatedMeasuresModel.multcompare
multivarichart
mvksdensity
mvncdf
mvnpdf
mvregress
mvregresslike
mvnrnd
mvtcdf
mvtpdf
mvtrnd
cvpartition.NumObservations
nancov
nanmax
nanmean
nanmedian
nanmin
nanstd
nansum
nanvar
nearcorr
nbincdf
nbinfit
nbininv
nbinpdf
nbinrnd
nbinstat
FeatureSelectionNCAClassification
FeatureSelectionNCARegression
ncfcdf
ncfinv
ncfpdf
ncfrnd
ncfstat
nctcdf
nctinv
nctpdf
nctrnd
nctstat
ncx2cdf
ncx2inv
ncx2pdf
ncx2rnd
ncx2stat
dataset.ndims
qrandstream.ne
negloglik
net
CompactClassificationDiscriminant.nLinearCoeffs
nlinfit
nlintool
nlmefit
nlmefitsa
nlparci
nlpredci
nnmf
nominal
qrandstream.notify
NonLinearModel
normcdf
normfit
norminv
normlike
normpdf
normplot
normrnd
normspec
normstat
nsegments
dataset.numel
cvpartition.NumTestSets
dataset.ObsNames
optimalleaforder
ClassificationBaggedEnsemble.oobEdge
TreeBagger.oobError
ClassificationBaggedEnsemble.oobLoss
RegressionBaggedEnsemble.oobLoss
ClassificationBaggedEnsemble.oobMargin
TreeBagger.oobMargin
TreeBagger.oobMeanMargin
ClassificationBaggedEnsemble.oobPermutedPredictorImportance
RegressionBaggedEnsemble.oobPermutedPredictorImportance
ClassificationBaggedEnsemble.oobPredict
RegressionBaggedEnsemble.oobPredict
TreeBagger.oobPredict
TreeBagger.oobQuantileError
TreeBagger.oobQuantilePredict
optimizableVariable
ordinal
CompactTreeBagger.outlierMeasure
parallelcoords
paramci
paretotails
partialcorr
partialcorri
pca
pcacov
pcares
ppca
pdf
pdf
pdist
pdist2
pearsrnd
perfcurve
clustering.evaluation.ClusterCriterion.plot
plot
RepeatedMeasuresModel.plot
plotAdded
plotAdjustedResponse
plot
plotDiagnostics
plotDiagnostics
NonLinearModel.plotDiagnostics
plotEffects
plotInteraction
RepeatedMeasuresModel.plotprofile
plotResiduals
GeneralizedLinearMixedModel.plotResiduals
plotResiduals
LinearMixedModel.plotResiduals
NonLinearModel.plotResiduals
plotSlice
plotSlice
NonLinearModel.plotSlice
plsregress
qrandstream.PointSet
poisscdf
poissfit
poissinv
poisspdf
poissrnd
poisstat
polyconf
polytool
posterior
RegressionGP.postFitStatistics
prctile
predict
ClassificationLinear.predict
CompactClassificationDiscriminant.predict
predict
CompactClassificationEnsemble.predict
CompactClassificationNaiveBayes.predict
predict
CompactClassificationTree.predict
CompactRegressionEnsemble.predict
CompactRegressionGP.predict
CompactRegressionSVM.predict
CompactRegressionTree.predict
RegressionLinear.predict
CompactTreeBagger.predict
predict
GeneralizedLinearMixedModel.predict
predict
LinearMixedModel.predict
FeatureSelectionNCAClassification.predict
FeatureSelectionNCARegression.predict
NonLinearModel.predict
RepeatedMeasuresModel.predict
TreeBagger.predict
predictConstraints
predictError
predictObjective
predictObjectiveEvaluationTime
CompactClassificationEnsemble.predictorImportance
CompactClassificationTree.predictorImportance
CompactRegressionEnsemble.predictorImportance
CompactRegressionTree.predictorImportance
probplot
procrustes
proflik
CompactTreeBagger.proximity
ClassificationTree.prune
RegressionTree.prune
qrandstream.qrand
qrandstream
qrandstream
qqplot
quantile
qrandstream.rand
TreeBagger.quantileError
TreeBagger.quantilePredict
randg
random
random
GeneralizedLinearMixedModel.random
random
random
LinearMixedModel.random
NonLinearModel.random
RepeatedMeasuresModel.random
GeneralizedLinearMixedModel.randomEffects
LinearMixedModel.randomEffects
randsample
randtool
range
rangesearch
rangesearch
ranksum
RepeatedMeasuresModel.ranova
raylcdf
raylfit
raylinv
raylpdf
raylrnd
raylstat
rcoplot
ReconstructionICA
refcurve
GeneralizedLinearMixedModel.refit
FeatureSelectionNCAClassification.refit
FeatureSelectionNCARegression.refit
reduceDimensions
refline
regress
RegressionBaggedEnsemble
RegressionEnsemble
RegressionGP
RegressionLinear
RegressionLinearCoderConfigurer
RegressionPartitionedEnsemble
RegressionPartitionedLinear
RegressionPartitionedModel
RegressionPartitionedSVM
RegressionSVM
RegressionSVMCoderConfigurer
RegressionTree
RegressionTreeCoderConfigurer
regstats
RegressionEnsemble.regularize
relieff
CompactClassificationEnsemble.removeLearners
CompactRegressionEnsemble.removeLearners
removeTerms
removeTerms
reorderlevels
cvpartition.repartition
RepeatedMeasuresModel
dataset.replacedata
dataset.replaceWithMissing
qrandstream.reset
GeneralizedLinearMixedModel.residuals
LinearMixedModel.residuals
GeneralizedLinearMixedModel.response
LinearMixedModel.response
ClassificationDiscriminant.resubEdge
resubEdge
ClassificationEnsemble.resubEdge
resubEdge
ClassificationNaiveBayes.resubEdge
resubEdge
ClassificationTree.resubEdge
ClassificationDiscriminant.resubLoss
resubLoss
ClassificationEnsemble.resubLoss
resubLoss
ClassificationNaiveBayes.resubLoss
resubLoss
ClassificationTree.resubLoss
RegressionEnsemble.resubLoss
RegressionGP.resubLoss
RegressionSVM.resubLoss
RegressionTree.resubLoss
ClassificationDiscriminant.resubMargin
resubMargin
ClassificationEnsemble.resubMargin
resubMargin
ClassificationNaiveBayes.resubMargin
resubMargin
ClassificationTree.resubMargin
ClassificationDiscriminant.resubPredict
resubPredict
ClassificationEnsemble.resubPredict
resubPredict
ClassificationNaiveBayes.resubPredict
resubPredict
ClassificationTree.resubPredict
RegressionEnsemble.resubPredict
RegressionGP.resubPredict
RegressionSVM.resubPredict
RegressionTree.resubPredict
resume
ClassificationEnsemble.resume
ClassificationPartitionedEnsemble.resume
resume
RegressionEnsemble.resume
RegressionPartitionedEnsemble.resume
RegressionSVM.resume
rica
ridge
robustcov
robustdemo
robustfit
rotatefactors
rowexch
rsmdemo
rstool
runstest
sampsizepwr
saveCompactModel
saveLearnerForCoder
scatterhist
scramble
segment
ClassificationLinear.selectModels
selectModels
RegressionLinear.selectModels
sequentialfs
dataset.set
CompactTreeBagger.setDefaultYfit
dataset.setdiff
setlabels
dataset.setxor
RegressionEnsemble.shrink
signrank
signtest
silhouette
SilhouetteEvaluation
dataset.single
dataset.size
slicesample
skewness
sobolset
sortClasses
dataset.sortrows
sparsefilt
SparseFiltering
spectralcluster
squareform
dataset.stack
qrandstream.State
statget
statset
std
step
step
stepwise
GeneralizedLinearModel.stepwise
LinearModel.stepwise
stepwiseglm
stepwiselm
stepwisefit
dataset.subsasgn
dataset.subsref
dataset.summary
struct2dataset
surfht
table2dataset
tabulate
tblread
tblwrite
tcdf
tdfread
ClassificationDiscriminant.template
ClassificationKNN.template
ClassificationTree.template
RegressionTree.template
templateDiscriminant
templateECOC
templateEnsemble
templateKernel
templateKNN
templateLinear
templateNaiveBayes
templateSVM
templateTree
cvpartition.test
testcholdout
testckfold
cvpartition.TestSize
tiedrank
tinv
tpdf
cvpartition.training
cvpartition.TrainSize
transform
TreeBagger
TreeBagger
trimmean
trnd
truncate
tsne
tstat
ttest
ttest2
cvpartition.Type
BetaDistribution
BinomialDistribution
BirnbaumSaundersDistribution
BurrDistribution
ExponentialDistribution
ExtremeValueDistribution
GammaDistribution
GeneralizedExtremeValueDistribution
GeneralizedParetoDistribution
HalfNormalDistribution
InverseGaussianDistribution
KernelDistribution
LogisticDistribution
LoglogisticDistribution
LognormalDistribution
MultinomialDistribution
NakagamiDistribution
NegativeBinomialDistribution
NormalDistribution
PiecewiseLinearDistribution
PoissonDistribution
RayleighDistribution
RicianDistribution
StableDistribution
tLocationScaleDistribution
TriangularDistribution
UniformDistribution
WeibullDistribution
dataset.union
dataset.unique
dataset.Units
unidcdf
unidinv
unidpdf
unidrnd
unidstat
unifcdf
unifinv
unifit
unifpdf
unifrnd
unifstat
dataset.unstack
update
upperparams
dataset.UserData
validatedUpdateInputs
var
dataset.VarDescription
dataset.VarNames
vartest
vartest2
vartestn
dataset.vertcat
clustering.evaluation.ClusterCriterion.compact
CompactClassificationTree.view
CompactRegressionTree.view
wblcdf
wblfit
wblinv
wbllike
wblpdf
wblplot
wblrnd
wblstat
wishrnd
xptread
x2fx
zscore
ztest
hmcSampler
HamiltonianSampler
HamiltonianSampler.estimateMAP
HamiltonianSampler.tuneSampler
HamiltonianSampler.drawSamples
HamiltonianSampler.diagnostics
Classification Learner
Regression Learner
Distribution Fitter
plotPartialDependence
fitckernel
ClassificationKernel
edge
loss
margin
predict
resume
fitrkernel
RegressionKernel
loss
predict
resume
RegressionPartitionedKernel
kfoldLoss
kfoldPredict
Sample Data Sets
Sample Data Sets
Probability Distributions
Bernoulli Distribution
Overview
Parameters
Probability Density Function
Cumulative Distribution Function
Descriptive Statistics
Examples
Related Distributions
Beta Distribution
Overview
Parameters
Probability Density Function
Cumulative Distribution Function
Example
Binomial Distribution
Overview
Parameters
Probability Density Function
Cumulative Distribution Function
Descriptive Statistics
Example
Related Distributions
Birnbaum-Saunders Distribution
Definition
Background
Parameters
Burr Type XII Distribution
Definition
Background
Parameters
Fit a Burr Distribution and Draw the cdf
Compare Lognormal and Burr Distribution pdfs
Burr pdf for Various Parameters
Survival and Hazard Functions of Burr Distribution
Divergence of Parameter Estimates
Chi-Square Distribution
Overview
Parameters
Probability Density Function
Cumulative Distribution Function
Inverse Cumulative Distribution Function
Descriptive Statistics
Examples
Related Distributions
Exponential Distribution
Overview
Parameters
Probability Density Function
Cumulative Distribution Function
Inverse Cumulative Distribution Function
Hazard Function
Examples
Related Distributions
Extreme Value Distribution
Definition
Background
Parameters
Examples
F Distribution
Definition
Background
Examples
Gamma Distribution
Overview
Parameters
Probability Density Function
Cumulative Distribution Function
Inverse Cumulative Distribution Function
Descriptive Statistics
Examples
Related Distributions
Generalized Extreme Value Distribution
Definition
Background
Parameters
Examples
Generalized Pareto Distribution
Definition
Background
Parameters
Examples
Geometric Distribution
Overview
Parameters
Probability Density Function
Cumulative Distribution Function
Descriptive Statistics
Hazard Function
Examples
Related Distributions
Half-Normal Distribution
Overview
Parameters
Probability Density Function
Cumulative Distribution Function
Descriptive Statistics
Relationship to Other Distributions
Hypergeometric Distribution
Definition
Background
Examples
Inverse Gaussian Distribution
Definition
Background
Parameters
Inverse Wishart Distribution
Definition
Background
Example
Kernel Distribution
Overview
Kernel Density Estimator
Kernel Smoothing Function
Bandwidth
Logistic Distribution
Overview
Parameters
Probability Density Function
Relationship to Other Distributions
Loglogistic Distribution
Overview
Parameters
Probability Density Function
Relationship to Other Distributions
Lognormal Distribution
Overview
Parameters
Probability Density Function
Cumulative Distribution Function
Examples
Related Distributions
Multinomial Distribution
Overview
Parameter
Probability Density Function
Descriptive Statistics
Relationship to Other Distributions
Multivariate Normal Distribution
Overview
Parameters
Probability Density Function
Cumulative Distribution Function
Examples
Multivariate t Distribution
Definition
Background
Example
Nakagami Distribution
Definition
Background
Parameters
Negative Binomial Distribution
Definition
Background
Parameters
Example
Noncentral Chi-Square Distribution
Definition
Background
Examples
Noncentral F Distribution
Definition
Background
Examples
Noncentral t Distribution
Definition
Background
Examples
Normal Distribution
Overview
Parameters
Probability Density Function
Cumulative Distribution Function
Examples
Related Distributions
Piecewise Linear Distribution
Overview
Parameters
Cumulative Distribution Function
Relationship to Other Distributions
Poisson Distribution
Overview
Parameters
Probability Density Function
Cumulative Distribution Function
Examples
Related Distributions
Rayleigh Distribution
Definition
Background
Parameters
Examples
Rician Distribution
Definition
Background
Parameters
Stable Distribution
Overview
Parameters
Probability Density Function
Cumulative Distribution Function
Descriptive Statistics
Relationship to Other Distributions
Student's t Distribution
Overview
Parameters
Probability Density Function
Cumulative Distribution Function
Inverse Cumulative Distribution Function
Descriptive Statistics
Examples
Related Distributions
t Location-Scale Distribution
Overview
Parameters
Probability Density Function
Cumulative Distribution Function
Descriptive Statistics
Relationship to Other Distributions
Triangular Distribution
Overview
Parameters
Probability Density Function
Cumulative Distribution Function
Descriptive Statistics
Uniform Distribution (Continuous)
Overview
Parameters
Probability Density Function
Cumulative Distribution Function
Descriptive Statistics
Random Number Generation
Examples
Related Distributions
Uniform Distribution (Discrete)
Definition
Background
Examples
Weibull Distribution
Overview
Parameters
Probability Density Function
Cumulative Distribution Function
Inverse Cumulative Distribution Function
Hazard Function
Examples
Related Distributions
Wishart Distribution
Overview
Parameters
Probability Density Function
Example
Bibliography
Bibliography


Statistics and Machine Learning Toolbox™ User's Guide

R2020a

How to Contact MathWorks

Latest news:

www.mathworks.com

Sales and services:

www.mathworks.com/sales_and_services

User community:

www.mathworks.com/matlabcentral

Technical support:

www.mathworks.com/support/contact_us

Phone:

508-647-7000

The MathWorks, Inc.
1 Apple Hill Drive
Natick, MA 01760-2098

Statistics and Machine Learning Toolbox™ User's Guide

© COPYRIGHT 1993–2020 by The MathWorks, Inc.

The software described in this document is furnished under a license agreement. The software may be used or copied only under the terms of the license agreement. No part of this manual may be photocopied or reproduced in any form without prior written consent from The MathWorks, Inc.

FEDERAL ACQUISITION: This provision applies to all acquisitions of the Program and Documentation by, for, or through the federal government of the United States. By accepting delivery of the Program or Documentation, the government hereby agrees that this software or documentation qualifies as commercial computer software or commercial computer software documentation as such terms are used or defined in FAR 12.212, DFARS Part 227.72, and DFARS 252.227-7014. Accordingly, the terms and conditions of this Agreement and only those rights specified in this Agreement, shall pertain to and govern the use, modification, reproduction, release, performance, display, and disclosure of the Program and Documentation by the federal government (or other entity acquiring for or through the federal government) and shall supersede any conflicting contractual terms or conditions. If this License fails to meet the government's needs or is inconsistent in any respect with federal procurement law, the government agrees to return the Program and Documentation, unused, to The MathWorks, Inc.

Trademarks

MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See www.mathworks.com/trademarks for a list of additional trademarks. Other product or brand names may be trademarks or registered trademarks of their respective holders.

Patents

MathWorks products are protected by one or more U.S. patents. Please see www.mathworks.com/patents for more information.

Revision History

September 1993    First printing     Version 1.0
March 1996        Second printing    Version 2.0
January 1997      Third printing     Version 2.11
November 2000     Fourth printing    Revised for Version 3.0 (Release 12)
May 2001          Fifth printing     Minor revisions
July 2002         Sixth printing     Revised for Version 4.0 (Release 13)
February 2003     Online only        Revised for Version 4.1 (Release 13.0.1)
June 2004         Seventh printing   Revised for Version 5.0 (Release 14)
October 2004      Online only        Revised for Version 5.0.1 (Release 14SP1)
March 2005        Online only        Revised for Version 5.0.2 (Release 14SP2)
September 2005    Online only        Revised for Version 5.1 (Release 14SP3)
March 2006        Online only        Revised for Version 5.2 (Release 2006a)
September 2006    Online only        Revised for Version 5.3 (Release 2006b)
March 2007        Eighth printing    Revised for Version 6.0 (Release 2007a)
September 2007    Ninth printing     Revised for Version 6.1 (Release 2007b)
March 2008        Online only        Revised for Version 6.2 (Release 2008a)
October 2008      Online only        Revised for Version 7.0 (Release 2008b)
March 2009        Online only        Revised for Version 7.1 (Release 2009a)
September 2009    Online only        Revised for Version 7.2 (Release 2009b)
March 2010        Online only        Revised for Version 7.3 (Release 2010a)
September 2010    Online only        Revised for Version 7.4 (Release 2010b)
April 2011        Online only        Revised for Version 7.5 (Release 2011a)
September 2011    Online only        Revised for Version 7.6 (Release 2011b)
March 2012        Online only        Revised for Version 8.0 (Release 2012a)
September 2012    Online only        Revised for Version 8.1 (Release 2012b)
March 2013        Online only        Revised for Version 8.2 (Release 2013a)
September 2013    Online only        Revised for Version 8.3 (Release 2013b)
March 2014        Online only        Revised for Version 9.0 (Release 2014a)
October 2014      Online only        Revised for Version 9.1 (Release 2014b)
March 2015        Online only        Revised for Version 10.0 (Release 2015a)
September 2015    Online only        Revised for Version 10.1 (Release 2015b)
March 2016        Online only        Revised for Version 10.2 (Release 2016a)
September 2016    Online only        Revised for Version 11 (Release 2016b)
March 2017        Online only        Revised for Version 11.1 (Release 2017a)
September 2017    Online only        Revised for Version 11.2 (Release 2017b)
March 2018        Online only        Revised for Version 11.3 (Release 2018a)
September 2018    Online only        Revised for Version 11.4 (Release 2018b)
March 2019        Online only        Revised for Version 11.5 (Release 2019a)
September 2019    Online only        Revised for Version 11.6 (Release 2019b)
March 2020        Online only        Revised for Version 11.7 (Release 2020a)

Contents

1

2

Getting Started Statistics and Machine Learning Toolbox Product Description . . . . . . . . . Key Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

1-2 1-2

Supported Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

1-3

Organizing Data Other MATLAB Functions Supporting Nominal and Ordinal Arrays . . . . .

2-2

Create Nominal and Ordinal Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Create Nominal Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Create Ordinal Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2-3 2-3 2-4

Change Category Labels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Change Category Labels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2-7 2-7

Reorder Category Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Reorder Category Levels in Ordinal Arrays . . . . . . . . . . . . . . . . . . . . . . . . Reorder Category Levels in Nominal Arrays . . . . . . . . . . . . . . . . . . . . . .

2-9 2-9 2-10

Categorize Numeric Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Categorize Numeric Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2-13 2-13

Merge Category Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Merge Category Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2-16 2-16

Add and Drop Category Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2-18

Plot Data Grouped by Category . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Plot Data Grouped by Category . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2-21 2-21

Test Differences Between Category Means . . . . . . . . . . . . . . . . . . . . . . . .

2-25

Summary Statistics Grouped by Category . . . . . . . . . . . . . . . . . . . . . . . . . Summary Statistics Grouped by Category . . . . . . . . . . . . . . . . . . . . . . . .

2-32 2-32

Sort Ordinal Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sort Ordinal Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2-34 2-34

v

vi

Contents

Nominal and Ordinal Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . What Are Nominal and Ordinal Arrays? . . . . . . . . . . . . . . . . . . . . . . . . . Nominal and Ordinal Array Conversion . . . . . . . . . . . . . . . . . . . . . . . . . .

2-36 2-36 2-36

Advantages of Using Nominal and Ordinal Arrays . . . . . . . . . . . . . . . . . . Manipulate Category Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Analysis Using Nominal and Ordinal Arrays . . . . . . . . . . . . . . . . . . . . . . Reduce Memory Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2-38 2-38 2-38 2-39

Index and Search Using Nominal and Ordinal Arrays . . . . . . . . . . . . . . . Index By Category . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Common Indexing and Searching Methods . . . . . . . . . . . . . . . . . . . . . . .

2-41 2-41 2-41

Grouping Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . What Are Grouping Variables? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Group Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Analysis Using Grouping Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Missing Group Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2-45 2-45 2-45 2-46 2-46

Dummy Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . What Are Dummy Variables? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Creating Dummy Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2-48 2-48 2-49

Linear Regression with Categorical Covariates . . . . . . . . . . . . . . . . . . . .

2-52

Create a Dataset Array from Workspace Variables . . . . . . . . . . . . . . . . . . Create a Dataset Array from a Numeric Array . . . . . . . . . . . . . . . . . . . . . Create Dataset Array from Heterogeneous Workspace Variables . . . . . . .

2-57 2-57 2-59

Create a Dataset Array from a File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Create a Dataset Array from a Tab-Delimited Text File . . . . . . . . . . . . . . Create a Dataset Array from a Comma-Separated Text File . . . . . . . . . . . Create a Dataset Array from an Excel File . . . . . . . . . . . . . . . . . . . . . . .

2-62 2-62 2-64 2-66

Add and Delete Observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2-68

Add and Delete Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2-71

Access Data in Dataset Array Variables . . . . . . . . . . . . . . . . . . . . . . . . . . .

2-74

Select Subsets of Observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2-79

Sort Observations in Dataset Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2-82

Merge Dataset Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2-85

Stack or Unstack Dataset Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2-88

Calculations on Dataset Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2-92

Export Dataset Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2-95

Clean Messy and Missing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2-97

3

4

Dataset Arrays in the Variables Editor . . . . . . . . . . . . . . . . . . . . . . . . . . Open Dataset Arrays in the Variables Editor . . . . . . . . . . . . . . . . . . . . . Modify Variable and Observation Names . . . . . . . . . . . . . . . . . . . . . . . . Reorder or Delete Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Add New Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sort Observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Select a Subset of Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Create Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2-101 2-101 2-102 2-103 2-105 2-106 2-107 2-109

Dataset Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . What Are Dataset Arrays? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dataset Array Conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dataset Array Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2-112 2-112 2-112 2-113

Index and Search Dataset Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ways To Index and Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2-114 2-114 2-114

Descriptive Statistics Measures of Central Tendency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Measures of Central Tendency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

3-2 3-2

Measures of Dispersion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Compare Measures of Dispersion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

3-4 3-4

Quantiles and Percentiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

3-6

Exploratory Analysis of Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

3-10

Resampling Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Bootstrap Resampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jackknife Resampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parallel Computing Support for Resampling Methods . . . . . . . . . . . . . . .

3-14 3-14 3-16 3-17

Data with Missing Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Working with Data with Missing Values . . . . . . . . . . . . . . . . . . . . . . . . . .

3-18 3-18

4  Statistical Visualization

Create Scatter Plots Using Grouped Data  . . . . .  4-2

Box Plots  . . . . .  4-4
    Compare Grouped Data Using Box Plots  . . . . .  4-4

Distribution Plots  . . . . .  4-6
    Normal Probability Plots  . . . . .  4-6
    Probability Plots  . . . . .  4-8
    Quantile-Quantile Plots  . . . . .  4-9
    Cumulative Distribution Plots  . . . . .  4-11

5  Probability Distributions

Working with Probability Distributions  . . . . .  5-2
    Probability Distribution Objects  . . . . .  5-2
    Probability Distribution Functions  . . . . .  5-5
    Probability Distribution Apps and User Interfaces  . . . . .  5-9

Supported Distributions  . . . . .  5-13
    Continuous Distributions (Data)  . . . . .  5-14
    Continuous Distributions (Statistics)  . . . . .  5-17
    Discrete Distributions  . . . . .  5-18
    Multivariate Distributions  . . . . .  5-19
    Nonparametric Distributions  . . . . .  5-20
    Flexible Distribution Families  . . . . .  5-20

Maximum Likelihood Estimation  . . . . .  5-21

Negative Loglikelihood Functions  . . . . .  5-23
    Find MLEs Using Negative Loglikelihood Function  . . . . .  5-23

Random Number Generation  . . . . .  5-26

Nonparametric and Empirical Probability Distributions  . . . . .  5-29
    Overview  . . . . .  5-29
    Kernel Distribution  . . . . .  5-29
    Empirical Cumulative Distribution Function  . . . . .  5-30
    Piecewise Linear Distribution  . . . . .  5-31
    Pareto Tails  . . . . .  5-32
    Triangular Distribution  . . . . .  5-33

Fit Kernel Distribution Object to Data  . . . . .  5-35

Fit Kernel Distribution Using ksdensity  . . . . .  5-38

Fit Distributions to Grouped Data Using ksdensity  . . . . .  5-40

Fit a Nonparametric Distribution with Pareto Tails  . . . . .  5-42

Generate Random Numbers Using the Triangular Distribution  . . . . .  5-46

Model Data Using the Distribution Fitter App  . . . . .  5-50
    Explore Probability Distributions Interactively  . . . . .  5-50
    Create and Manage Data Sets  . . . . .  5-51
    Create a New Fit  . . . . .  5-54
    Display Results  . . . . .  5-57
    Manage Fits  . . . . .  5-58
    Evaluate Fits  . . . . .  5-60
    Exclude Data  . . . . .  5-62
    Save and Load Sessions  . . . . .  5-66
    Generate a File to Fit and Plot Distributions  . . . . .  5-67

Fit a Distribution Using the Distribution Fitter App  . . . . .  5-69
    Step 1: Load Sample Data  . . . . .  5-69
    Step 2: Import Data  . . . . .  5-69
    Step 3: Create a New Fit  . . . . .  5-71
    Step 4: Create and Manage Additional Fits  . . . . .  5-74

Define Custom Distributions Using the Distribution Fitter App  . . . . .  5-78
    Open the Distribution Fitter App  . . . . .  5-78
    Define Custom Distribution  . . . . .  5-79
    Import Custom Distribution  . . . . .  5-80

Explore the Random Number Generation UI  . . . . .  5-82

Compare Multiple Distribution Fits  . . . . .  5-84

Fit Probability Distribution Objects to Grouped Data  . . . . .  5-89

Multinomial Probability Distribution Objects  . . . . .  5-92

Multinomial Probability Distribution Functions  . . . . .  5-95

Generate Random Numbers Using Uniform Distribution Inversion  . . . . .  5-98

Represent Cauchy Distribution Using t Location-Scale  . . . . .  5-101

Generate Cauchy Random Numbers Using Student's t  . . . . .  5-104

Generate Correlated Data Using Rank Correlation  . . . . .  5-105

Create Gaussian Mixture Model  . . . . .  5-109

Fit Gaussian Mixture Model to Data  . . . . .  5-112

Simulate Data from Gaussian Mixture Model  . . . . .  5-116

Copulas: Generate Correlated Samples  . . . . .  5-118
    Determining Dependence Between Simulation Inputs  . . . . .  5-118
    Constructing Dependent Bivariate Distributions  . . . . .  5-121
    Using Rank Correlation Coefficients  . . . . .  5-125
    Using Bivariate Copulas  . . . . .  5-127
    Higher Dimension Copulas  . . . . .  5-133
    Archimedean Copulas  . . . . .  5-134
    Simulating Dependent Multivariate Data Using Copulas  . . . . .  5-135
    Fitting Copulas to Data  . . . . .  5-139

6  Gaussian Processes

Gaussian Process Regression Models  . . . . .  6-2

Kernel (Covariance) Function Options  . . . . .  6-5

Exact GPR Method  . . . . .  6-9
    Parameter Estimation  . . . . .  6-9
    Prediction  . . . . .  6-10
    Computational Complexity of Exact Parameter Estimation and Prediction  . . . . .  6-12

Subset of Data Approximation for GPR Models  . . . . .  6-13

Subset of Regressors Approximation for GPR Models  . . . . .  6-14
    Approximating the Kernel Function  . . . . .  6-14
    Parameter Estimation  . . . . .  6-15
    Prediction  . . . . .  6-15
    Predictive Variance Problem  . . . . .  6-16

Fully Independent Conditional Approximation for GPR Models  . . . . .  6-18
    Approximating the Kernel Function  . . . . .  6-18
    Parameter Estimation  . . . . .  6-18
    Prediction  . . . . .  6-19

Block Coordinate Descent Approximation for GPR Models  . . . . .  6-21
    Fit GPR Models Using BCD Approximation  . . . . .  6-21

7  Random Number Generation

Generating Pseudorandom Numbers  . . . . .  7-2
    Common Pseudorandom Number Generation Methods  . . . . .  7-2

Representing Sampling Distributions Using Markov Chain Samplers  . . . . .  7-10
    Using the Metropolis-Hastings Algorithm  . . . . .  7-10
    Using Slice Sampling  . . . . .  7-10
    Using Hamiltonian Monte Carlo  . . . . .  7-11

Generating Quasi-Random Numbers  . . . . .  7-13
    Quasi-Random Sequences  . . . . .  7-13
    Quasi-Random Point Sets  . . . . .  7-14
    Quasi-Random Streams  . . . . .  7-19

Generating Data Using Flexible Families of Distributions  . . . . .  7-21

Bayesian Linear Regression Using Hamiltonian Monte Carlo  . . . . .  7-27

8  Hypothesis Tests

Hypothesis Test Terminology  . . . . .  8-2

Hypothesis Test Assumptions  . . . . .  8-4

Hypothesis Testing  . . . . .  8-5

Available Hypothesis Tests  . . . . .  8-10

9  Analysis of Variance

Introduction to Analysis of Variance  . . . . .  9-2

One-Way ANOVA  . . . . .  9-3
    Introduction to One-Way ANOVA  . . . . .  9-3
    Prepare Data for One-Way ANOVA  . . . . .  9-4
    Perform One-Way ANOVA  . . . . .  9-5
    Mathematical Details  . . . . .  9-9

Two-Way ANOVA  . . . . .  9-11
    Introduction to Two-Way ANOVA  . . . . .  9-11
    Prepare Data for Balanced Two-Way ANOVA  . . . . .  9-12
    Perform Two-Way ANOVA  . . . . .  9-13
    Mathematical Details  . . . . .  9-15

Multiple Comparisons  . . . . .  9-18
    Introduction  . . . . .  9-18
    Multiple Comparisons Using One-Way ANOVA  . . . . .  9-18
    Multiple Comparisons for Three-Way ANOVA  . . . . .  9-20
    Multiple Comparison Procedures  . . . . .  9-22

N-Way ANOVA  . . . . .  9-25
    Introduction to N-Way ANOVA  . . . . .  9-25
    Prepare Data for N-Way ANOVA  . . . . .  9-27
    Perform N-Way ANOVA  . . . . .  9-27

ANOVA with Random Effects  . . . . .  9-33

Other ANOVA Models  . . . . .  9-38

Analysis of Covariance  . . . . .  9-39
    Introduction to Analysis of Covariance  . . . . .  9-39
    Analysis of Covariance Tool  . . . . .  9-39
    Confidence Bounds  . . . . .  9-43
    Multiple Comparisons  . . . . .  9-45

Nonparametric Methods  . . . . .  9-47
    Introduction to Nonparametric Methods  . . . . .  9-47
    Kruskal-Wallis Test  . . . . .  9-47
    Friedman's Test  . . . . .  9-47

MANOVA  . . . . .  9-49
    Introduction to MANOVA  . . . . .  9-49
    ANOVA with Multiple Responses  . . . . .  9-49

Model Specification for Repeated Measures Models  . . . . .  9-54
    Wilkinson Notation  . . . . .  9-54

Compound Symmetry Assumption and Epsilon Corrections  . . . . .  9-55

Mauchly's Test of Sphericity  . . . . .  9-57

Multivariate Analysis of Variance for Repeated Measures  . . . . .  9-59

10  Bayesian Optimization

Bayesian Optimization Algorithm  . . . . .  10-2
    Algorithm Outline  . . . . .  10-2
    Gaussian Process Regression for Fitting the Model  . . . . .  10-3
    Acquisition Function Types  . . . . .  10-3
    Acquisition Function Maximization  . . . . .  10-5

Parallel Bayesian Optimization  . . . . .  10-7
    Optimize in Parallel  . . . . .  10-7
    Parallel Bayesian Algorithm  . . . . .  10-7
    Settings for Best Parallel Performance  . . . . .  10-8
    Differences in Parallel Bayesian Optimization Output  . . . . .  10-9

Bayesian Optimization Plot Functions  . . . . .  10-11
    Built-In Plot Functions  . . . . .  10-11
    Custom Plot Function Syntax  . . . . .  10-12
    Create a Custom Plot Function  . . . . .  10-12

Bayesian Optimization Output Functions  . . . . .  10-19
    What Is a Bayesian Optimization Output Function?  . . . . .  10-19
    Built-In Output Functions  . . . . .  10-19
    Custom Output Functions  . . . . .  10-19
    Bayesian Optimization Output Function  . . . . .  10-20

Bayesian Optimization Workflow  . . . . .  10-25
    What Is Bayesian Optimization?  . . . . .  10-25
    Ways to Perform Bayesian Optimization  . . . . .  10-25
    Bayesian Optimization Using a Fit Function  . . . . .  10-26
    Bayesian Optimization Using bayesopt  . . . . .  10-26
    Bayesian Optimization Characteristics  . . . . .  10-27
    Parameters Available for Fit Functions  . . . . .  10-28
    Hyperparameter Optimization Options for Fit Functions  . . . . .  10-29

Variables for a Bayesian Optimization  . . . . .  10-33
    Syntax for Creating Optimization Variables  . . . . .  10-33
    Variables for Optimization Examples  . . . . .  10-34

Bayesian Optimization Objective Functions  . . . . .  10-36
    Objective Function Syntax  . . . . .  10-36
    Objective Function Example  . . . . .  10-36
    Objective Function Errors  . . . . .  10-37

Constraints in Bayesian Optimization  . . . . .  10-38
    Bounds  . . . . .  10-38
    Deterministic Constraints — XConstraintFcn  . . . . .  10-38
    Conditional Constraints — ConditionalVariableFcn  . . . . .  10-39
    Coupled Constraints  . . . . .  10-40
    Bayesian Optimization with Coupled Constraints  . . . . .  10-41

Optimize a Cross-Validated SVM Classifier Using bayesopt  . . . . .  10-45

Optimize an SVM Classifier Fit Using Bayesian Optimization  . . . . .  10-56

Optimize a Boosted Regression Ensemble  . . . . .  10-64

11  Parametric Regression Analysis

Choose a Regression Function  . . . . .  11-2
    Update Legacy Code with New Fitting Methods  . . . . .  11-2

What Is a Linear Regression Model?  . . . . .  11-5

Linear Regression  . . . . .  11-8
    Prepare Data  . . . . .  11-8
    Choose a Fitting Method  . . . . .  11-9
    Choose a Model or Range of Models  . . . . .  11-10
    Fit Model to Data  . . . . .  11-12
    Examine Quality and Adjust Fitted Model  . . . . .  11-13
    Predict or Simulate Responses to New Data  . . . . .  11-30
    Share Fitted Models  . . . . .  11-32

Linear Regression Workflow  . . . . .  11-34

Regression Using Dataset Arrays  . . . . .  11-39

Linear Regression Using Tables  . . . . .  11-41

Linear Regression with Interaction Effects  . . . . .  11-43

Interpret Linear Regression Results  . . . . .  11-49

Cook's Distance  . . . . .  11-54
    Purpose  . . . . .  11-54
    Definition  . . . . .  11-54
    How To  . . . . .  11-54
    Determine Outliers Using Cook's Distance  . . . . .  11-55

Coefficient Standard Errors and Confidence Intervals  . . . . .  11-57
    Coefficient Covariance and Standard Errors  . . . . .  11-57
    Coefficient Confidence Intervals  . . . . .  11-58

Coefficient of Determination (R-Squared)  . . . . .  11-60
    Purpose  . . . . .  11-60
    Definition  . . . . .  11-60
    How To  . . . . .  11-60
    Display Coefficient of Determination  . . . . .  11-60

Delete-1 Statistics  . . . . .  11-62
    Delete-1 Change in Covariance (covratio)  . . . . .  11-62
    Delete-1 Scaled Difference in Coefficient Estimates (Dfbetas)  . . . . .  11-64
    Delete-1 Scaled Change in Fitted Values (Dffits)  . . . . .  11-65
    Delete-1 Variance (S2_i)  . . . . .  11-67

Durbin-Watson Test  . . . . .  11-69
    Purpose  . . . . .  11-69
    Definition  . . . . .  11-69
    How To  . . . . .  11-69
    Test for Autocorrelation Among Residuals  . . . . .  11-69

F-statistic and t-statistic  . . . . .  11-71
    F-statistic  . . . . .  11-71
    Assess Fit of Model Using F-statistic  . . . . .  11-71
    t-statistic  . . . . .  11-73
    Assess Significance of Regression Coefficients Using t-statistic  . . . . .  11-74

Hat Matrix and Leverage  . . . . .  11-76
    Hat Matrix  . . . . .  11-76
    Leverage  . . . . .  11-77
    Determine High Leverage Observations  . . . . .  11-77

Residuals  . . . . .  11-79
    Purpose  . . . . .  11-79
    Definition  . . . . .  11-79
    How To  . . . . .  11-80
    Assess Model Assumptions Using Residuals  . . . . .  11-80

Summary of Output and Diagnostic Statistics  . . . . .  11-88

Wilkinson Notation  . . . . .  11-90
    Overview  . . . . .  11-90
    Formula Specification  . . . . .  11-90
    Linear Model Examples  . . . . .  11-93
    Linear Mixed-Effects Model Examples  . . . . .  11-94
    Generalized Linear Model Examples  . . . . .  11-95
    Generalized Linear Mixed-Effects Model Examples  . . . . .  11-96
    Repeated Measures Model Examples  . . . . .  11-97

Stepwise Regression  . . . . .  11-98
    Stepwise Regression to Select Appropriate Models  . . . . .  11-98
    Compare large and small stepwise models  . . . . .  11-98

Robust Regression — Reduce Outlier Effects  . . . . .  11-103
    What Is Robust Regression?  . . . . .  11-103
    Robust Regression versus Standard Least-Squares Fit  . . . . .  11-103

Ridge Regression  . . . . .  11-106
    Introduction to Ridge Regression  . . . . .  11-106
    Ridge Regression  . . . . .  11-106

Lasso and Elastic Net  . . . . .  11-109
    What Are Lasso and Elastic Net?  . . . . .  11-109
    Lasso and Elastic Net Details  . . . . .  11-109
    References  . . . . .  11-110

Wide Data via Lasso and Parallel Computing  . . . . .  11-112

Lasso Regularization  . . . . .  11-117

Lasso and Elastic Net with Cross Validation  . . . . .  11-120

Partial Least Squares  . . . . .  11-123
    Introduction to Partial Least Squares  . . . . .  11-123
    Partial Least Squares  . . . . .  11-123

Linear Mixed-Effects Models  . . . . .  11-128

Prepare Data for Linear Mixed-Effects Models  . . . . .  11-131
    Tables and Dataset Arrays  . . . . .  11-131
    Design Matrices  . . . . .  11-132
    Relation of Matrix Form to Tables and Dataset Arrays  . . . . .  11-134

Relationship Between Formula and Design Matrices  . . . . .  11-135
    Formula  . . . . .  11-135
    Design Matrices for Fixed and Random Effects  . . . . .  11-136
    Grouping Variables  . . . . .  11-138

Estimating Parameters in Linear Mixed-Effects Models  . . . . .  11-140
    Maximum Likelihood (ML)  . . . . .  11-140
    Restricted Maximum Likelihood (REML)  . . . . .  11-141

Linear Mixed-Effects Model Workflow  . . . . .  11-143

Fit Mixed-Effects Spline Regression  . . . . .  11-155

Train Linear Regression Model  . . . . .  11-158

12  Generalized Linear Models

Multinomial Models for Nominal Responses  . . . . .  12-2

Multinomial Models for Ordinal Responses  . . . . .  12-4

Hierarchical Multinomial Models  . . . . .  12-7

Generalized Linear Models  . . . . .  12-9
    What Are Generalized Linear Models?  . . . . .  12-9
    Prepare Data  . . . . .  12-9
    Choose Generalized Linear Model and Link Function  . . . . .  12-11
    Choose Fitting Method and Model  . . . . .  12-13
    Fit Model to Data  . . . . .  12-15
    Examine Quality and Adjust the Fitted Model  . . . . .  12-16
    Predict or Simulate Responses to New Data  . . . . .  12-23
    Share Fitted Models  . . . . .  12-26

Generalized Linear Model Workflow  . . . . .  12-28

Lasso Regularization of Generalized Linear Models  . . . . .  12-32
    What is Generalized Linear Model Lasso Regularization?  . . . . .  12-32
    Generalized Linear Model Lasso and Elastic Net  . . . . .  12-32
    References  . . . . .  12-33

Regularize Poisson Regression  . . . . .  12-34

Regularize Logistic Regression  . . . . .  12-36

Regularize Wide Data in Parallel  . . . . .  12-43

Generalized Linear Mixed-Effects Models  . . . . .  12-48
    What Are Generalized Linear Mixed-Effects Models?  . . . . .  12-48
    GLME Model Equations  . . . . .  12-48
    Prepare Data for Model Fitting  . . . . .  12-49
    Choose a Distribution Type for the Model  . . . . .  12-50
    Choose a Link Function for the Model  . . . . .  12-50
    Specify the Model Formula  . . . . .  12-51
    Display the Model  . . . . .  12-53
    Work with the Model  . . . . .  12-55

Fit a Generalized Linear Mixed-Effects Model  . . . . .  12-57

13  Nonlinear Regression

Nonlinear Regression  . . . . .  13-2
    What Are Parametric Nonlinear Regression Models?  . . . . .  13-2
    Prepare Data  . . . . .  13-2
    Represent the Nonlinear Model  . . . . .  13-3
    Choose Initial Vector beta0  . . . . .  13-5
    Fit Nonlinear Model to Data  . . . . .  13-6
    Examine Quality and Adjust the Fitted Nonlinear Model  . . . . .  13-6
    Predict or Simulate Responses Using a Nonlinear Model  . . . . .  13-9
    Nonlinear Regression Workflow  . . . . .  13-12

Mixed-Effects Models  . . . . .  13-17
    Introduction to Mixed-Effects Models  . . . . .  13-17
    Mixed-Effects Model Hierarchy  . . . . .  13-17
    Specifying Mixed-Effects Models  . . . . .  13-18
    Specifying Covariate Models  . . . . .  13-20
    Choosing nlmefit or nlmefitsa  . . . . .  13-21
    Using Output Functions with Mixed-Effects Models  . . . . .  13-23

Examining Residuals for Model Verification  . . . . .  13-28

Mixed-Effects Models Using nlmefit and nlmefitsa  . . . . .  13-33

Survival Analysis
What Is Survival Analysis?
Introduction
Censoring
Data
Survivor Function
Hazard Function
Kaplan-Meier Method
Hazard and Survivor Functions for Different Groups
Survivor Functions for Two Groups
Cox Proportional Hazards Model
Introduction
Hazard Ratio
Extension of Cox Proportional Hazards Model
Partial Likelihood Function
Partial Likelihood Function for Tied Events
Frequency or Weights of Observations
Cox Proportional Hazards Model for Censored Data
Cox Proportional Hazards Model with Time-Dependent Covariates

Multivariate Methods
Multivariate Linear Regression
Introduction to Multivariate Methods
Multivariate Linear Regression Model
Solving Multivariate Regression Problems
Estimation of Multivariate Regression Models
Least Squares Estimation
Maximum Likelihood Estimation
Missing Response Data
Set Up Multivariate Regression Problems
Response Matrix
Design Matrices
Common Multivariate Regression Problems
Multivariate General Linear Model
Fixed Effects Panel Model with Concurrent Correlation
Longitudinal Analysis
Multidimensional Scaling
Nonclassical and Nonmetric Multidimensional Scaling
Nonclassical Multidimensional Scaling
Nonmetric Multidimensional Scaling
Classical Multidimensional Scaling
Procrustes Analysis
Compare Landmark Data
Data Input
Preprocess Data for Accurate Results
Compare Handwritten Shapes Using Procrustes Analysis
Introduction to Feature Selection
Feature Selection Algorithms
Feature Selection Functions
Sequential Feature Selection
Introduction to Sequential Feature Selection
Select Subset of Features with Comparative Predictive Power
Nonnegative Matrix Factorization
Perform Nonnegative Matrix Factorization
Principal Component Analysis (PCA)
Analyze Quality of Life in U.S. Cities Using PCA
Factor Analysis
Analyze Stock Prices Using Factor Analysis
Robust Feature Selection Using NCA for Regression
Neighborhood Component Analysis (NCA) Feature Selection
NCA Feature Selection for Classification
NCA Feature Selection for Regression
Impact of Standardization
Choosing the Regularization Parameter Value
t-SNE
What Is t-SNE?
t-SNE Algorithm
Barnes-Hut Variation of t-SNE
Characteristics of t-SNE
t-SNE Output Function
t-SNE Output Function Description
tsne optimValues Structure
t-SNE Custom Output Function
Visualize High-Dimensional Data Using t-SNE
tsne Settings
Feature Extraction
What Is Feature Extraction?
Sparse Filtering Algorithm
Reconstruction ICA Algorithm
Feature Extraction Workflow
Extract Mixed Signals

Cluster Analysis
Choose Cluster Analysis Method
Clustering Methods
Comparison of Clustering Methods
Hierarchical Clustering
Introduction to Hierarchical Clustering
Algorithm Description
Similarity Measures
Linkages
Dendrograms
Verify the Cluster Tree
Create Clusters
DBSCAN
Introduction to DBSCAN
Algorithm Description
Determine Values for DBSCAN Parameters
Partition Data Using Spectral Clustering
Introduction to Spectral Clustering
Algorithm Description
Estimate Number of Clusters and Perform Spectral Clustering
k-Means Clustering
Introduction to k-Means Clustering
Compare k-Means Clustering Solutions
Cluster Using Gaussian Mixture Model
How Gaussian Mixture Models Cluster Data
Fit GMM with Different Covariance Options and Initial Conditions
When to Regularize
Model Fit Statistics
Cluster Gaussian Mixture Data Using Hard Clustering
Cluster Gaussian Mixture Data Using Soft Clustering
Tune Gaussian Mixture Models
Cluster Evaluation

Parametric Classification
Parametric Classification
Performance Curves
Introduction to Performance Curves
What are ROC Curves?
Evaluate Classifier Performance Using perfcurve

Nonparametric Supervised Learning
Supervised Learning Workflow and Algorithms
What is Supervised Learning?
Steps in Supervised Learning
Characteristics of Classification Algorithms
Visualize Decision Surfaces of Different Classifiers
Classification Using Nearest Neighbors
Pairwise Distance Metrics
k-Nearest Neighbor Search and Radius Search
Classify Query Data
Find Nearest Neighbors Using a Custom Distance Metric
K-Nearest Neighbor Classification for Supervised Learning
Construct KNN Classifier
Examine Quality of KNN Classifier
Predict Classification Using KNN Classifier
Modify KNN Classifier
Framework for Ensemble Learning
Prepare the Predictor Data
Prepare the Response Data
Choose an Applicable Ensemble Aggregation Method
Set the Number of Ensemble Members
Prepare the Weak Learners
Call fitcensemble or fitrensemble
Ensemble Algorithms
Bootstrap Aggregation (Bagging) and Random Forest
Random Subspace
Boosting Algorithms
Train Classification Ensemble
Train Regression Ensemble
Select Predictors for Random Forests
Test Ensemble Quality
Ensemble Regularization
Regularize a Regression Ensemble
Classification with Imbalanced Data
Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles
Train Ensemble With Unequal Classification Costs
Surrogate Splits
LPBoost and TotalBoost for Small Ensembles
Tune RobustBoost
Random Subspace Classification
Bootstrap Aggregation (Bagging) of Regression Trees Using TreeBagger
Bootstrap Aggregation (Bagging) of Classification Trees Using TreeBagger
Detect Outliers Using Quantile Regression
Conditional Quantile Estimation Using Kernel Smoothing
Tune Random Forest Using Quantile Error and Bayesian Optimization
Support Vector Machines for Binary Classification
Understanding Support Vector Machines
Using Support Vector Machines
Train SVM Classifiers Using a Gaussian Kernel
Train SVM Classifier Using Custom Kernel
Optimize an SVM Classifier Fit Using Bayesian Optimization
Plot Posterior Probability Regions for SVM Classification Models
Analyze Images Using Linear Support Vector Machines
Moving Towards Automating Model Selection Using Bayesian Optimization
Automated Classifier Selection with Bayesian Optimization
Bibliography

Decision Trees
Decision Trees
Train Classification Tree
Train Regression Tree
View Decision Tree
Growing Decision Trees
Prediction Using Classification and Regression Trees
Predict Out-of-Sample Responses of Subtrees
Improving Classification Trees and Regression Trees
Examining Resubstitution Error
Cross Validation
Choose Split Predictor Selection Technique
Control Depth or “Leafiness”
Pruning
Splitting Categorical Predictors in Classification Trees
Challenges in Splitting Multilevel Predictors
Algorithms for Categorical Predictor Split
Inspect Data with Multilevel Categorical Predictors

Discriminant Analysis
Discriminant Analysis Classification
Create Discriminant Analysis Classifiers
Creating Discriminant Analysis Model
Weighted Observations
Prediction Using Discriminant Analysis Models
Posterior Probability
Prior Probability
Cost
Create and Visualize Discriminant Analysis Classifier
Improving Discriminant Analysis Models
Deal with Singular Data
Choose a Discriminant Type
Examine the Resubstitution Error and Confusion Matrix
Cross Validation
Change Costs and Priors
Regularize Discriminant Analysis Classifier
Examine the Gaussian Mixture Assumption
Bartlett Test of Equal Covariance Matrices for Linear Discriminant Analysis
Q-Q Plot
Mardia Kurtosis Test of Multivariate Normality

Naive Bayes
Naive Bayes Classification
Supported Distributions
Plot Posterior Classification Probabilities

Classification and Regression for High-Dimensional Data

Classification Learner
Machine Learning in MATLAB
What Is Machine Learning?
Selecting the Right Algorithm
Train Classification Models in Classification Learner App
Train Regression Models in Regression Learner App
Train Neural Networks for Deep Learning
Train Classification Models in Classification Learner App
Automated Classifier Training
Manual Classifier Training
Parallel Classifier Training
Compare and Improve Classification Models
Select Data and Validation for Classification Problem
Select Data from Workspace
Import Data from File
Example Data for Classification
Choose Validation Scheme
Choose Classifier Options
Choose a Classifier Type
Decision Trees
Discriminant Analysis
Logistic Regression
Support Vector Machines
Nearest Neighbor Classifiers
Ensemble Classifiers
Naive Bayes Classifiers
Feature Selection and Feature Transformation Using Classification Learner App
Investigate Features in the Scatter Plot
Select Features to Include
Transform Features with PCA in Classification Learner
Investigate Features in the Parallel Coordinates Plot
Misclassification Costs in Classification Learner App
Specify Misclassification Costs
Assess Model Performance
Misclassification Costs in Exported Model and Generated Code
Hyperparameter Optimization in Classification Learner App
Select Hyperparameters to Optimize
Optimization Options
Minimum Classification Error Plot
Optimization Results
Assess Classifier Performance in Classification Learner
Check Performance in the History List
Plot Classifier Results
Check Performance Per Class in the Confusion Matrix
Check the ROC Curve
Export Plots in Classification Learner App
Export Classification Model to Predict New Data
Export the Model to the Workspace to Make Predictions for New Data
Make Predictions for New Data
Generate MATLAB Code to Train the Model with New Data
Generate C Code for Prediction
Deploy Predictions Using MATLAB Compiler
Train Decision Trees Using Classification Learner App
Train Discriminant Analysis Classifiers Using Classification Learner App
Train Logistic Regression Classifiers Using Classification Learner App
Train Support Vector Machines Using Classification Learner App
Train Nearest Neighbor Classifiers Using Classification Learner App
Train Ensemble Classifiers Using Classification Learner App
Train Naive Bayes Classifiers Using Classification Learner App
Train and Compare Classifiers Using Misclassification Costs in Classification Learner App
Train Classifier Using Hyperparameter Optimization in Classification Learner App

Regression Learner Train Regression Models in Regression Learner App . . . . . . . . . . . . . . . . Automated Regression Model Training . . . . . . . . . . . . . . . . . . . . . . . . . . Manual Regression Model Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parallel Regression Model Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . Compare and Improve Regression Models . . . . . . . . . . . . . . . . . . . . . . .

24-2 24-2 24-4 24-4 24-5

Select Data and Validation for Regression Problem . . . . . . . . . . . . . . . . . Select Data from Workspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Import Data from File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

24-8 24-8 24-9

xxv

Example Data for Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Choose Validation Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

24-9 24-10

Choose Regression Model Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Choose Regression Model Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Linear Regression Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Regression Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Support Vector Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gaussian Process Regression Models . . . . . . . . . . . . . . . . . . . . . . . . . . Ensembles of Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

24-12 24-12 24-14 24-15 24-17 24-20 24-22

Feature Selection and Feature Transformation Using Regression Learner App . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-24 Investigate Features in the Response Plot . . . . . . . . . . . . . . . . . . . . . . . 24-24 Select Features to Include . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-25 Transform Features with PCA in Regression Learner . . . . . . . . . . . . . . 24-26

25

Hyperparameter Optimization in Regression Learner App . . . . . . . . . . Select Hyperparameters to Optimize . . . . . . . . . . . . . . . . . . . . . . . . . . Optimization Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Minimum MSE Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Optimization Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

24-27 24-27 24-33 24-34 24-36

Assess Model Performance in Regression Learner . . . . . . . . . . . . . . . . . Check Performance in History List . . . . . . . . . . . . . . . . . . . . . . . . . . . . View Model Statistics in Current Model Window . . . . . . . . . . . . . . . . . . Explore Data and Results in Response Plot . . . . . . . . . . . . . . . . . . . . . . Plot Predicted vs. Actual Response . . . . . . . . . . . . . . . . . . . . . . . . . . . . Evaluate Model Using Residuals Plot . . . . . . . . . . . . . . . . . . . . . . . . . .

24-39 24-39 24-40 24-40 24-42 24-43

Export Plots in Regression Learner App . . . . . . . . . . . . . . . . . . . . . . . . .

24-46

Export Regression Model to Predict New Data . . . . . . . . . . . . . . . . . . . . Export Model to Workspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Make Predictions for New Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Generate MATLAB Code to Train Model with New Data . . . . . . . . . . . . Deploy Predictions Using MATLAB Compiler . . . . . . . . . . . . . . . . . . . .

24-50 24-50 24-50 24-51 24-52

Train Regression Trees Using Regression Learner App . . . . . . . . . . 24-54

Train Regression Model Using Hyperparameter Optimization in Regression Learner App . . . . . . . . . . 24-63

25
Support Vector Machines

Understanding Support Vector Machine Regression . . . . . . . . . . 25-2
Mathematical Formulation of SVM Regression . . . . . . . . . . 25-2
Solving the SVM Regression Optimization Problem . . . . . . . . . . 25-5

26
Markov Models

Markov Chains . . . . . . . . . . 26-2

Hidden Markov Models (HMM) . . . . . . . . . . 26-4
Introduction to Hidden Markov Models (HMM) . . . . . . . . . . 26-4
Analyzing Hidden Markov Models . . . . . . . . . . 26-5

27
Design of Experiments

Design of Experiments . . . . . . . . . . 27-2

Full Factorial Designs . . . . . . . . . . 27-3
Multilevel Designs . . . . . . . . . . 27-3
Two-Level Designs . . . . . . . . . . 27-3

Fractional Factorial Designs . . . . . . . . . . 27-5
Introduction to Fractional Factorial Designs . . . . . . . . . . 27-5
Plackett-Burman Designs . . . . . . . . . . 27-5
General Fractional Designs . . . . . . . . . . 27-5

Response Surface Designs . . . . . . . . . . 27-8
Introduction to Response Surface Designs . . . . . . . . . . 27-8
Central Composite Designs . . . . . . . . . . 27-8
Box-Behnken Designs . . . . . . . . . . 27-10

D-Optimal Designs . . . . . . . . . . 27-12
Introduction to D-Optimal Designs . . . . . . . . . . 27-12
Generate D-Optimal Designs . . . . . . . . . . 27-13
Augment D-Optimal Designs . . . . . . . . . . 27-14
Specify Fixed Covariate Factors . . . . . . . . . . 27-15
Specify Categorical Factors . . . . . . . . . . 27-16
Specify Candidate Sets . . . . . . . . . . 27-16

Improve an Engine Cooling Fan Using Design for Six Sigma Techniques . . . . . . . . . . 27-19

28
Statistical Process Control

Control Charts . . . . . . . . . . 28-2

Capability Studies . . . . . . . . . . 28-5


29
Tall Arrays

Logistic Regression with Tall Arrays . . . . . . . . . . 29-2

Bayesian Optimization with Tall Arrays . . . . . . . . . . 29-9

30
Parallel Statistics

Quick Start Parallel Computing for Statistics and Machine Learning Toolbox . . . . . . . . . . 30-2
What Is Parallel Statistics Functionality? . . . . . . . . . . 30-2
How To Compute in Parallel . . . . . . . . . . 30-4

Use Parallel Processing for Regression TreeBagger Workflow . . . . . . . . . . 30-6

Concepts of Parallel Computing in Statistics and Machine Learning Toolbox . . . . . . . . . . 30-8
Subtleties in Parallel Computing . . . . . . . . . . 30-8
Vocabulary for Parallel Computation . . . . . . . . . . 30-8

When to Run Statistical Functions in Parallel . . . . . . . . . . 30-9
Why Run in Parallel? . . . . . . . . . . 30-9
Factors Affecting Speed . . . . . . . . . . 30-9
Factors Affecting Results . . . . . . . . . . 30-9

Working with parfor . . . . . . . . . . 30-11
How Statistical Functions Use parfor . . . . . . . . . . 30-11
Characteristics of parfor . . . . . . . . . . 30-11

Reproducibility in Parallel Statistical Computations . . . . . . . . . . 30-13
Issues and Considerations in Reproducing Parallel Computations . . . . . . . . . . 30-13
Running Reproducible Parallel Computations . . . . . . . . . . 30-13
Parallel Statistical Computation Using Random Numbers . . . . . . . . . . 30-14

Implement Jackknife Using Parallel Computing . . . . . . . . . . 30-17

Implement Cross-Validation Using Parallel Computing . . . . . . . . . . 30-18
Simple Parallel Cross Validation . . . . . . . . . . 30-18
Reproducible Parallel Cross Validation . . . . . . . . . . 30-18

Implement Bootstrap Using Parallel Computing . . . . . . . . . . 30-20
Bootstrap in Serial and Parallel . . . . . . . . . . 30-20
Reproducible Parallel Bootstrap . . . . . . . . . . 30-21

31
Code Generation

Introduction to Code Generation . . . . . . . . . . 31-2
Code Generation Workflows . . . . . . . . . . 31-2
Code Generation Applications . . . . . . . . . . 31-4

General Code Generation Workflow . . . . . . . . . . 31-5
Define Entry-Point Function . . . . . . . . . . 31-5
Generate Code . . . . . . . . . . 31-5
Verify Generated Code . . . . . . . . . . 31-7

Code Generation for Prediction of Machine Learning Model at Command Line . . . . . . . . . . 31-9

Code Generation for Nearest Neighbor Searcher . . . . . . . . . . 31-13

Code Generation for Prediction of Machine Learning Model Using MATLAB Coder App . . . . . . . . . . 31-16

Code Generation and Classification Learner App . . . . . . . . . . 31-25
Load Sample Data . . . . . . . . . . 31-25
Enable PCA . . . . . . . . . . 31-26
Train Models . . . . . . . . . . 31-26
Export Model to Workspace . . . . . . . . . . 31-29
Generate C Code for Prediction . . . . . . . . . . 31-30

Predict Class Labels Using MATLAB Function Block . . . . . . . . . . 31-34

Specify Variable-Size Arguments for Code Generation . . . . . . . . . . 31-38

Train SVM Classifier with Categorical Predictors and Generate C/C++ Code . . . . . . . . . . 31-43

System Objects for Classification and Code Generation . . . . . . . . . . 31-47

Predict Class Labels Using Stateflow . . . . . . . . . . 31-55

Human Activity Recognition Simulink Model for Smartphone Deployment . . . . . . . . . . 31-59

Code Generation for Prediction and Update Using Coder Configurer . . . . . . . . . . 31-68

Code Generation for Probability Distribution Objects . . . . . . . . . . 31-70

Fixed-Point Code Generation for Prediction of SVM . . . . . . . . . . 31-75

Generate Code to Classify Numeric Data in Table . . . . . . . . . . 31-88


32
Functions

A
Sample Data Sets

Sample Data Sets . . . . . . . . . . A-2

B

Probability Distributions

Bernoulli Distribution . . . . . . . . . . B-2
Overview . . . . . . . . . . B-2
Parameters . . . . . . . . . . B-2
Probability Density Function . . . . . . . . . . B-2
Cumulative Distribution Function . . . . . . . . . . B-2
Descriptive Statistics . . . . . . . . . . B-2
Examples . . . . . . . . . . B-3
Related Distributions . . . . . . . . . . B-4

Beta Distribution . . . . . . . . . . B-6
Overview . . . . . . . . . . B-6
Parameters . . . . . . . . . . B-6
Probability Density Function . . . . . . . . . . B-6
Cumulative Distribution Function . . . . . . . . . . B-8
Example . . . . . . . . . . B-8

Binomial Distribution . . . . . . . . . . B-10
Overview . . . . . . . . . . B-10
Parameters . . . . . . . . . . B-10
Probability Density Function . . . . . . . . . . B-10
Cumulative Distribution Function . . . . . . . . . . B-11
Descriptive Statistics . . . . . . . . . . B-11
Example . . . . . . . . . . B-11
Related Distributions . . . . . . . . . . B-16

Birnbaum-Saunders Distribution . . . . . . . . . . B-18
Definition . . . . . . . . . . B-18
Background . . . . . . . . . . B-18
Parameters . . . . . . . . . . B-18

Burr Type XII Distribution . . . . . . . . . . B-19
Definition . . . . . . . . . . B-19
Background . . . . . . . . . . B-19
Parameters . . . . . . . . . . B-20
Fit a Burr Distribution and Draw the cdf . . . . . . . . . . B-21
Compare Lognormal and Burr Distribution pdfs . . . . . . . . . . B-22
Burr pdf for Various Parameters . . . . . . . . . . B-23

Survival and Hazard Functions of Burr Distribution . . . . . . . . . . B-25
Divergence of Parameter Estimates . . . . . . . . . . B-26

Chi-Square Distribution . . . . . . . . . . B-28
Overview . . . . . . . . . . B-28
Parameters . . . . . . . . . . B-28
Probability Density Function . . . . . . . . . . B-28
Cumulative Distribution Function . . . . . . . . . . B-29
Inverse Cumulative Distribution Function . . . . . . . . . . B-29
Descriptive Statistics . . . . . . . . . . B-29
Examples . . . . . . . . . . B-29
Related Distributions . . . . . . . . . . B-31

Exponential Distribution . . . . . . . . . . B-33
Overview . . . . . . . . . . B-33
Parameters . . . . . . . . . . B-33
Probability Density Function . . . . . . . . . . B-34
Cumulative Distribution Function . . . . . . . . . . B-34
Inverse Cumulative Distribution Function . . . . . . . . . . B-34
Hazard Function . . . . . . . . . . B-34
Examples . . . . . . . . . . B-35
Related Distributions . . . . . . . . . . B-38

Extreme Value Distribution . . . . . . . . . . B-40
Definition . . . . . . . . . . B-40
Background . . . . . . . . . . B-40
Parameters . . . . . . . . . . B-42
Examples . . . . . . . . . . B-43

F Distribution . . . . . . . . . . B-45
Definition . . . . . . . . . . B-45
Background . . . . . . . . . . B-45
Examples . . . . . . . . . . B-45

Gamma Distribution . . . . . . . . . . B-47
Overview . . . . . . . . . . B-47
Parameters . . . . . . . . . . B-47
Probability Density Function . . . . . . . . . . B-48
Cumulative Distribution Function . . . . . . . . . . B-48
Inverse Cumulative Distribution Function . . . . . . . . . . B-49
Descriptive Statistics . . . . . . . . . . B-49
Examples . . . . . . . . . . B-49
Related Distributions . . . . . . . . . . B-53

Generalized Extreme Value Distribution . . . . . . . . . . B-55
Definition . . . . . . . . . . B-55
Background . . . . . . . . . . B-55
Parameters . . . . . . . . . . B-56
Examples . . . . . . . . . . B-57

Generalized Pareto Distribution . . . . . . . . . . B-59
Definition . . . . . . . . . . B-59
Background . . . . . . . . . . B-59
Parameters . . . . . . . . . . B-60
Examples . . . . . . . . . . B-61


Geometric Distribution . . . . . . . . . . B-63
Overview . . . . . . . . . . B-63
Parameters . . . . . . . . . . B-63
Probability Density Function . . . . . . . . . . B-63
Cumulative Distribution Function . . . . . . . . . . B-64
Descriptive Statistics . . . . . . . . . . B-64
Hazard Function . . . . . . . . . . B-64
Examples . . . . . . . . . . B-64
Related Distributions . . . . . . . . . . B-66

Half-Normal Distribution . . . . . . . . . . B-68
Overview . . . . . . . . . . B-68
Parameters . . . . . . . . . . B-68
Probability Density Function . . . . . . . . . . B-68
Cumulative Distribution Function . . . . . . . . . . B-70
Descriptive Statistics . . . . . . . . . . B-71
Relationship to Other Distributions . . . . . . . . . . B-72

Hypergeometric Distribution . . . . . . . . . . B-73
Definition . . . . . . . . . . B-73
Background . . . . . . . . . . B-73
Examples . . . . . . . . . . B-73

Inverse Gaussian Distribution . . . . . . . . . . B-75
Definition . . . . . . . . . . B-75
Background . . . . . . . . . . B-75
Parameters . . . . . . . . . . B-75

Inverse Wishart Distribution . . . . . . . . . . B-76
Definition . . . . . . . . . . B-76
Background . . . . . . . . . . B-76
Example . . . . . . . . . . B-76

Kernel Distribution . . . . . . . . . . B-78
Overview . . . . . . . . . . B-78
Kernel Density Estimator . . . . . . . . . . B-78
Kernel Smoothing Function . . . . . . . . . . B-78
Bandwidth . . . . . . . . . . B-82

Logistic Distribution . . . . . . . . . . B-85
Overview . . . . . . . . . . B-85
Parameters . . . . . . . . . . B-85
Probability Density Function . . . . . . . . . . B-85
Relationship to Other Distributions . . . . . . . . . . B-85

Loglogistic Distribution . . . . . . . . . . B-86
Overview . . . . . . . . . . B-86
Parameters . . . . . . . . . . B-86
Probability Density Function . . . . . . . . . . B-86
Relationship to Other Distributions . . . . . . . . . . B-86

Lognormal Distribution . . . . . . . . . . B-88
Overview . . . . . . . . . . B-88
Parameters . . . . . . . . . . B-88
Probability Density Function . . . . . . . . . . B-89
Cumulative Distribution Function . . . . . . . . . . B-89
Examples . . . . . . . . . . B-89
Related Distributions . . . . . . . . . . B-94

Multinomial Distribution . . . . . . . . . . B-96
Overview . . . . . . . . . . B-96
Parameter . . . . . . . . . . B-96
Probability Density Function . . . . . . . . . . B-96
Descriptive Statistics . . . . . . . . . . B-96
Relationship to Other Distributions . . . . . . . . . . B-97

Multivariate Normal Distribution . . . . . . . . . . B-98
Overview . . . . . . . . . . B-98
Parameters . . . . . . . . . . B-98
Probability Density Function . . . . . . . . . . B-98
Cumulative Distribution Function . . . . . . . . . . B-99
Examples . . . . . . . . . . B-99

Multivariate t Distribution . . . . . . . . . . B-104
Definition . . . . . . . . . . B-104
Background . . . . . . . . . . B-104
Example . . . . . . . . . . B-104

Nakagami Distribution . . . . . . . . . . B-108
Definition . . . . . . . . . . B-108
Background . . . . . . . . . . B-108
Parameters . . . . . . . . . . B-108

Negative Binomial Distribution . . . . . . . . . . B-109
Definition . . . . . . . . . . B-109
Background . . . . . . . . . . B-109
Parameters . . . . . . . . . . B-109
Example . . . . . . . . . . B-111

Noncentral Chi-Square Distribution . . . . . . . . . . B-113
Definition . . . . . . . . . . B-113
Background . . . . . . . . . . B-113
Examples . . . . . . . . . . B-113

Noncentral F Distribution . . . . . . . . . . B-115
Definition . . . . . . . . . . B-115
Background . . . . . . . . . . B-115
Examples . . . . . . . . . . B-115

Noncentral t Distribution . . . . . . . . . . B-117
Definition . . . . . . . . . . B-117
Background . . . . . . . . . . B-117
Examples . . . . . . . . . . B-117

Normal Distribution . . . . . . . . . . B-119
Overview . . . . . . . . . . B-119
Parameters . . . . . . . . . . B-119
Probability Density Function . . . . . . . . . . B-120
Cumulative Distribution Function . . . . . . . . . . B-120
Examples . . . . . . . . . . B-121
Related Distributions . . . . . . . . . . B-127

Piecewise Linear Distribution . . . . . . . . . . B-130
Overview . . . . . . . . . . B-130
Parameters . . . . . . . . . . B-130
Cumulative Distribution Function . . . . . . . . . . B-130
Relationship to Other Distributions . . . . . . . . . . B-130

Poisson Distribution . . . . . . . . . . B-131
Overview . . . . . . . . . . B-131
Parameters . . . . . . . . . . B-131
Probability Density Function . . . . . . . . . . B-131
Cumulative Distribution Function . . . . . . . . . . B-132
Examples . . . . . . . . . . B-132
Related Distributions . . . . . . . . . . B-135

Rayleigh Distribution . . . . . . . . . . B-137
Definition . . . . . . . . . . B-137
Background . . . . . . . . . . B-137
Parameters . . . . . . . . . . B-137
Examples . . . . . . . . . . B-137

Rician Distribution . . . . . . . . . . B-139
Definition . . . . . . . . . . B-139
Background . . . . . . . . . . B-139
Parameters . . . . . . . . . . B-139

Stable Distribution . . . . . . . . . . B-140
Overview . . . . . . . . . . B-140
Parameters . . . . . . . . . . B-140
Probability Density Function . . . . . . . . . . B-141
Cumulative Distribution Function . . . . . . . . . . B-143
Descriptive Statistics . . . . . . . . . . B-145
Relationship to Other Distributions . . . . . . . . . . B-146

Student's t Distribution . . . . . . . . . . B-149
Overview . . . . . . . . . . B-149
Parameters . . . . . . . . . . B-149
Probability Density Function . . . . . . . . . . B-149
Cumulative Distribution Function . . . . . . . . . . B-150
Inverse Cumulative Distribution Function . . . . . . . . . . B-150
Descriptive Statistics . . . . . . . . . . B-150
Examples . . . . . . . . . . B-150
Related Distributions . . . . . . . . . . B-153

t Location-Scale Distribution . . . . . . . . . . B-156
Overview . . . . . . . . . . B-156
Parameters . . . . . . . . . . B-156
Probability Density Function . . . . . . . . . . B-156
Cumulative Distribution Function . . . . . . . . . . B-157
Descriptive Statistics . . . . . . . . . . B-157
Relationship to Other Distributions . . . . . . . . . . B-157

Triangular Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

B-158 B-158

C

Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Probability Density Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Descriptive Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

B-158 B-158 B-160 B-161

Uniform Distribution (Continuous) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Probability Density Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Descriptive Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Random Number Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Related Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

B-163 B-163 B-163 B-164 B-164 B-164 B-164 B-164 B-167

Uniform Distribution (Discrete) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

B-168 B-168 B-168 B-168

Weibull Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Probability Density Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Inverse Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . . . Hazard Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Related Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

B-170 B-170 B-170 B-171 B-171 B-171 B-172 B-172 B-176

Wishart Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Probability Density Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

B-178 B-178 B-178 B-178 B-178

Bibliography Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

C-2


1  Getting Started

• “Statistics and Machine Learning Toolbox Product Description” on page 1-2
• “Supported Data Types” on page 1-3

Statistics and Machine Learning Toolbox Product Description

Analyze and model data using statistics and machine learning

Statistics and Machine Learning Toolbox provides functions and apps to describe, analyze, and model data. You can use descriptive statistics and plots for exploratory data analysis, fit probability distributions to data, generate random numbers for Monte Carlo simulations, and perform hypothesis tests. Regression and classification algorithms let you draw inferences from data and build predictive models.

For multidimensional data analysis, Statistics and Machine Learning Toolbox provides feature selection, stepwise regression, principal component analysis (PCA), regularization, and other dimensionality reduction methods that let you identify variables or features that impact your model.

The toolbox provides supervised and unsupervised machine learning algorithms, including support vector machines (SVMs), boosted and bagged decision trees, k-nearest neighbor, k-means, k-medoids, hierarchical clustering, Gaussian mixture models, and hidden Markov models. Many of the statistics and machine learning algorithms can be used for computations on data sets that are too big to be stored in memory.

Key Features

• Regression techniques, including linear, generalized linear, nonlinear, robust, regularized, ANOVA, repeated measures, and mixed-effects models
• Big data algorithms for dimension reduction, descriptive statistics, k-means clustering, linear regression, logistic regression, and discriminant analysis
• Univariate and multivariate probability distributions, random and quasi-random number generators, and Markov chain samplers
• Hypothesis tests for distributions, dispersion, and location, and design of experiments (DOE) techniques for optimal, factorial, and response surface designs
• Classification Learner app and algorithms for supervised machine learning, including support vector machines (SVMs), boosted and bagged decision trees, k-nearest neighbor, Naïve Bayes, discriminant analysis, and Gaussian process regression
• Unsupervised machine learning algorithms, including k-means, k-medoids, hierarchical clustering, Gaussian mixtures, and hidden Markov models
• Bayesian optimization for tuning machine learning algorithms by searching for optimal hyperparameters


Supported Data Types

Statistics and Machine Learning Toolbox supports the following data types for input arguments:

• Numeric scalars, vectors, matrices, or arrays having single- or double-precision entries. These data forms have data type single or double. Examples include response variables, predictor variables, and numeric values.
• Cell arrays of character vectors; character, string, logical, or categorical arrays; or numeric vectors for categorical variables representing grouping data. These data forms have data types cell (specifically cellstr), char, string, logical, categorical, and single or double, respectively. An example is an array of class labels in machine learning.
• You can also use nominal or ordinal arrays for categorical data. However, the nominal and ordinal data types are not recommended. To work with nominal or ordinal categorical data, use the categorical data type instead.
• You can use signed or unsigned integers, e.g., int8 or uint8. However:
  • Estimation functions might not support signed or unsigned integer data types for nongrouping data.
  • If you recast a single or double numeric vector containing NaN values to a signed or unsigned integer, then the software converts the NaN elements to 0.
• Some functions support tabular arrays for heterogeneous data (for details, see “Tables” (MATLAB)). The table data type contains variables of any of the data types previously listed. An example is mixed categorical and numerical predictor data for regression analysis.
• For some functions, you can also use dataset arrays for heterogeneous data. However, the dataset data type is not recommended. To work with heterogeneous data, use the table data type if the estimation function supports it.
• Functions that do not support the table data type support sample data of type single or double, e.g., matrices.
• Some functions accept gpuArray input arguments so that they execute on the GPU. For the full list of Statistics and Machine Learning Toolbox functions that accept GPU arrays, see Function List (GPU Arrays).
• Some functions accept tall array input arguments to work with large data sets. For the full list of Statistics and Machine Learning Toolbox functions that accept tall arrays, see Function List (Tall Arrays).
• Some functions accept sparse matrices, i.e., a matrix A such that issparse(A) returns 1. For functions that do not accept sparse matrices, recast the data to a full matrix by using full.

Statistics and Machine Learning Toolbox does not support the following data types:

• Complex numbers.
• Custom numeric data types, e.g., a variable that is double precision and an object.
• Signed or unsigned numeric integers for nongrouping data, e.g., uint8 and int16.

Note If you specify data of an unsupported type, then the software might return an error or unexpected results.
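Two of these conversion rules can be checked directly at the command line. The following sketch uses only base MATLAB functions; the variable names are illustrative:

```matlab
% Recasting double data that contains NaN to an integer type converts
% the NaN elements to 0.
x = [1 NaN 3];
xInt = int8(x);      % xInt is the int8 row vector [1 0 3]

% Recast a sparse matrix to a full matrix before passing it to a
% function that does not accept sparse input.
A = speye(3);        % sparse 3-by-3 identity matrix; issparse(A) returns 1
B = full(A);         % full 3-by-3 matrix; issparse(B) returns 0
```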


2  Organizing Data

• “Other MATLAB Functions Supporting Nominal and Ordinal Arrays” on page 2-2
• “Create Nominal and Ordinal Arrays” on page 2-3
• “Change Category Labels” on page 2-7
• “Reorder Category Levels” on page 2-9
• “Categorize Numeric Data” on page 2-13
• “Merge Category Levels” on page 2-16
• “Add and Drop Category Levels” on page 2-18
• “Plot Data Grouped by Category” on page 2-21
• “Test Differences Between Category Means” on page 2-25
• “Summary Statistics Grouped by Category” on page 2-32
• “Sort Ordinal Arrays” on page 2-34
• “Nominal and Ordinal Arrays” on page 2-36
• “Advantages of Using Nominal and Ordinal Arrays” on page 2-38
• “Index and Search Using Nominal and Ordinal Arrays” on page 2-41
• “Grouping Variables” on page 2-45
• “Dummy Variables” on page 2-48
• “Linear Regression with Categorical Covariates” on page 2-52
• “Create a Dataset Array from Workspace Variables” on page 2-57
• “Create a Dataset Array from a File” on page 2-62
• “Add and Delete Observations” on page 2-68
• “Add and Delete Variables” on page 2-71
• “Access Data in Dataset Array Variables” on page 2-74
• “Select Subsets of Observations” on page 2-79
• “Sort Observations in Dataset Arrays” on page 2-82
• “Merge Dataset Arrays” on page 2-85
• “Stack or Unstack Dataset Arrays” on page 2-88
• “Calculations on Dataset Arrays” on page 2-92
• “Export Dataset Arrays” on page 2-95
• “Clean Messy and Missing Data” on page 2-97
• “Dataset Arrays in the Variables Editor” on page 2-101
• “Dataset Arrays” on page 2-112
• “Index and Search Dataset Arrays” on page 2-114


Other MATLAB Functions Supporting Nominal and Ordinal Arrays

Note The nominal and ordinal array data types are not recommended. To represent ordered and unordered discrete, nonnumeric data, use the “Categorical Arrays” (MATLAB) data type instead.

Notable functions that operate on nominal and ordinal arrays are listed in nominal and ordinal. In addition to these, many other functions in MATLAB operate on nominal and ordinal arrays in much the same way that they operate on other arrays. A few functions might exhibit special behavior when operating on nominal and ordinal arrays:

• If multiple input arguments are nominal or ordinal arrays, the function often requires that they have the same set of categories, including order if ordinal.
• Relational functions, such as max and gt, require that the input arrays be ordinal.

The following table lists MATLAB functions that operate on nominal and ordinal arrays in addition to other arrays.

size        length     ndims      numel
isequal     isequaln   eq         ne         lt         le         ge         gt
isrow       iscolumn   cat        horzcat    vertcat
min         max        median     mode
intersect   ismember   setdiff    setxor     unique     union
histogram   pie        times      sort       sortrows   issorted   permute    reshape    transpose   ctranspose
double      single     int8       int16      int32      int64      uint8      uint16     uint32      uint64     char       cellstr

See Also
nominal | ordinal
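As a brief sketch of the special relational behavior described above (the variable names here are illustrative, not from the toolbox examples):

```matlab
% Relational functions work on ordinal arrays because their levels are ordered.
sz = ordinal({'small','large','medium'},{},{'small','medium','large'});
sz(1) < sz(2)    % logical 1, because small is below large in the level order
max(sz)          % large, the highest level present in the data

% Nominal arrays are unordered, so the same comparison produces an error.
c = nominal({'red','blue'});
% c(1) < c(2)    % error: relational operations are undefined for nominal arrays
```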

Create Nominal and Ordinal Arrays

Note The nominal and ordinal array data types are not recommended. To represent ordered and unordered discrete, nonnumeric data, use the “Categorical Arrays” (MATLAB) data type instead.

In this section...
“Create Nominal Arrays” on page 2-3
“Create Ordinal Arrays” on page 2-4

Create Nominal Arrays

This example shows how to create nominal arrays using nominal.

Load sample data.
The variable species is a 150-by-1 cell array of character vectors containing the species name for each observation. The unique species types are setosa, versicolor, and virginica.

load fisheriris
unique(species)

ans = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }

Create a nominal array.
Convert species to a nominal array using the categories occurring in the data.

speciesNom = nominal(species);
class(speciesNom)

ans =
    'nominal'

Explore category levels.
The nominal array, speciesNom, has three levels corresponding to the three unique species. The levels of a nominal array are the set of possible values that its elements can take.

getlevels(speciesNom)

ans = 1x3 nominal
     setosa      versicolor      virginica

A nominal array can have more levels than actually appear in the data. For example, a nominal array named AllSizes might have levels small, medium, and large, but you might only have observations that are medium and large in your data. To see which levels of a nominal array are actually present in the data, use unique, for instance, unique(AllSizes).
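The distinction between declared levels and observed values can be sketched as follows. This assumes that nominal accepts a third input argument listing the levels, following the same calling pattern shown for ordinal later in this section:

```matlab
% Declare three levels, but supply data that uses only two of them.
AllSizes = nominal({'medium','large','medium'},{},...
    {'small','medium','large'});
getlevels(AllSizes)    % small, medium, large -- all declared levels
unique(AllSizes)       % medium, large -- only the levels present in the data
```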


Explore category labels.
Each level has a label. By default, nominal labels the category levels with the values occurring in the data. For speciesNom, these labels are the species types.

getlabels(speciesNom)

ans = 1x3 cell
    {'setosa'}    {'versicolor'}    {'virginica'}

Specify your own category labels.
You can specify your own labels for each category level. You can specify labels when you create the nominal array.

speciesNom2 = nominal(species,{'seto','vers','virg'});
getlabels(speciesNom2)

ans = 1x3 cell
    {'seto'}    {'vers'}    {'virg'}

You can also change category labels on an existing nominal array using setlabels.

Verify new category labels.
Verify that the new labels correspond to the original labels in speciesNom.

isequal(speciesNom=='setosa',speciesNom2=='seto')

ans = logical
   1

The logical value 1 indicates that the two labels, 'setosa' and 'seto', correspond to the same observations.

Create Ordinal Arrays

This example shows how to create ordinal arrays using ordinal.

Load sample data.

AllSizes = {'medium','large','small','small','medium',...
    'large','medium','small'};

The created variable, AllSizes, is a cell array of character vectors containing size measurements on eight objects.

Create an ordinal array.
Create an ordinal array with category levels and labels corresponding to the values in the cell array (the default levels and labels).

sizeOrd = ordinal(AllSizes);
getlevels(sizeOrd)

ans = 1x3 ordinal
     large      medium      small

Explore category labels.
By default, ordinal uses the original character vectors as category labels. The default order of the categories is ascending alphabetical order.

getlabels(sizeOrd)

ans = 1x3 cell
    {'large'}    {'medium'}    {'small'}

Add additional categories.
Suppose that you want to include additional levels for the ordinal array, xsmall and xlarge, even though they do not occur in the original data. To specify additional levels, use the third input argument to ordinal.

sizeOrd2 = ordinal(AllSizes,{},...
    {'xsmall','small','medium','large','xlarge'});
getlevels(sizeOrd2)

ans = 1x5 ordinal
     xsmall      small      medium      large      xlarge

Explore category labels.
To see which levels are actually present in the data, use unique.

unique(sizeOrd2)

ans = 1x3 ordinal
     small      medium      large

Specify the category order.
Convert AllSizes to an ordinal array with categories small < medium < large. Generally, an ordinal array is distinct from a nominal array because there is a natural ordering for the levels of an ordinal array. You can use the third input argument to ordinal to specify the ascending order of the levels. Here, the order of the levels is smallest to largest.

sizeOrd = ordinal(AllSizes,{},{'small','medium','large'});
getlevels(sizeOrd)

ans = 1x3 ordinal
     small      medium      large

The second input argument for ordinal is a list of labels for the category levels. When you use braces {} for the level labels, ordinal uses the labels specified in the third input argument (the labels come from the levels present in the data if only one input argument is used).


Compare elements.
Verify that the first object (with size medium) is smaller than the second object (with size large).

sizeOrd(1) < sizeOrd(2)

ans = logical
   1

The logical value 1 indicates that the inequality holds.

See Also
getlabels | getlevels | nominal | ordinal

Related Examples
• “Change Category Labels” on page 2-7
• “Reorder Category Levels” on page 2-9
• “Merge Category Levels” on page 2-16
• “Index and Search Using Nominal and Ordinal Arrays” on page 2-41

More About
• “Nominal and Ordinal Arrays” on page 2-36
• “Advantages of Using Nominal and Ordinal Arrays” on page 2-38

Change Category Labels

Note The nominal and ordinal array data types are not recommended. To represent ordered and unordered discrete, nonnumeric data, use the “Categorical Arrays” (MATLAB) data type instead.

Change Category Labels

This example shows how to change the labels for category levels in categorical arrays using setlabels. You also have the option to specify labels when creating a categorical array.

Load sample data.
The variable Cylinders contains the number of cylinders in 100 sample cars.

load carsmall
unique(Cylinders)

ans = 3×1

     4
     6
     8

The sample has 4-, 6-, and 8-cylinder cars.

Create an ordinal array.
Convert Cylinders to an ordinal array with the default category labels (taken from the values in the data).

cyl = ordinal(Cylinders);
getlabels(cyl)

ans = 1x3 cell
    {'4'}    {'6'}    {'8'}

ordinal created labels using the integer values in Cylinders, but you should provide meaningful labels for numeric data.

Change category labels.
Relabel the categories in cyl to Four, Six, and Eight.

cyl = setlabels(cyl,{'Four','Six','Eight'});
getlabels(cyl)

ans = 1x3 cell
    {'Four'}    {'Six'}    {'Eight'}


Alternatively, you can specify category labels when you create a nominal or ordinal array using the second input argument, for example by specifying ordinal(Cylinders, {'Four','Six','Eight'}).

See Also
getlabels | nominal | ordinal | setlabels

Related Examples
• “Reorder Category Levels” on page 2-9
• “Add and Drop Category Levels” on page 2-18
• “Index and Search Using Nominal and Ordinal Arrays” on page 2-41

More About
• “Nominal and Ordinal Arrays” on page 2-36
• “Advantages of Using Nominal and Ordinal Arrays” on page 2-38

Reorder Category Levels

Note The nominal and ordinal array data types are not recommended. To represent ordered and unordered discrete, nonnumeric data, use the “Categorical Arrays” (MATLAB) data type instead.

In this section...
“Reorder Category Levels in Ordinal Arrays” on page 2-9
“Reorder Category Levels in Nominal Arrays” on page 2-10

Reorder Category Levels in Ordinal Arrays

This example shows how to reorder the category levels in an ordinal array using reorderlevels.

Load sample data.

AllSizes = {'medium','large','small','small','medium',...
    'large','medium','small'};

The created variable, AllSizes, is a cell array of character vectors containing size measurements on eight objects.

Create an ordinal array.
Convert AllSizes to an ordinal array without specifying the order of the category levels.

size = ordinal(AllSizes);
getlevels(size)

ans = 1x3 ordinal
     large      medium      small

By default, the categories are ordered by their labels in ascending alphabetical order, large < medium < small.

Compare elements.
Check whether or not the first object (which has size medium) is smaller than the second object (which has size large).

size(1) < size(2)

ans = logical
   0

The logical value 0 indicates that the medium object is not smaller than the large object.

Reorder category levels.
Reorder the category levels so that small < medium < large.

size = reorderlevels(size,{'small','medium','large'});
getlevels(size)


ans = 1x3 ordinal
     small      medium      large

Compare elements.
Verify that the first object is now smaller than the second object.

size(1) < size(2)

ans = logical
   1

The logical value 1 indicates that the expected inequality now holds.

Reorder Category Levels in Nominal Arrays

This example shows how to reorder the category levels in nominal arrays using reorderlevels. By definition, nominal array categories have no natural ordering. However, you might want to change the order of levels for display or analysis purposes. For example, when fitting a regression model with categorical covariates, fitlm uses the first level of a nominal independent variable as the reference category.

Load sample data.
The dataset array, hospital, contains variables measured on 100 sample patients. The variable Weight contains the weight of each patient. The variable Sex is a nominal variable containing the gender, Male or Female, for each patient.

load hospital
getlevels(hospital.Sex)

ans = 1x2 nominal
     Female      Male

By default, the order of the nominal categories is in ascending alphabetical order of the labels.

Plot data grouped by category level.
Draw box plots of weight, grouped by gender.

figure
boxplot(hospital.Weight,hospital.Sex)
title('Weight by Gender')


The box plots appear in the same alphabetical order returned by getlevels.

Change the category order.
Change the order of the category levels.

hospital.Sex = reorderlevels(hospital.Sex,{'Male','Female'});
getlevels(hospital.Sex)

ans = 1x2 nominal
     Male      Female

The levels are in the newly specified order.

Plot data in new order.
Draw box plots of weight by gender.

figure
boxplot(hospital.Weight,hospital.Sex)
title('Weight by Gender')


The order of the box plots corresponds to the new level order.

See Also
fitlm | getlevels | nominal | ordinal | reorderlevels

Related Examples
• “Change Category Labels” on page 2-7
• “Merge Category Levels” on page 2-16
• “Add and Drop Category Levels” on page 2-18
• “Index and Search Using Nominal and Ordinal Arrays” on page 2-41

More About
• “Nominal and Ordinal Arrays” on page 2-36
• “Advantages of Using Nominal and Ordinal Arrays” on page 2-38

Categorize Numeric Data

Note The nominal and ordinal array data types are not recommended. To represent ordered and unordered discrete, nonnumeric data, use the “Categorical Arrays” (MATLAB) data type instead.

Categorize Numeric Data

This example shows how to categorize numeric data into a categorical ordinal array using ordinal. This is useful for discretizing continuous data.

Load sample data.
The dataset array, hospital, contains variables measured on a sample of patients. Compute the minimum, median, and maximum of the variable Age.

load hospital
quantile(hospital.Age,[0,.5,1])

ans = 1×3
    25    39    50

The patient ages range from 25 to 50.

Convert a numeric array to an ordinal array.
Group patients into the age categories Under 30, 30-39, Over 40.

hospital.AgeCat = ordinal(hospital.Age,{'Under 30','30-39','Over 40'},...
    [],[25,30,40,50]);
getlevels(hospital.AgeCat)

ans = 1x3 ordinal
     Under 30      30-39      Over 40

The last input argument to ordinal has the endpoints for the categories. The first category begins at age 25, the second at age 30, and so on. The last category contains ages 40 and above, so it begins at 40 and ends at 50 (the maximum age in the data set). To specify three categories, you must specify four endpoints (the last endpoint is the upper bound of the last category).

Explore categories.
Display the age and age category for the second patient.

dataset({hospital.Age(2),'Age'},...
    {hospital.AgeCat(2),'AgeCategory'})

ans =
    Age    AgeCategory
    43     Over 40
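Because the note at the top of this section recommends the categorical data type, the same grouping can be sketched with discretize, which bins numeric data directly into a categorical array. The edges below mirror the ordinal call above, although the bin-edge conventions of the two functions may differ slightly:

```matlab
% Bin ages into three named categories using the categorical data type.
edges = [25 30 40 50];
AgeCat2 = discretize(hospital.Age,edges,...
    'categorical',{'Under 30','30-39','Over 40'});
```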


When you discretize a numeric array into categories, the categorical array loses all information about the actual numeric values. In this example, AgeCat is not numeric, and you cannot recover the raw data values from it.

Categorize a numeric array into quartiles.
The variable Weight has weight measurements for the sample patients. Categorize the patient weights into four categories, by quartile.

p = 0:.25:1;
breaks = quantile(hospital.Weight,p);
hospital.WeightQ = ordinal(hospital.Weight,{'Q1','Q2','Q3','Q4'},...
    [],breaks);
getlevels(hospital.WeightQ)

ans = 1x4 ordinal
     Q1      Q2      Q3      Q4

Explore categories.
Display the weight and weight quartile for the second patient.

dataset({hospital.Weight(2),'Weight'},...
    {hospital.WeightQ(2),'WeightQuartile'})

ans =
    Weight    WeightQuartile
    163       Q3

Summary statistics grouped by category levels.
Compute the mean systolic and diastolic blood pressure for each age and weight category.

grpstats(hospital,{'AgeCat','WeightQ'},'mean','DataVars','BloodPressure')

ans =
                   AgeCat      WeightQ    GroupCount    mean_BloodPressure
    Under 30_Q1    Under 30    Q1          6            123.17    79.667
    Under 30_Q2    Under 30    Q2          3            120.33    79.667
    Under 30_Q3    Under 30    Q3          2             127.5      86.5
    Under 30_Q4    Under 30    Q4          4               122        78
    30-39_Q1       30-39       Q1         12            121.75     81.75
    30-39_Q2       30-39       Q2          9            119.56    82.556
    30-39_Q3       30-39       Q3          9               121    83.222
    30-39_Q4       30-39       Q4         11            125.55    87.273
    Over 40_Q1     Over 40     Q1          7            122.14    84.714
    Over 40_Q2     Over 40     Q2         13            123.38    79.385
    Over 40_Q3     Over 40     Q3         14            123.07    84.643
    Over 40_Q4     Over 40     Q4         10             124.6      85.1

The variable BloodPressure is a matrix with two columns. The first column is systolic blood pressure, and the second column is diastolic blood pressure. The group in the sample with the highest mean diastolic blood pressure, 87.273, is aged 30–39 and in the highest weight quartile, 30-39_Q4.

See Also
grpstats | ordinal

Related Examples
• “Create Nominal and Ordinal Arrays” on page 2-3
• “Merge Category Levels” on page 2-16
• “Plot Data Grouped by Category” on page 2-21
• “Index and Search Using Nominal and Ordinal Arrays” on page 2-41

More About
• “Nominal and Ordinal Arrays” on page 2-36
• “Advantages of Using Nominal and Ordinal Arrays” on page 2-38


Merge Category Levels

Note The nominal and ordinal array data types are not recommended. To represent ordered and unordered discrete, nonnumeric data, use the “Categorical Arrays” (MATLAB) data type instead.

Merge Category Levels

This example shows how to merge categories in a nominal or ordinal array using mergelevels. This is useful for collapsing categories with few observations.

Load sample data.

load carsmall

Create a nominal array.
The variable Origin is a character array containing the country of origin for 100 sample cars. Convert Origin to a nominal array.

Origin = nominal(Origin);
getlevels(Origin)

ans = 1x6 nominal
     France      Germany      Italy      Japan      Sweden      USA

There are six unique countries of origin in the data.

Tabulate category counts.
Explore the elements of the nominal array.

tabulate(Origin)

      Value    Count    Percent
     France        4      4.00%
    Germany        9      9.00%
      Italy        1      1.00%
      Japan       15     15.00%
     Sweden        2      2.00%
        USA       69     69.00%

There are relatively few observations in each European country.

Merge categories.
Merge the categories France, Germany, Italy, and Sweden into one category called Europe.

Origin = mergelevels(Origin,{'France','Germany','Italy','Sweden'},...
    'Europe');
getlevels(Origin)

ans = 1x3 nominal
     Europe      Japan      USA

The variable Origin now has only three category levels.

Tabulate category counts.
Explore the elements of the merged categories.

tabulate(Origin)

     Value    Count    Percent
    Europe       16     16.00%
     Japan       15     15.00%
       USA       69     69.00%

The category Europe contains the 16% of observations that were previously distributed across four countries.

See Also
mergelevels | nominal

Related Examples
• “Create Nominal and Ordinal Arrays” on page 2-3
• “Add and Drop Category Levels” on page 2-18
• “Index and Search Using Nominal and Ordinal Arrays” on page 2-41

More About
• “Nominal and Ordinal Arrays” on page 2-36
• “Advantages of Using Nominal and Ordinal Arrays” on page 2-38


Add and Drop Category Levels

This example shows how to add and drop levels from a nominal or ordinal array.

Note The nominal and ordinal array data types are not recommended. To represent ordered and unordered discrete, nonnumeric data, use the “Categorical Arrays” (MATLAB) data type instead.

Load sample data.

load('examgrades')

The array grades contains exam scores from 0 to 100 on five exams for a sample of 120 students.

Create an ordinal array.
Assign letter grades to each student for each test using these categories.

    Grade Range    Letter Grade
    100            A+
    90–99          A
    80–89          B
    70–79          C
    60–69          D

letter = ordinal(grades,{'D','C','B','A','A+'},[],...
    [60,70,80,90,100,100]);
getlevels(letter)

ans =
     D      C      B      A      A+

There are five grade categories, in the specified order D < C < B < A < A+.

Check for undefined categories.
Check whether or not there are any exam scores that do not fall into the five letter categories.

any(isundefined(letter))

ans =
     1     0     1     1     0

Recall that there are five exam scores for each student. The previous command returns a logical value for each of the five exams, indicating whether there are any scores that are undefined. There are scores for the first, third, and fourth exams that are undefined, that is, missing a category level.

Identify elements in undefined categories.
You can find the exam scores that do not have a letter grade using the isundefined logical condition.


grades(isundefined(letter))

ans =
    55    59    58    59    54    57    56    59    59    50    59    52

The exam scores that are in the 50s do not have a letter grade.

Add a new category.
Put all scores that are undefined into a new category labeled D-.

letter(isundefined(letter)) = 'D-';
getlevels(letter)

Warning: Categorical level 'D-' being added.
> In categorical.subsasgn at 55

ans =
     D      C      B      A      A+     D-

The ordinal variable, letter, has a new category added to the end.

Reorder category levels.
Reorder the categories so that D- < D.

letter = reorderlevels(letter,{'D-','D','C','B','A','A+'});
getlevels(letter)

ans =
     D-     D      C      B      A      A+

Compare elements.
Now that all exam scores have a letter grade, count how many students received a higher letter grade on the second test than on the first test.

sum(letter(:,2) > letter(:,1))

ans = 32

Thirty-two students improved their letter grade between the first two exams.


Explore categories.
Count the number of A+ scores in each of the five exams.

sum(letter=='A+')

ans =
     0     0     0     0     0

There are no A+ scores on any of the five exams.

Drop a category.
Drop the category A+ from the ordinal variable, letter.

letter = droplevels(letter,'A+');
getlevels(letter)

ans =
     D-     D      C      B      A

Category A+ is no longer in the ordinal variable, letter.

See Also
droplevels | ordinal | reorderlevels

Related Examples
• “Create Nominal and Ordinal Arrays” on page 2-3
• “Reorder Category Levels” on page 2-9
• “Merge Category Levels” on page 2-16
• “Index and Search Using Nominal and Ordinal Arrays” on page 2-41

More About
• “Nominal and Ordinal Arrays” on page 2-36
• “Advantages of Using Nominal and Ordinal Arrays” on page 2-38

Plot Data Grouped by Category

Note The nominal and ordinal array data types are not recommended. To represent ordered and unordered discrete, nonnumeric data, use the “Categorical Arrays” (MATLAB) data type instead.

Plot Data Grouped by Category

This example shows how to plot data grouped by the levels of a categorical variable.

Load sample data.

load carsmall

The variable Acceleration contains acceleration measurements on 100 sample cars. The variable Origin is a character array containing the country of origin for each car.

Create a nominal array.
Convert Origin to a nominal array.

Origin = nominal(Origin);
getlevels(Origin)

ans = 1x6 nominal
     France      Germany      Italy      Japan      Sweden      USA

There are six unique countries of origin in the sample. By default, nominal orders the countries in ascending alphabetical order.

Plot data grouped by category.
Draw box plots for Acceleration, grouped by Origin.

figure
boxplot(Acceleration,Origin)
title('Acceleration, Grouped by Country of Origin')

The box plots appear in the same order as the categorical levels (use reorderlevels to change the order of the categories). Few observations have Italy as the country of origin.

Tabulate category counts. Tabulate the number of sample cars from each country.

tabulate(Origin)

      Value    Count   Percent
     France        4     4.00%
    Germany        9     9.00%
      Italy        1     1.00%
      Japan       15    15.00%
     Sweden        2     2.00%
        USA       69    69.00%

Only one car is made in Italy.

Drop a category. Delete the Italian car from the sample.

Acceleration2 = Acceleration(Origin~='Italy');
Origin2 = Origin(Origin~='Italy');
getlevels(Origin2)


ans = 

  1x6 nominal

     France     Germany     Italy     Japan     Sweden     USA 

Even though the car from Italy is no longer in the sample, the nominal variable, Origin2, still has the category Italy. This is intentional: the levels of a categorical array do not necessarily coincide with the values.

Drop a category level. Use droplevels to remove the Italy category.

Origin2 = droplevels(Origin2,'Italy');
tabulate(Origin2)

      Value    Count   Percent
     France        4     4.04%
    Germany        9     9.09%
      Japan       15    15.15%
     Sweden        2     2.02%
        USA       69    69.70%

The Italy category is no longer in the nominal array, Origin2.

Plot data grouped by category. Draw box plots of Acceleration2, grouped by Origin2.

figure
boxplot(Acceleration2,Origin2)
title('Acceleration, Grouped by Country of Origin')


The plot no longer includes the car from Italy.

See Also boxplot | droplevels | nominal | reorderlevels

Related Examples
• “Test Differences Between Category Means” on page 2-25
• “Summary Statistics Grouped by Category” on page 2-32
• “Linear Regression with Categorical Covariates” on page 2-52

More About
• “Nominal and Ordinal Arrays” on page 2-36
• “Advantages of Using Nominal and Ordinal Arrays” on page 2-38
• “Grouping Variables” on page 2-45

Test Differences Between Category Means

This example shows how to test for significant differences between category (group) means using a t-test, two-way ANOVA (analysis of variance), and ANOCOVA (analysis of covariance) analysis. The goal is to determine whether the expected miles per gallon for a car depends on the decade in which it was manufactured, or the location where it was manufactured.

Note The nominal and ordinal array data types are not recommended. To represent ordered and unordered discrete, nonnumeric data, use the “Categorical Arrays” (MATLAB) data type instead.

Load sample data.

load('carsmall')
unique(Model_Year)

ans =

    70
    76
    82

The variable MPG has miles per gallon measurements on a sample of 100 cars. The variables Model_Year and Origin contain the model year and country of origin for each car. The first factor of interest is the decade of manufacture. There are three manufacturing years in the data.

Create a factor for the decade of manufacture. Create an ordinal array named Decade by merging the observations from years 70 and 76 into a category labeled 1970s, and putting the observations from 82 into a category labeled 1980s.

Decade = ordinal(Model_Year,{'1970s','1980s'},[],[70 77 82]);
getlevels(Decade)

ans = 

     1970s     1980s 

Plot data grouped by category. Draw a box plot of miles per gallon, grouped by the decade of manufacture.

figure()
boxplot(MPG,Decade)
title('Miles per Gallon, Grouped by Decade of Manufacture')


The box plot suggests that miles per gallon is higher in cars manufactured during the 1980s compared to the 1970s.

Compute summary statistics. Compute the mean and variance of miles per gallon for each decade.

[xbar,s2,grp] = grpstats(MPG,Decade,{'mean','var','gname'})

xbar =

   19.7857
   31.7097

s2 =

   35.1429
   29.0796

grp = 

    '1970s'
    '1980s'

This output shows that the mean miles per gallon in the 1980s was 31.71, compared to 19.79 in the 1970s. The variances in the two groups are similar.

Conduct a two-sample t-test for equal group means. Conduct a two-sample t-test, assuming equal variances, to test for a significant difference between the group means. The hypothesis is


H0: μ70 = μ80
HA: μ70 ≠ μ80

MPG70 = MPG(Decade=='1970s');
MPG80 = MPG(Decade=='1980s');
[h,p] = ttest2(MPG70,MPG80)

h =

     1

p =

   3.4809e-15

The logical value 1 indicates the null hypothesis is rejected at the default 0.05 significance level. The p-value for the test is very small. There is sufficient evidence that the mean miles per gallon in the 1980s differs from the mean miles per gallon in the 1970s.

Create a factor for the location of manufacture. The second factor of interest is the location of manufacture. First, convert Origin to a nominal array.

Location = nominal(Origin);
tabulate(Location)

      Value    Count   Percent
     France        4     4.00%
    Germany        9     9.00%
      Italy        1     1.00%
      Japan       15    15.00%
     Sweden        2     2.00%
        USA       69    69.00%

There are six different countries of manufacture. The European countries have relatively few observations.

Merge categories. Combine the categories France, Germany, Italy, and Sweden into a new category named Europe.

Location = mergelevels(Location, ...
    {'France','Germany','Italy','Sweden'},'Europe');
tabulate(Location)

     Value    Count   Percent
     Japan       15    15.00%
       USA       69    69.00%
    Europe       16    16.00%

Compute summary statistics. Compute the mean miles per gallon, grouped by the location of manufacture.

[xbar,grp] = grpstats(MPG,Location,{'mean','gname'})


xbar =

   31.8000
   21.1328
   26.6667

grp = 

    'Japan'
    'USA'
    'Europe'

This result shows that average miles per gallon is lowest for the sample of cars manufactured in the U.S.

Conduct two-way ANOVA. Conduct a two-way ANOVA to test for differences in expected miles per gallon between factor levels for Decade and Location. The statistical model is

MPG_ij = μ + α_i + β_j + ε_ij,   i = 1, 2;  j = 1, 2, 3,

where MPG_ij is the response, miles per gallon, for cars made in decade i at location j. The treatment effects for the first factor, decade of manufacture, are the α_i terms (constrained to sum to zero). The treatment effects for the second factor, location of manufacture, are the β_j terms (constrained to sum to zero). The ε_ij are uncorrelated, normally distributed noise terms.

The hypotheses to test are equality of decade effects,

H0: α1 = α2 = 0
HA: at least one α_i ≠ 0,

and equality of location effects,

H0: β1 = β2 = β3 = 0
HA: at least one β_j ≠ 0.

You can conduct a multiple-factor ANOVA using anovan.

anovan(MPG,{Decade,Location},'varnames',{'Decade','Location'});


This output shows the results of the two-way ANOVA. The p-value for testing the equality of decade effects is 2.88503e-18, so the null hypothesis is rejected at the 0.05 significance level. The p-value for testing the equality of location effects is 7.40416e-10, so this null hypothesis is also rejected.

Conduct ANOCOVA analysis. A potential confounder in this analysis is car weight. Cars with greater weight are expected to have lower gas mileage. Include the variable Weight as a continuous covariate in the ANOVA; that is, conduct an ANOCOVA analysis. Assuming parallel lines, the statistical model is

MPG_ijk = μ + α_i + β_j + γ·Weight_ijk + ε_ijk,   i = 1, 2;  j = 1, 2, 3;  k = 1, ..., 100.

The difference between this model and the two-way ANOVA model is the inclusion of the continuous predictor, Weight_ijk, the weight of the kth car, which was made in the ith decade and in the jth location. The slope parameter is γ.

Add the continuous covariate as a third group in the second anovan input argument. Use the name-value pair argument Continuous to specify that Weight (the third group) is continuous.

anovan(MPG,{Decade,Location,Weight},'Continuous',3,...
    'varnames',{'Decade','Location','Weight'});

This output shows that when car weight is considered, there is insufficient evidence of a manufacturing location effect (p-value = 0.1044).


Use interactive tool. You can use the interactive aoctool to explore this result.

aoctool(Weight,MPG,Location);

This command opens three dialog boxes. In the ANOCOVA Prediction Plot dialog box, select the Separate Means model.

This output shows that when you do not include Weight in the model, there are fairly large differences in the expected miles per gallon among the three manufacturing locations. Note that here the model does not adjust for the decade of manufacturing. Now, select the Parallel Lines model.


When you include Weight in the model, the difference in expected miles per gallon among the three manufacturing locations is much smaller.

See Also anovan | aoctool | boxplot | grpstats | nominal | ordinal | ttest2

Related Examples
• “Plot Data Grouped by Category” on page 2-21
• “Summary Statistics Grouped by Category” on page 2-32
• “Linear Regression with Categorical Covariates” on page 2-52

More About
• “Nominal and Ordinal Arrays” on page 2-36
• “Advantages of Using Nominal and Ordinal Arrays” on page 2-38
• “Grouping Variables” on page 2-45

Summary Statistics Grouped by Category

Note The nominal and ordinal array data types are not recommended. To represent ordered and unordered discrete, nonnumeric data, use the “Categorical Arrays” (MATLAB) data type instead.

This example shows how to compute summary statistics grouped by levels of a categorical variable. You can compute group summary statistics for a numeric array or a dataset array using grpstats.

Load sample data.

load hospital

The dataset array, hospital, has 7 variables (columns) and 100 observations (rows).

Compute summary statistics by category. The variable Sex is a nominal array with two levels, Male and Female. Compute the minimum and maximum weights for each gender.

stats = grpstats(hospital,'Sex',{'min','max'},'DataVars','Weight')

stats = 

              Sex       GroupCount    min_Weight    max_Weight
    Female    Female    53            111           147       
    Male      Male      47            158           202       

The dataset array, stats, has observations corresponding to the levels of the variable Sex. The variable min_Weight contains the minimum weight for each group, and the variable max_Weight contains the maximum weight for each group.

Compute summary statistics by multiple categories. The variable Smoker is a logical array with value 1 for smokers and value 0 for nonsmokers. Compute the minimum and maximum weights for each gender and smoking combination.

stats = grpstats(hospital,{'Sex','Smoker'},{'min','max'},...
    'DataVars','Weight')

stats = 

                Sex       Smoker    GroupCount    min_Weight    max_Weight
    Female_0    Female    false     40            111           147       
    Female_1    Female    true      13            115           146       
    Male_0      Male      false     26            158           194       
    Male_1      Male      true      21            164           202       

The dataset array, stats, has an observation row for each combination of levels of Sex and Smoker in the original data.
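As noted above, grpstats also accepts a plain numeric array with a separate grouping variable. A minimal sketch (reusing the hospital data; the output-variable names are illustrative):

```matlab
% Group statistics for a numeric vector, rather than a whole dataset array.
load hospital
weights = hospital.Weight;   % numeric vector
sex = hospital.Sex;          % nominal grouping variable

% Request specific statistics plus the group names.
[grpMin,grpMax,grpName] = grpstats(weights,sex,{'min','max','gname'})
```

With a numeric input, each requested statistic is returned as a separate output, one row per group, rather than as a dataset array.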

See Also dataset | grpstats | nominal

Related Examples
• “Plot Data Grouped by Category” on page 2-21
• “Test Differences Between Category Means” on page 2-25
• “Calculations on Dataset Arrays” on page 2-92

More About
• “Grouping Variables” on page 2-45
• “Nominal and Ordinal Arrays” on page 2-36
• “Dataset Arrays” on page 2-112

Sort Ordinal Arrays

Note The nominal and ordinal array data types are not recommended. To represent ordered and unordered discrete, nonnumeric data, use the “Categorical Arrays” (MATLAB) data type instead.

This example shows how to determine sorting order for ordinal arrays.

Create sample data.

AllSizes = {'medium','large','small','small','medium',...
    'large','medium','small'};

The created variable, AllSizes, is a cell array of character vectors containing size measurements on eight objects.

Create an ordinal array. Convert AllSizes to an ordinal array with levels small < medium < large.

AllSizes = ordinal(AllSizes,{},{'small','medium','large'});
getlevels(AllSizes)

ans = 

  1x3 ordinal

     small     medium     large 

Sort the ordinal array. When you sort ordinal arrays, the sorted observations are in the same order as the category levels.

sizeSort = sort(AllSizes);
sizeSort(:)

ans = 

  8x1 ordinal

     small 
     small 
     small 
     medium 
     medium 
     medium 
     large 
     large 

The sorted ordinal array, sizeSort, contains the observations ordered from small to large.
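The encoded order drives other order-sensitive operations as well. Continuing the example above, a brief sketch:

```matlab
% Sort in descending category order (large down to small).
sizeSortDesc = sort(AllSizes,'descend');

% min and max also respect the encoded level order.
smallest = min(AllSizes);   % the 'small' level
largest  = max(AllSizes);   % the 'large' level
```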

See Also ordinal

Related Examples
• “Reorder Category Levels” on page 2-9
• “Add and Drop Category Levels” on page 2-18

More About
• “Nominal and Ordinal Arrays” on page 2-36
• “Advantages of Using Nominal and Ordinal Arrays” on page 2-38

Nominal and Ordinal Arrays

Note The nominal and ordinal array data types are not recommended. To represent ordered and unordered discrete, nonnumeric data, use the “Categorical Arrays” (MATLAB) data type instead.

In this section...
“What Are Nominal and Ordinal Arrays?” on page 2-36
“Nominal and Ordinal Array Conversion” on page 2-36

What Are Nominal and Ordinal Arrays?

Nominal and ordinal arrays are Statistics and Machine Learning Toolbox data types for storing categorical values. Nominal and ordinal arrays store data that have a finite set of discrete levels, which might or might not have a natural order.

• ordinal arrays store categorical values with ordered levels. For example, an ordinal variable might have levels {small, medium, large}.
• nominal arrays store categorical values with unordered levels. For example, a nominal variable might have levels {red, blue, green}.

In experimental design, these variables are often called factors, with ordered or unordered factor levels. Nominal and ordinal arrays are convenient and memory-efficient containers for storing categorical variables. In addition to storing information about which category each observation belongs to, nominal and ordinal arrays store descriptive metadata including category labels and order. Nominal and ordinal arrays have associated methods that streamline common tasks such as merging categories, adding or dropping levels, and changing level labels.
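The distinction can be sketched with a small example (the data here is hypothetical):

```matlab
% Unordered categories: color names have no natural order.
colors = nominal({'red';'blue';'green';'blue'});

% Ordered categories: sizes have the natural order small < medium < large.
sizes = ordinal({'medium';'small';'large'},{},{'small','medium','large'});

% Ordinal arrays support inequality comparisons; nominal arrays do not.
sizes > 'small'   % logical vector, true for 'medium' and 'large'
```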

Nominal and Ordinal Array Conversion

You can easily convert to and from nominal or ordinal arrays. To create a nominal or ordinal array, use nominal or ordinal, respectively. You can convert these data types to nominal or ordinal arrays:

• Numeric arrays
• Logical arrays
• Character arrays
• String arrays
• Cell arrays of character vectors
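A brief sketch of converting each input type (the labels are illustrative):

```matlab
% From a numeric array: sorted unique values 1, 2, 3 get the given labels.
g = nominal([2 1 3 1],{'low','mid','high'});

% From a logical array: false and true become 'no' and 'yes'.
smoker = nominal(logical([1 0 0 1]),{'no','yes'});

% From a cell array of character vectors, with an explicit level order.
grade = ordinal({'B','A','C'},{},{'C','B','A'});
```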

See Also nominal | ordinal


Related Examples
• “Create Nominal and Ordinal Arrays” on page 2-3
• “Summary Statistics Grouped by Category” on page 2-32
• “Plot Data Grouped by Category” on page 2-21
• “Index and Search Using Nominal and Ordinal Arrays” on page 2-41

More About
• “Advantages of Using Nominal and Ordinal Arrays” on page 2-38
• “Grouping Variables” on page 2-45

Advantages of Using Nominal and Ordinal Arrays

Note The nominal and ordinal array data types are not recommended. To represent ordered and unordered discrete, nonnumeric data, use the “Categorical Arrays” (MATLAB) data type instead.

In this section...
“Manipulate Category Levels” on page 2-38
“Analysis Using Nominal and Ordinal Arrays” on page 2-38
“Reduce Memory Requirements” on page 2-39

Manipulate Category Levels

When working with categorical variables and their levels, you encounter some typical challenges. This table summarizes the functions you can use with nominal or ordinal arrays to manipulate category levels. For additional functions, type methods nominal or methods ordinal at the command line, or see the nominal and ordinal reference pages.

Task                                                    Function
Add new category levels                                 addlevels
Drop category levels                                    droplevels
Combine category levels                                 mergelevels
Reorder category levels                                 reorderlevels
Count the number of observations in each category       levelcounts
Change the label or name of category levels             setlabels
Create an interaction factor                            times
Find observations that are not in a defined category    isundefined
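A short sketch exercising several of the functions in the table, on hypothetical quiz grades:

```matlab
% Hypothetical quiz grades, ordered C < B < A.
quiz = ordinal({'B','C','A','B','C'},{},{'C','B','A'});

quiz = addlevels(quiz,{'A+'});               % add a new (empty) highest level
levelcounts(quiz)                            % observations per level
quiz = mergelevels(quiz,{'C','B'},'Pass');   % combine two adjacent levels
quiz = setlabels(quiz,{'P','A','A+'});       % rename the levels, in order
quiz = droplevels(quiz,'A+');                % drop the unused level
```

For ordinal arrays, mergelevels requires the merged levels to be consecutive in the encoded order, as C and B are here.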

Analysis Using Nominal and Ordinal Arrays

You can use nominal and ordinal arrays in a variety of statistical analyses. For example, you might want to compute descriptive statistics for data grouped by the category levels, conduct statistical tests on differences between category means, or perform regression analysis using categorical predictors.

Statistics and Machine Learning Toolbox functions that accept a grouping variable as an input argument accept nominal and ordinal arrays. This includes descriptive functions such as:

• grpstats
• gscatter
• boxplot
• gplotmatrix

You can also use nominal and ordinal arrays as input arguments to analysis functions and methods based on models, such as:

• anovan
• fitlm
• fitglm

When you use a nominal or ordinal array as a predictor in these functions, the fitting function automatically recognizes the categorical predictor and constructs appropriate dummy indicator variables for analysis. Alternatively, you can construct your own dummy indicator variables using dummyvar.
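A sketch of both approaches, using the carsmall data from earlier examples (the model specification is illustrative):

```matlab
load carsmall
Origin = nominal(Origin);

% fitlm recognizes the nominal predictor and creates dummy
% indicator variables internally.
tbl = table(Weight,Origin,MPG);
mdl = fitlm(tbl,'MPG ~ Weight + Origin');

% Equivalent manual approach with dummyvar: drop one indicator
% column to avoid collinearity with the intercept.
D = dummyvar(Origin);
mdl2 = fitlm([Weight D(:,2:end)],MPG);
```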

Reduce Memory Requirements

The levels of categorical variables are often defined as text, which can be costly to store and manipulate in a cell array of character vectors or char array. Nominal and ordinal arrays separately store category membership and category labels, greatly reducing the amount of memory required to store the variable.

For example, load some sample data:

load('fisheriris')

The variable species is a cell array of character vectors requiring 19,300 bytes of memory. Convert species to a nominal array:

species = nominal(species);

There is a 95% reduction in memory required to store the variable.
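You can check the savings yourself with whos, a minimal sketch:

```matlab
load('fisheriris')
w1 = whos('species');        % cell array of character vectors

species = nominal(species);  % convert to a nominal array
w2 = whos('species');

% Compare the storage required before and after conversion.
fprintf('cellstr: %d bytes, nominal: %d bytes\n',w1.bytes,w2.bytes)
```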

See Also nominal | ordinal

Related Examples
• “Create Nominal and Ordinal Arrays” on page 2-3
• “Test Differences Between Category Means” on page 2-25
• “Linear Regression with Categorical Covariates” on page 2-52
• “Index and Search Using Nominal and Ordinal Arrays” on page 2-41

More About
• “Nominal and Ordinal Arrays” on page 2-36
• “Grouping Variables” on page 2-45
• “Dummy Variables” on page 2-48

Index and Search Using Nominal and Ordinal Arrays

Note The nominal and ordinal array data types are not recommended. To represent ordered and unordered discrete, nonnumeric data, use the “Categorical Arrays” (MATLAB) data type instead.

Index By Category

It is often useful to index and search data by its category, or group. If you store categories as labels inside a cell array of character vectors or char array, it can be difficult to index and search the categories. When using nominal or ordinal arrays, you can easily:

• Index elements from particular categories. For both nominal and ordinal arrays, you can use the logical operators == and ~= to index the observations that are in, or not in, a particular category. For ordinal arrays, which have an encoded order, you can also use inequalities, such as > and >=.

...enter a scalar value in the field to the right. Depending on which operator you select, the app excludes from the fit any data values that are greater than or equal to the scalar value, or greater than the scalar value, respectively.

OR

Click the Exclude Graphically button to define the exclusion rule by displaying a plot of the values in a data set and selecting the bounds for the excluded data. For example, if you created the data set My data as described in Create and Manage Data Sets, select it from the drop-down list next to Exclude graphically, and then click the Exclude graphically button. The app displays the values in My data in a new window.


To set a lower limit for the boundary of the excluded region, click Add Lower Limit. The app displays a vertical line on the left side of the plot window. Move the line to the point where you want the lower limit, as shown in the following figure.


Move the vertical line to change the value displayed in the Lower limit: exclude data field in the Exclude window.

The value displayed corresponds to the x-coordinate of the vertical line. Similarly, you can set the upper limit for the boundary of the excluded region by clicking Add Upper Limit, and then moving the vertical line that appears at the right side of the plot window. After setting the lower and upper limits, click Close and return to the Exclude window.

3 Create Exclusion Rule — Once you have set the lower and upper limits for the boundary of the excluded data, click Create Exclusion Rule to create the new rule. The name of the new rule appears in the Existing exclusion rules pane.

Selecting an exclusion rule in the Existing exclusion rules pane enables the following buttons:

• Copy — Creates a copy of the rule, which you can then modify. To save the modified rule under a different name, click Create Exclusion Rule.
• View — Opens a new window in which you can see the data points excluded by the rule. The following figure shows a typical example.


The shaded areas in the plot graphically display which data points are excluded. The table to the right lists all data points. The shaded rows indicate excluded points.

• Rename — Rename the rule.
• Delete — Delete the rule.

After you define an exclusion rule, you can use it when you fit a distribution to your data. The rule does not exclude points from the display of the data set.

Save and Load Sessions

Save your work in the current session, and then load it in a subsequent session, so that you can continue working where you left off.

Save a Session

To save the current session, from the File menu in the main window, select Save Session. A dialog box opens and prompts you to enter a file name, for example my_session.dfit. Click Save to save the following items created in the current session:

• Data sets
• Fits


• Exclusion rules
• Plot settings
• Bin width rules

Load a Session

To load a previously saved session, from the File menu in the main window, select Load Session. Enter the name of a previously saved session. Click Open to restore the information from the saved session to the current session.

Generate a File to Fit and Plot Distributions

Use the Generate Code option in the File menu to create a file that:

• Fits the distributions in the current session to any data vector in the MATLAB workspace.
• Plots the data and the fits.

After you end the current session, you can use the file to create plots in a standard MATLAB figure window, without reopening the Distribution Fitter app. As an example, if you created the fit described in “Create a New Fit” on page 5-54, do the following steps:

1 From the File menu, select Generate Code.
2 In the MATLAB Editor window, choose File > Save as. Save the file as normal_fit.m in a folder on the MATLAB path.

You can then apply the function normal_fit to any vector of data in the MATLAB workspace. For example, the following commands:

new_data = normrnd(4.1, 12.5, 100, 1);
newfit = normal_fit(new_data)
legend('New Data', 'My fit')

generate newfit, a fitted normal distribution of the data. The commands also generate a plot of the data and the fit.

newfit = 

  normal distribution

    mu = 3.19148
    sigma = 12.5631


Note By default, the file labels the data in the legend using the same name as the data set in the Distribution Fitter app. You can change the label using the legend command, as illustrated by the preceding example.

See Also Distribution Fitter

More About
• “Fit a Distribution Using the Distribution Fitter App” on page 5-69
• “Define Custom Distributions Using the Distribution Fitter App” on page 5-78

Fit a Distribution Using the Distribution Fitter App

In this section...
“Step 1: Load Sample Data” on page 5-69
“Step 2: Import Data” on page 5-69
“Step 3: Create a New Fit” on page 5-71
“Step 4: Create and Manage Additional Fits” on page 5-74

This example shows how you can use the Distribution Fitter app to interactively fit a probability distribution to data.

Step 1: Load Sample Data

Load the sample data.

load carsmall

Step 2: Import Data

Open the Distribution Fitter tool.

distributionFitter

To import the vector MPG into the Distribution Fitter app, click the Data button. The Data dialog box opens.


The Data field displays all numeric arrays in the MATLAB workspace. From the drop-down list, select MPG. A histogram of the selected data appears in the Data preview pane. In the Data set name field, type a name for the data set, such as MPG data, and click Create Data Set. The main window of the Distribution Fitter app now displays a larger version of the histogram in the Data preview pane.


Step 3: Create a New Fit

To fit a distribution to the data, in the main window of the Distribution Fitter app, click New Fit. To fit a normal distribution to MPG data:

1 In the Fit name field, enter a name for the fit, such as My fit.
2 From the drop-down list in the Data field, select MPG data.
3 Confirm that Normal is selected from the drop-down menu in the Distribution field.
4 Click Apply.


The Results pane displays the mean and standard deviation of the normal distribution that best fits MPG data. The Distribution Fitter app main window displays a plot of the normal distribution with this mean and standard deviation.


Based on the plot, a normal distribution does not appear to provide a good fit for the MPG data. To obtain a better evaluation, select Probability plot from the Display type drop-down list. Confirm that the Distribution drop-down list is set to Normal. The main window displays the following figure.


The normal probability plot shows that the data deviates from normal, especially in the tails.

Step 4: Create and Manage Additional Fits

The MPG data pdf indicates that the data has two peaks. Try fitting a nonparametric kernel distribution to obtain a better fit for this data.

1 Click Manage Fits. In the dialog box, click New Fit.
2 In the Fit name field, enter a name for the fit, such as Kernel fit.
3 From the drop-down list in the Data field, select MPG data.
4 From the drop-down list in the Distribution field, select Non-parametric. This enables several options in the Non-parametric pane, including Kernel, Bandwidth, and Domain. For now, accept the default value to apply a normal kernel shape and automatically determine the kernel bandwidth (using Auto). For more information about nonparametric kernel distributions, see “Kernel Distribution” on page B-78.
5 Click Apply.

The Results pane displays the kernel type, bandwidth, and domain of the nonparametric distribution fit to MPG data.

The main window displays plots of the original MPG data with the normal distribution and nonparametric kernel distribution overlaid. To visually compare these two fits, select Density (PDF) from the Display type drop-down list.


To include only the nonparametric kernel fit line (Kernel fit) on the plot, click Manage Fits. In the Table of fits pane, locate the row for the normal distribution fit (My fit) and clear the box in the Plot column.


See Also Distribution Fitter

More About
• “Model Data Using the Distribution Fitter App” on page 5-50
• “Define Custom Distributions Using the Distribution Fitter App” on page 5-78


Define Custom Distributions Using the Distribution Fitter App

You can define a probability object for a custom distribution and use the Distribution Fitter app or fitdist to fit distributions not supported by Statistics and Machine Learning Toolbox. You can also use a custom probability object as an input argument of probability object functions, such as pdf, cdf, icdf, and random, to evaluate the distribution, generate random numbers, and so on.

Open the Distribution Fitter App

• MATLAB Toolstrip: On the Apps tab, under Math, Statistics and Optimization, click the app icon.
• MATLAB command prompt: Enter distributionFitter.

distributionFitter


Define Custom Distribution

To define a custom distribution using the app, select File > Define Custom Distributions. A file template opens in the MATLAB Editor. You then edit this file so that it creates a probability object for the distribution you want. The template includes sample code that defines a probability object for the Laplace distribution. Follow the instructions in the template to define your own custom distribution.

To save your custom probability object, create a directory named +prob on your path. Save the file in this directory using a name that matches your distribution name. For example, save the template as LaplaceDistribution.m, and then import the custom distribution as described in the next section.


Import Custom Distribution

To import a custom distribution using the app, select File > Import Custom Distributions. The Imported Distributions dialog box opens, in which you can select the file that defines the distribution. For example, if you create the file LaplaceDistribution.m, as described in the preceding section, the list in the dialog box includes Laplace followed by an asterisk, indicating the file is new or modified and available for fitting.

Alternatively, you can use the makedist function to reset the list of distributions so that you do not need to select File > Import Custom Distributions in the app.

makedist -reset

This command resets the list of distributions by searching the path for files contained in a package named prob and implementing classes derived from ProbabilityDistribution. If you open the app after resetting the list, the distribution list in the app includes the custom distribution that you defined.

Once you import a custom distribution using the Distribution Fitter app or reset the list by using makedist, you can use the custom distribution in the app and in the Command Window. The Distribution field of the New Fit dialog box, available from the Distribution Fitter app, contains the new custom distribution. In the Command Window, you can create the custom probability distribution object by using makedist and fit a data set to the custom distribution by using fitdist. Then, you can use probability object functions, such as pdf, cdf, icdf, and random, to evaluate the distribution, generate random numbers, and so on.
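A sketch of that Command Window workflow, assuming the template's Laplace example has been saved as +prob/LaplaceDistribution.m (so the distribution is registered under the name Laplace):

```matlab
% Create the custom distribution object by name.
pd = makedist('Laplace');

% Generate random numbers from it, then fit the custom
% distribution to that data.
r = random(pd,1000,1);
pdFit = fitdist(r,'Laplace');

% Evaluate the fitted distribution.
p = cdf(pdFit,0);
```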

See Also Distribution Fitter | makedist

More About
• “Model Data Using the Distribution Fitter App” on page 5-50
• “Fit a Distribution Using the Distribution Fitter App” on page 5-69


Explore the Random Number Generation UI

The Random Number Generation user interface (UI) generates random samples from specified probability distributions, and displays the samples as histograms. Use the interface to explore the effects of changing parameters and sample size on the distributions. Run the user interface by typing randtool at the command line.

Start by selecting a distribution, then enter the desired sample size. You can also:

• Use the controls at the bottom of the window to set parameter values for the distribution and to change their upper and lower bounds.
• Draw another sample from the same distribution, with the same size and parameters.
• Export the current sample to your workspace. A dialog box enables you to provide a name for the sample.


Compare Multiple Distribution Fits

This example shows how to fit multiple probability distribution objects to the same set of sample data, and obtain a visual comparison of how well each distribution fits the data.

Step 1. Load sample data.

Load the sample data.

load carsmall

This data contains miles per gallon (MPG) measurements for different makes and models of cars, grouped by country of origin (Origin), model year (Model_Year), and other vehicle characteristics.

Step 2. Create a categorical array.

Transform Origin into a categorical array and remove the Italian car from the sample data. Because there is only one Italian car, fitdist cannot fit any distribution other than a kernel distribution to that group.

Origin = categorical(cellstr(Origin));
MPG2 = MPG(Origin~='Italy');
Origin2 = Origin(Origin~='Italy');
Origin2 = removecats(Origin2,'Italy');

Step 3. Fit multiple distributions by group.

Use fitdist to fit Weibull, normal, logistic, and kernel distributions to each country of origin group in the MPG data.

[WeiByOrig,Country] = fitdist(MPG2,'weibull','by',Origin2);
[NormByOrig,Country] = fitdist(MPG2,'normal','by',Origin2);
[LogByOrig,Country] = fitdist(MPG2,'logistic','by',Origin2);
[KerByOrig,Country] = fitdist(MPG2,'kernel','by',Origin2);
WeiByOrig

WeiByOrig=1×5 cell array
    {1x1 prob.WeibullDistribution}    {1x1 prob.WeibullDistribution}    {1x1 prob.WeibullDistribution}    {1x1 prob.WeibullDistribution}    {1x1 prob.WeibullDistribution}

Country

Country = 5x1 cell
    {'France' }
    {'Germany'}
    {'Japan'  }
    {'Sweden' }
    {'USA'    }

Each country group now has four distribution objects associated with it. For example, the cell array WeiByOrig contains five Weibull distribution objects, one for each country represented in the sample data. Likewise, the cell array NormByOrig contains five normal distribution objects, and so on. Each object contains properties that hold information about the data, distribution, and parameters. The array Country lists the country of origin for each group in the same order as the distribution objects are stored in the cell arrays.

Step 4. Compute the pdf for each distribution.

Extract the four probability distribution objects for USA and compute the pdf for each distribution. As shown in Step 3, USA is in position 5 in each cell array.

WeiUSA = WeiByOrig{5};
NormUSA = NormByOrig{5};
LogUSA = LogByOrig{5};
KerUSA = KerByOrig{5};
x = 0:1:50;
pdf_Wei = pdf(WeiUSA,x);
pdf_Norm = pdf(NormUSA,x);
pdf_Log = pdf(LogUSA,x);
pdf_Ker = pdf(KerUSA,x);

Step 5. Plot the pdf for each distribution.

Plot the pdf for each distribution fit to the USA data, superimposed on a histogram of the sample data. Normalize the histogram for easier display. Create a histogram of the USA sample data.

data = MPG2(Origin2=='USA');
figure
histogram(data,10,'Normalization','pdf','FaceColor',[1,0.8,0]);

Plot the pdf of each fitted distribution.

line(x,pdf_Wei,'LineStyle','-','Color','r')
line(x,pdf_Norm,'LineStyle','-.','Color','b')
line(x,pdf_Log,'LineStyle','--','Color','g')
line(x,pdf_Ker,'LineStyle',':','Color','k')
legend('Data','Weibull','Normal','Logistic','Kernel','Location','Best')
title('MPG for Cars from USA')
xlabel('MPG')


Superimposing the pdf plots over a histogram of the sample data provides a visual comparison of how well each type of distribution fits the data. Only the nonparametric kernel distribution KerUSA comes close to revealing the two modes in the original data.

Step 6. Further group USA data by year.

To investigate the two modes revealed in Step 5, group the MPG data by both country of origin (Origin) and model year (Model_Year), and use fitdist to fit kernel distributions to each group.

[KerByYearOrig,Names] = fitdist(MPG,'Kernel','By',{Origin Model_Year});

Each unique combination of origin and model year now has a kernel distribution object associated with it.

Names

Names = 14x1 cell
    {'France...' }
    {'France...' }
    {'Germany...'}
    {'Germany...'}
    {'Germany...'}
    {'Italy...'  }
    {'Japan...'  }
    {'Japan...'  }
    {'Japan...'  }
    {'Sweden...' }
    {'Sweden...' }
    {'USA...'    }
    {'USA...'    }
    {'USA...'    }

Plot the three probability distributions for each USA model year, which are in positions 12, 13, and 14 in the cell array KerByYearOrig.

figure
hold on
for i = 12 : 14
    plot(x,pdf(KerByYearOrig{i},x))
end
legend('1970','1976','1982')
title('MPG in USA Cars by Model Year')
xlabel('MPG')
hold off

When further grouped by model year, the pdf plots reveal two distinct peaks in the MPG data for cars made in the USA — one for the model year 1970 and one for the model year 1982. This explains why the histogram for the combined USA miles per gallon data shows two peaks instead of one.

See Also
fitdist | histogram | pdf

More About
• “Grouping Variables” on page 2-45
• “Fit Probability Distribution Objects to Grouped Data” on page 5-89
• “Working with Probability Distributions” on page 5-2
• “Supported Distributions” on page 5-13

Fit Probability Distribution Objects to Grouped Data

This example shows how to fit probability distribution objects to grouped sample data, and create a plot to visually compare the pdf of each group.

Step 1. Load sample data.

Load the sample data.

load carsmall;

The data contains miles per gallon (MPG) measurements for different makes and models of cars, grouped by country of origin (Origin), model year (Model_Year), and other vehicle characteristics.

Step 2. Create a categorical array.

Transform Origin into a categorical array.

Origin = categorical(cellstr(Origin));

Step 3. Fit kernel distributions to each group.

Use fitdist to fit kernel distributions to each country of origin group in the MPG data.

[KerByOrig,Country] = fitdist(MPG,'Kernel','by',Origin)

KerByOrig=1×6 cell array
    {1x1 prob.KernelDistribution}    {1x1 prob.KernelDistribution}    {1x1 prob.KernelDistribution}    {1x1 prob.KernelDistribution}    {1x1 prob.KernelDistribution}    {1x1 prob.KernelDistribution}

Country = 6x1 cell
    {'France' }
    {'Germany'}
    {'Italy'  }
    {'Japan'  }
    {'Sweden' }
    {'USA'    }

The cell array KerByOrig contains six kernel distribution objects, one for each country represented in the sample data. Each object contains properties that hold information about the data, the distribution, and the parameters. The array Country lists the country of origin for each group in the same order as the distribution objects are stored in KerByOrig.


Step 4. Compute the pdf for each group.

Extract the probability distribution objects for Germany, Japan, and USA. Use the positions of each country in KerByOrig shown in Step 3: Germany is the second country, Japan is the fourth, and USA is the sixth. Compute the pdf for each group.

Germany = KerByOrig{2};
Japan = KerByOrig{4};
USA = KerByOrig{6};
x = 0:1:50;
USA_pdf = pdf(USA,x);
Japan_pdf = pdf(Japan,x);
Germany_pdf = pdf(Germany,x);

Step 5. Plot the pdf for each group.

Plot the pdf for each group on the same figure.

plot(x,USA_pdf,'r-')
hold on
plot(x,Japan_pdf,'b-.')
plot(x,Germany_pdf,'k:')
legend({'USA','Japan','Germany'},'Location','NW')
title('MPG by Country of Origin')
xlabel('MPG')


The resulting plot shows how miles per gallon (MPG) performance differs by country of origin (Origin). Using this data, the USA has the widest distribution, and its peak is at the lowest MPG value of the three origins. Japan has the most regular distribution with a slightly heavier left tail, and its peak is at the highest MPG value of the three origins. The peak for Germany is between the USA and Japan, and the second bump near 44 miles per gallon suggests that there might be multiple modes in the data.

See Also
fitdist | pdf

More About
• “Kernel Distribution” on page B-78
• “Grouping Variables” on page 2-45
• “Fit Distributions to Grouped Data Using ksdensity” on page 5-40
• “Working with Probability Distributions” on page 5-2
• “Supported Distributions” on page 5-13

Multinomial Probability Distribution Objects

This example shows how to generate random numbers, compute and plot the pdf, and compute descriptive statistics of a multinomial distribution using probability distribution objects.

Step 1. Define the distribution parameters.

Create a vector p containing the probability of each outcome. Outcome 1 has a probability of 1/2, outcome 2 has a probability of 1/3, and outcome 3 has a probability of 1/6. The number of trials n in each experiment is 5, and the number of repetitions reps of the experiment is 8.

p = [1/2 1/3 1/6];
n = 5;
reps = 8;

Step 2. Create a multinomial probability distribution object.

Create a multinomial probability distribution object using the specified value p for the Probabilities parameter.

pd = makedist('Multinomial','Probabilities',p)

pd = MultinomialDistribution
    Probabilities: 0.5000    0.3333    0.1667

Step 3. Generate one random number.

Generate one random number from the multinomial distribution, which is the outcome of a single trial.

rng('default') % For reproducibility
r = random(pd)

r = 2

This trial resulted in outcome 2.

Step 4. Generate a matrix of random numbers.

You can also generate a matrix of random numbers from the multinomial distribution, which reports the results of multiple experiments that each contain multiple trials. Generate a matrix that contains the outcomes of an experiment with n = 5 trials and reps = 8 repetitions.

r = random(pd,reps,n)

r = 8×5
     3     3     3     2     1
     1     1     2     2     1
     3     3     3     1     2
     2     3     2     2     2
     1     1     1     1     1
     1     2     3     2     3
     2     1     3     1     1
     3     1     2     1     1

Each element in the resulting matrix is the outcome of one trial. The columns correspond to the five trials in each experiment, and the rows correspond to the eight experiments. For example, in the first experiment (corresponding to the first row), one of the five trials resulted in outcome 1, one of the five trials resulted in outcome 2, and three of the five trials resulted in outcome 3.

Step 5. Compute and plot the pdf.

Compute the pdf of the distribution.

x = 1:3;
y = pdf(pd,x);
bar(x,y)
xlabel('Outcome')
ylabel('Probability Mass')
title('Trinomial Distribution')

The plot shows the probability mass for each of the k possible outcomes. For this distribution, the pdf value for any x other than 1, 2, or 3 is 0.

Step 6. Compute descriptive statistics.

Compute the mean, median, and standard deviation of the distribution.

m = mean(pd)

m = 1.6667

med = median(pd)

med = 1

s = std(pd)

s = 0.7454
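These values agree with a direct calculation on the outcome distribution, with outcomes k = 1, 2, 3 occurring with probabilities p1 = 1/2, p2 = 1/3, and p3 = 1/6:

```latex
\mathrm{E}[X] = \sum_{k} k\,p_k = \tfrac{1}{2} + \tfrac{2}{3} + \tfrac{3}{6} = \tfrac{5}{3} \approx 1.6667,
\qquad
\mathrm{Var}[X] = \sum_{k} k^2 p_k - \mathrm{E}[X]^2 = \tfrac{10}{3} - \tfrac{25}{9} = \tfrac{5}{9},
\qquad
\sqrt{\tfrac{5}{9}} \approx 0.7454 .
```

The median is 1 because the cdf already reaches 1/2 at the smallest outcome: F(1) = p1 = 1/2.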

See Also

More About
• “Multinomial Probability Distribution Functions” on page 5-95
• “Working with Probability Distributions” on page 5-2
• “Supported Distributions” on page 5-13

Multinomial Probability Distribution Functions

This example shows how to generate random numbers and compute and plot the pdf of a multinomial distribution using probability distribution functions.

Step 1. Define the distribution parameters.

Create a vector p containing the probability of each outcome. Outcome 1 has a probability of 1/2, outcome 2 has a probability of 1/3, and outcome 3 has a probability of 1/6. The number of trials in each experiment n is 5, and the number of repetitions of the experiment reps is 8.

p = [1/2 1/3 1/6];
n = 5;
reps = 8;

Step 2. Generate one random number.

Generate one random number from the multinomial distribution, which is the outcome of a single trial.

rng('default') % For reproducibility
r = mnrnd(1,p,1)

r = 1×3
     0     1     0

The returned vector r contains three elements, which show the counts for each possible outcome. This single trial resulted in outcome 2.

Step 3. Generate a matrix of random numbers.

You can also generate a matrix of random numbers from the multinomial distribution, which reports the results of multiple experiments that each contain multiple trials. Generate a matrix that contains the outcomes of an experiment with n = 5 trials and reps = 8 repetitions.

r = mnrnd(n,p,reps)

r = 8×3
     1     1     3
     3     2     0
     1     1     3
     0     4     1
     5     0     0
     1     2     2
     3     1     1
     3     1     1

Each row in the resulting matrix contains counts for each of the k multinomial bins. For example, in the first experiment (corresponding to the first row), one of the five trials resulted in outcome 1, one of the five trials resulted in outcome 2, and three of the five trials resulted in outcome 3.


Step 4. Compute the pdf.

Since multinomial functions work with bin counts, create a multidimensional array of all possible outcome combinations, and compute the pdf using mnpdf.

count1 = 1:n;
count2 = 1:n;
[x1,x2] = meshgrid(count1,count2);
x3 = n-(x1+x2);
y = mnpdf([x1(:),x2(:),x3(:)],repmat(p,(n)^2,1));

Step 5. Plot the pdf.

Create a 3-D bar graph to visualize the pdf for each combination of outcome frequencies.

y = reshape(y,n,n);
bar3(y)
set(gca,'XTickLabel',1:n);
set(gca,'YTickLabel',1:n);
xlabel('x_1 Frequency')
ylabel('x_2 Frequency')
zlabel('Probability Mass')

The plot shows the probability mass for each possible combination of outcomes. It does not show x3, which is determined by the constraint x1 + x2 + x3 = n.

See Also

More About
• “Multinomial Probability Distribution Objects” on page 5-92
• “Working with Probability Distributions” on page 5-2
• “Supported Distributions” on page 5-13

Generate Random Numbers Using Uniform Distribution Inversion

This example shows how to generate random numbers using the uniform distribution inversion method. This method is useful for distributions whose inverse cumulative distribution function you can compute, but which have no direct support for random sampling.

Step 1. Generate random numbers from the standard uniform distribution.

Use rand to generate 1000 random numbers from the uniform distribution on the interval (0,1).

rng('default') % For reproducibility
u = rand(1000,1);

The inversion method relies on the principle that continuous cumulative distribution functions (cdfs) range uniformly over the open interval (0,1). If u is a uniform random number on (0,1), then x = F⁻¹(u) generates a random number x from any continuous distribution with the specified cdf F.

Step 2. Generate random numbers from the Weibull distribution.

Use the inverse cumulative distribution function to generate the random numbers from a Weibull distribution with parameters A = 1 and B = 1 that correspond to the probabilities in u. Plot the results.

x = wblinv(u,1,1);
histogram(x,20);
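To see why the method works, note that for a continuous, strictly increasing cdf F and a uniform random variable U on (0,1),

```latex
P\{F^{-1}(U) \le x\} = P\{U \le F(x)\} = F(x),
```

so F⁻¹(U) has cdf F. For the Weibull distribution with A = 1 and B = 1, the cdf is F(x) = 1 − e^(−x), so wblinv returns x = F⁻¹(u) = −ln(1 − u).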

The histogram shows that the random numbers generated using the Weibull inverse cdf function wblinv have a Weibull distribution.

Step 3. Generate random numbers from the standard normal distribution.

The same values in u can generate random numbers from any distribution, for example the standard normal, by following the same procedure using the inverse cdf of the desired distribution.

figure
x_norm = norminv(u,0,1);
histogram(x_norm,20)

The histogram shows that, by using the standard normal inverse cdf norminv, the random numbers generated from u now have a standard normal distribution.

See Also
hist | norminv | rand | wblinv

More About
• “Uniform Distribution (Continuous)” on page B-163
• “Weibull Distribution” on page B-170
• “Normal Distribution” on page B-119
• “Random Number Generation” on page 5-26
• “Generate Random Numbers Using the Triangular Distribution” on page 5-46
• “Generating Pseudorandom Numbers” on page 7-2

Represent Cauchy Distribution Using t Location-Scale

This example shows how to use the t location-scale probability distribution object to work with a Cauchy distribution with nonstandard parameter values.

Step 1. Create a probability distribution object.

Create a t location-scale probability distribution object with degrees of freedom nu = 1. Specify mu = 3 to set the location parameter equal to 3, and sigma = 1 to set the scale parameter equal to 1.

pd = makedist('tLocationScale','mu',3,'sigma',1,'nu',1)

pd = tLocationScaleDistribution

  t Location-Scale distribution
       mu = 3
    sigma = 1
       nu = 1

Step 2. Compute descriptive statistics.

Use object functions to compute descriptive statistics for the Cauchy distribution.

med = median(pd)

med = 3

r = iqr(pd)

r = 2

m = mean(pd)

m = NaN

s = std(pd)

s = Inf

The median of the Cauchy distribution is equal to its location parameter, and the interquartile range is equal to two times its scale parameter. Its mean and standard deviation are undefined.

Step 3. Compute and plot the pdf.

Compute and plot the pdf of the Cauchy distribution.

x = -20:1:20;
y = pdf(pd,x);
plot(x,y,'LineWidth',2)
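The median and interquartile range reported in Step 2 follow from the Cauchy cdf with location μ and scale σ:

```latex
F(x) = \frac{1}{2} + \frac{1}{\pi}\arctan\!\left(\frac{x-\mu}{\sigma}\right).
```

Setting F(x) = 1/4 and F(x) = 3/4 gives quartiles μ − σ and μ + σ, so the median is μ = 3 and the interquartile range is 2σ = 2. The heavy tails make the integrals defining the mean and standard deviation diverge, which is why mean(pd) and std(pd) return NaN and Inf.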

The peak of the pdf is centered at the location parameter mu = 3.

Step 4. Generate a vector of Cauchy random numbers.

Generate a column vector containing 10 random numbers from the Cauchy distribution using the random function for the t location-scale probability distribution object.

rng('default'); % For reproducibility
r = random(pd,10,1)

r = 10×1
    3.2678
    4.6547
    2.0604
    4.7322
    3.1810
    1.6649
    1.8471
    4.2466
    5.4647
    8.8874

Step 5. Generate a matrix of Cauchy random numbers.

Generate a 5-by-5 matrix of Cauchy random numbers.

r = random(pd,5,5)

r = 5×5
    2.2867    2.9692   -1.7003    5.5949    1.9806
    2.7421    2.7180    3.2210    2.4233    3.1394
    3.5966    3.9806    1.0182    6.4180    5.1367
    5.4791   15.6472    0.7558    2.8908    5.9031
    1.6863    4.0985    2.9934   13.9506    4.8792

See Also
makedist

More About
• “t Location-Scale Distribution” on page B-156
• “Generate Cauchy Random Numbers Using Student's t” on page 5-104

Generate Cauchy Random Numbers Using Student's t

This example shows how to use the Student's t distribution to generate random numbers from a standard Cauchy distribution.

Step 1. Generate a vector of random numbers.

Generate a column vector containing 10 random numbers from a standard Cauchy distribution, which has a location parameter mu = 0 and scale parameter sigma = 1. Use trnd with degrees of freedom V = 1.

rng('default'); % For reproducibility
r = trnd(1,10,1)

r = 10×1
    0.2678
    1.6547
   -0.9396
    1.7322
    0.1810
   -1.3351
   -1.1529
    1.2466
    2.4647
    5.8874

Step 2. Generate a matrix of random numbers.

Generate a 5-by-5 matrix of random numbers from a standard Cauchy distribution.

r = trnd(1,5,5)

r = 5×5
   -0.7133   -0.0308   -4.7003    2.5949   -1.0194
   -0.2579   -0.2820    0.2210   -0.5767    0.1394
    0.5966    0.9806   -1.9818    3.4180    2.1367
    2.4791   12.6472   -2.2442   -0.1092    2.9031
   -1.3137    1.0985   -0.0066   10.9506    1.8792
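Because a t distribution with one degree of freedom is a standard Cauchy distribution, shifting and scaling these variates yields a Cauchy distribution with arbitrary location and scale. As a sketch:

```matlab
mu = 3;                        % location parameter
sigma = 1;                     % scale parameter
rng('default');                % For reproducibility
r = mu + sigma*trnd(1,10,1);   % Cauchy(mu,sigma) random numbers
```

With this generator state, the result matches the vector returned by random(pd,10,1) for the tLocationScale object with mu = 3, sigma = 1, and nu = 1 in the previous example.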

See Also
trnd

More About
• “Student's t Distribution” on page B-149
• “Represent Cauchy Distribution Using t Location-Scale” on page 5-101

Generate Correlated Data Using Rank Correlation

This example shows how to use a copula and rank correlation to generate correlated data from probability distributions that do not have an inverse cdf function available, such as the Pearson flexible distribution family.

Step 1. Generate Pearson random numbers.

Generate 1000 random numbers from two different Pearson distributions, using the pearsrnd function. The first distribution has the parameter values mu equal to 0, sigma equal to 1, skew equal to -1, and kurtosis equal to 4. The second distribution has the parameter values mu equal to 0, sigma equal to 1, skew equal to 0.75, and kurtosis equal to 3.

rng default % For reproducibility
p1 = pearsrnd(0,1,-1,4,1000,1);
p2 = pearsrnd(0,1,0.75,3,1000,1);

At this stage, p1 and p2 are independent samples from their respective Pearson distributions, and are uncorrelated.

Step 2. Plot the Pearson random numbers.

Create a scatterhist plot to visualize the Pearson random numbers.

figure
scatterhist(p1,p2)

The histograms show the marginal distributions for p1 and p2. The scatterplot shows the joint distribution for p1 and p2. The lack of pattern in the scatterplot shows that p1 and p2 are independent.

Step 3. Generate random numbers using a Gaussian copula.

Use copularnd to generate 1000 correlated random numbers with a correlation coefficient equal to –0.8, using a Gaussian copula. Create a scatterhist plot to visualize the random numbers generated from the copula.

u = copularnd('Gaussian',-0.8,1000);
figure
scatterhist(u(:,1),u(:,2))

The histograms show that the data in each column of the copula have a marginal uniform distribution. The scatterplot shows that the data in the two columns are negatively correlated.

Step 4. Sort the copula random numbers.

Using Spearman's rank correlation, transform the two independent Pearson samples into correlated data. Use the sort function to sort the copula random numbers from smallest to largest, and to return a vector of indices describing the rearranged order of the numbers.

[s1,i1] = sort(u(:,1));
[s2,i2] = sort(u(:,2));

s1 and s2 contain the numbers from the first and second columns of the copula, u, sorted in order from smallest to largest. i1 and i2 are index vectors that describe the rearranged order of the elements into s1 and s2. For example, if the first value in the sorted vector s1 is the third value in the original unsorted vector, then the first value in the index vector i1 is 3.

Step 5. Transform the Pearson samples using Spearman's rank correlation.

Create two vectors of zeros, x1 and x2, that are the same size as the sorted copula vectors, s1 and s2. Sort the values in p1 and p2 from smallest to largest. Place the values into x1 and x2, in the same order as the indices i1 and i2 generated by sorting the copula random numbers.

x1 = zeros(size(s1));
x2 = zeros(size(s2));
x1(i1) = sort(p1);
x2(i2) = sort(p2);

Step 6. Plot the correlated Pearson random numbers.

Create a scatterhist plot to visualize the correlated Pearson data.

figure
scatterhist(x1,x2)

The histograms show the marginal Pearson distributions for each column of data. The scatterplot shows the joint distribution of p1 and p2, and indicates that the data are now negatively correlated.

Step 7. Confirm Spearman rank correlation coefficient values.

Confirm that the Spearman rank correlation coefficient is the same for the copula random numbers and the correlated Pearson random numbers.

copula_corr = corr(u,'Type','spearman')

copula_corr = 2×2
    1.0000   -0.7858
   -0.7858    1.0000

pearson_corr = corr([x1,x2],'Type','spearman')

pearson_corr = 2×2
    1.0000   -0.7858
   -0.7858    1.0000

The Spearman rank correlation is the same for the copula and the Pearson random numbers.
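This agreement is expected. Spearman's coefficient is a function of the ranks alone; for n pairs without ties,

```latex
\rho_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)},
```

where d_i is the difference between the ranks of the two coordinates of the i-th pair. The reordering in Step 5 gives x1 and x2 exactly the same ranks as the two columns of u, so the rank correlation carries over unchanged.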

See Also
copularnd | corr | sort

More About
• “Copulas: Generate Correlated Samples” on page 5-118

Create Gaussian Mixture Model

This example shows how to create a known, or fully specified, Gaussian mixture model (GMM) object using gmdistribution and by specifying component means, covariances, and mixture proportions. To create a GMM object by fitting data to a GMM, see “Fit Gaussian Mixture Model to Data” on page 5-112.

Specify the component means, covariances, and mixing proportions for a two-component mixture of bivariate Gaussian distributions.

mu = [1 2;-3 -5];                    % Means
sigma = cat(3,[2 0;0 .5],[1 0;0 1]); % Covariances
p = ones(1,2)/2;                     % Mixing proportions

The rows of mu correspond to the component mean vectors, and the pages of sigma, sigma(:,:,J), correspond to the component covariance matrices. Create a GMM object using gmdistribution.

gm = gmdistribution(mu,sigma,p);

Display the properties of the GMM.

properties(gm)

Properties for class gmdistribution:
    NumVariables
    DistributionName
    NumComponents
    ComponentProportion
    SharedCovariance
    NumIterations
    RegularizationValue
    NegativeLogLikelihood
    CovarianceType
    mu
    Sigma
    AIC
    BIC
    Converged
    ProbabilityTolerance

For a description of the properties, see gmdistribution. To access the value of a property, use dot notation. For example, access the number of variables of each GMM component.

dimension = gm.NumVariables

dimension = 2

Visualize the probability density function (pdf) of the GMM using pdf and the MATLAB® function fsurf.

gmPDF = @(x,y)reshape(pdf(gm,[x(:) y(:)]),size(x));
fsurf(gmPDF,[-10 10])
title('Probability Density Function of GMM');

Visualize the cumulative distribution function (cdf) of the GMM using cdf and fsurf.

gmCDF = @(x,y)reshape(cdf(gm,[x(:) y(:)]),size(x));
fsurf(gmCDF,[-10 10])
title('Cumulative Distribution Function of GMM');


See Also
fitgmdist | gmdistribution

More About
• “Fit Gaussian Mixture Model to Data” on page 5-112
• “Simulate Data from Gaussian Mixture Model” on page 5-116
• “Cluster Using Gaussian Mixture Model” on page 16-39

Fit Gaussian Mixture Model to Data

This example shows how to simulate data from a multivariate normal distribution, and then fit a Gaussian mixture model (GMM) to the data using fitgmdist. To create a known, or fully specified, GMM object, see “Create Gaussian Mixture Model” on page 5-109.

fitgmdist requires a matrix of data and the number of components in the GMM. To create a useful GMM, you must choose the number of components k carefully. Too few components fail to model the data accurately (that is, underfit the data). Too many components lead to an overfit model with singular covariance matrices.

Simulate data from a mixture of two bivariate Gaussian distributions using mvnrnd.

mu1 = [1 2];
sigma1 = [2 0; 0 .5];
mu2 = [-3 -5];
sigma2 = [1 0; 0 1];
rng(1); % For reproducibility
X = [mvnrnd(mu1,sigma1,1000); mvnrnd(mu2,sigma2,1000)];

Plot the simulated data.

scatter(X(:,1),X(:,2),10,'.') % Scatter plot with points of size 10
title('Simulated Data')

Fit a two-component GMM. Use the 'Options' name-value pair argument to display the final output of the fitting algorithm.

options = statset('Display','final');
gm = fitgmdist(X,2,'Options',options)

5 iterations, log-likelihood = -7105.71

gm = Gaussian mixture distribution with 2 components in 2 dimensions
Component 1:
Mixing proportion: 0.500000
Mean:   -3.0377   -4.9859
Component 2:
Mixing proportion: 0.500000
Mean:    0.9812    2.0563

Plot the pdf of the fitted GMM.

gmPDF = @(x,y)reshape(pdf(gm,[x(:) y(:)]),size(x));
hold on
h = fcontour(gmPDF,[-8 6]);
title('Simulated Data and Contour lines of pdf');

Display the estimates for means, covariances, and mixture proportions.

ComponentMeans = gm.mu

ComponentMeans = 2×2
   -3.0377   -4.9859
    0.9812    2.0563

ComponentCovariances = gm.Sigma

ComponentCovariances =
ComponentCovariances(:,:,1) =
    1.0132    0.0482
    0.0482    0.9796
ComponentCovariances(:,:,2) =
    1.9919    0.0127
    0.0127    0.5533

MixtureProportions = gm.ComponentProportion

MixtureProportions = 1×2
    0.5000    0.5000

Fit four models to the data, each with an increasing number of components, and compare the Akaike information criterion (AIC) values.

AIC = zeros(1,4);
gm = cell(1,4);
for k = 1:4
    gm{k} = fitgmdist(X,k);
    AIC(k) = gm{k}.AIC;
end

Display the number of components that minimizes the AIC value.

[minAIC,numComponents] = min(AIC);
numComponents

numComponents = 2

The two-component model has the smallest AIC value. Display the two-component GMM.

gm2 = gm{numComponents}

gm2 = Gaussian mixture distribution with 2 components in 2 dimensions
Component 1:
Mixing proportion: 0.500000
Mean:   -3.0377   -4.9859
Component 2:
Mixing proportion: 0.500000
Mean:    0.9812    2.0563

Both the AIC and the Bayesian information criterion (BIC) are likelihood-based measures of model fit that include a penalty for complexity (specifically, the number of parameters). You can use them to determine an appropriate number of components for a model when the number of components is unspecified.
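For a model with p estimated parameters, maximized likelihood L̂, and n observations, the two criteria are defined as

```latex
\mathrm{AIC} = 2p - 2\ln\hat{L}, \qquad \mathrm{BIC} = p\ln n - 2\ln\hat{L},
```

so BIC penalizes additional components more heavily than AIC whenever ln n > 2. In both cases, lower values indicate a better trade-off between fit and complexity.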

See Also
fitgmdist | gmdistribution | mvnrnd | random

More About
• “Create Gaussian Mixture Model” on page 5-109
• “Simulate Data from Gaussian Mixture Model” on page 5-116
• “Cluster Using Gaussian Mixture Model” on page 16-39

Simulate Data from Gaussian Mixture Model

This example shows how to simulate data from a Gaussian mixture model (GMM) using a fully specified gmdistribution object and the random function.

Create a known, two-component GMM object.

mu = [1 2;-3 -5];
sigma = cat(3,[2 0;0 .5],[1 0;0 1]);
p = ones(1,2)/2;
gm = gmdistribution(mu,sigma,p);

Plot the contour of the pdf of the GMM.

gmPDF = @(x,y)reshape(pdf(gm,[x(:) y(:)]),size(x));
fcontour(gmPDF,[-10 10]);
title('Contour lines of pdf');

Generate 1000 random variates from the GMM.

rng('default'); % For reproducibility
X = random(gm,1000);

Plot the variates with the pdf contours.

hold on
scatter(X(:,1),X(:,2),10,'.') % Scatter plot with points of size 10
title('Contour lines of pdf and Simulated Data');


See Also
fitgmdist | gmdistribution | mvnrnd | random

More About
• “Create Gaussian Mixture Model” on page 5-109
• “Fit Gaussian Mixture Model to Data” on page 5-112
• “Cluster Using Gaussian Mixture Model” on page 16-39

Copulas: Generate Correlated Samples

In this section...
“Determining Dependence Between Simulation Inputs” on page 5-118
“Constructing Dependent Bivariate Distributions” on page 5-121
“Using Rank Correlation Coefficients” on page 5-125
“Using Bivariate Copulas” on page 5-127
“Higher Dimension Copulas” on page 5-133
“Archimedean Copulas” on page 5-134
“Simulating Dependent Multivariate Data Using Copulas” on page 5-135
“Fitting Copulas to Data” on page 5-139

Copulas are functions that describe dependencies among variables, and provide a way to create distributions that model correlated multivariate data. Using a copula, you can construct a multivariate distribution by specifying marginal univariate distributions, and then choose a copula to provide a correlation structure between variables. Bivariate distributions, as well as distributions in higher dimensions, are possible.

Determining Dependence Between Simulation Inputs

One of the design decisions for a Monte Carlo simulation is a choice of probability distributions for the random inputs. Selecting a distribution for each individual variable is often straightforward, but deciding what dependencies should exist between the inputs may not be. Ideally, input data to a simulation should reflect what you know about dependence among the real quantities you are modeling. However, there may be little or no information on which to base any dependence in the simulation. In such cases, it is useful to experiment with different possibilities in order to determine the model's sensitivity.

It can be difficult to generate random inputs with dependence when they have distributions that are not from a standard multivariate distribution. Further, some of the standard multivariate distributions can model only limited types of dependence. It is always possible to make the inputs independent, and while that is a simple choice, it is not always sensible and can lead to the wrong conclusions.

For example, a Monte Carlo simulation of financial risk could have two random inputs that represent different sources of insurance losses. You could model these inputs as lognormal random variables. A reasonable question to ask is how dependence between these two inputs affects the results of the simulation. Indeed, you might know from real data that the same random conditions affect both sources; ignoring that in the simulation could lead to the wrong conclusions.

Generate and Exponentiate Normal Random Variables

The lognrnd function simulates independent lognormal random variables. In the following example, the mvnrnd function generates n pairs of independent normal random variables, which are then exponentiated. Notice that the covariance matrix used here is diagonal.

n = 1000;
sigma = .5;
SigmaInd = sigma.^2 .* [1 0; 0 1]

SigmaInd = 2×2

    0.2500         0
         0    0.2500

rng('default'); % For reproducibility
ZInd = mvnrnd([0 0],SigmaInd,n);
XInd = exp(ZInd);

plot(XInd(:,1),XInd(:,2),'.')
axis([0 5 0 5])
axis equal
xlabel('X1')
ylabel('X2')

Dependent bivariate lognormal random variables are also easy to generate using a covariance matrix with nonzero off-diagonal terms.

rho = .7;
SigmaDep = sigma.^2 .* [1 rho; rho 1]

SigmaDep = 2×2

    0.2500    0.1750
    0.1750    0.2500

ZDep = mvnrnd([0 0],SigmaDep,n);
XDep = exp(ZDep);

A second scatter plot demonstrates the difference between these two bivariate distributions.

plot(XDep(:,1),XDep(:,2),'.')
axis([0 5 0 5])
axis equal
xlabel('X1')
ylabel('X2')

It is clear that there is a tendency in the second data set for large values of X1 to be associated with large values of X2, and similarly for small values. The correlation parameter ρ of the underlying bivariate normal determines this dependence. The conclusions drawn from the simulation could well depend on whether you generate X1 and X2 with dependence. The bivariate lognormal distribution is a simple solution in this case; it easily generalizes to higher dimensions in cases where the marginal distributions are different lognormals.

Other multivariate distributions also exist. For example, the multivariate t and the Dirichlet distributions simulate dependent t and beta random variables, respectively. But the list of simple multivariate distributions is not long, and they only apply in cases where the marginals are all in the same family (or even the exact same distributions). This can be a serious limitation in many situations.
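A quick numerical check, sketched below with the same parameter values as the example above (the large sample size is an illustrative choice), shows that the linear correlation of the exponentiated variables is noticeably smaller than the ρ used for the underlying normals:

```matlab
% Sketch: compare the linear correlation of the exponentiated
% (lognormal) variables with the rho used for the underlying normals.
n = 100000; sigma = .5; rho = .7;
rng default  % for reproducibility
ZDep = mvnrnd([0 0],sigma^2.*[1 rho; rho 1],n);
XDep = exp(ZDep);

corrZ = corr(ZDep(:,1),ZDep(:,2));   % close to rho = 0.7
corrX = corr(XDep(:,1),XDep(:,2));   % strictly smaller than rho

% Closed form for the lognormal case (derived later in this section):
corrTheory = (exp(rho*sigma^2)-1)/(exp(sigma^2)-1);  % about 0.67
```

corrX agrees with corrTheory up to sampling error, illustrating that exponentiation shrinks the linear correlation.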


Constructing Dependent Bivariate Distributions

Although the construction discussed in the previous section creates a bivariate lognormal that is simple, it serves to illustrate a method that is more generally applicable.

1 Generate pairs of values from a bivariate normal distribution. There is statistical dependence between these two variables, and each has a normal marginal distribution.

2 Apply a transformation (the exponential function) separately to each variable, changing the marginal distributions into lognormals. The transformed variables still have a statistical dependence.

If a suitable transformation can be found, this method can be generalized to create dependent bivariate random vectors with other marginal distributions. In fact, a general method of constructing such a transformation does exist, although it is not as simple as exponentiation alone.

By definition, applying the normal cumulative distribution function (cdf), denoted here by Φ, to a standard normal random variable results in a random variable that is uniform on the interval [0,1]. To see this, if Z has a standard normal distribution, then the cdf of U = Φ(Z) is

Pr(U ≤ u) = Pr(Φ(Z) ≤ u) = Pr(Z ≤ Φ−1(u)) = u

and that is the cdf of a Unif(0,1) random variable. Histograms of some simulated normal and transformed values demonstrate that fact:

n = 1000;
rng default  % for reproducibility
z = normrnd(0,1,n,1); % generate standard normal data

histogram(z,-3.75:.5:3.75,'FaceColor',[.8 .8 1]) % plot the histogram of data
xlim([-4 4])
title('1000 Simulated N(0,1) Random Values')
xlabel('Z')
ylabel('Frequency')


u = normcdf(z); % compute the cdf values of the sample data

figure
histogram(u,.05:.1:.95,'FaceColor',[.8 .8 1]) % plot the histogram of the cdf values
title('1000 Simulated N(0,1) Values Transformed to Unif(0,1)')
xlabel('U')
ylabel('Frequency')


Borrowing from the theory of univariate random number generation, applying the inverse cdf of any distribution, F, to a Unif(0,1) random variable results in a random variable whose distribution is exactly F (see “Inversion Methods” on page 7-3). The proof is essentially the opposite of the preceding proof for the forward case. Another histogram illustrates the transformation to a gamma distribution:

x = gaminv(u,2,1); % transform to gamma values

figure
histogram(x,.25:.5:9.75,'FaceColor',[.8 .8 1]) % plot the histogram of gamma values
title('1000 Simulated N(0,1) Values Transformed to Gamma(2,1)')
xlabel('X')
ylabel('Frequency')


You can apply this two-step transformation to each variable of a standard bivariate normal, creating dependent random variables with arbitrary marginal distributions. Because the transformation works on each component separately, the two resulting random variables need not even have the same marginal distributions. The transformation is defined as:

Z = [Z1, Z2] ∼ N([0, 0], [1 ρ; ρ 1])
U = [Φ(Z1), Φ(Z2)]
X = [G1(U1), G2(U2)]

where G1 and G2 are inverse cdfs of two possibly different distributions. For example, the following generates random vectors from a bivariate distribution with t5 and Gamma(2,1) marginals:

n = 1000; rho = .7;
Z = mvnrnd([0 0],[1 rho; rho 1],n);
U = normcdf(Z);
X = [gaminv(U(:,1),2,1) tinv(U(:,2),5)];

% draw the scatter plot of data with histograms
figure
scatterhist(X(:,1),X(:,2),'Direction','out')


This plot has histograms alongside a scatter plot to show both the marginal distributions and the dependence.

Using Rank Correlation Coefficients

The correlation parameter, ρ, of the underlying bivariate normal determines the dependence between X1 and X2 in this construction. However, the linear correlation of X1 and X2 is not ρ. For example, in the original lognormal case, a closed form for that correlation is:

cor(X1, X2) = (e^(ρσ²) − 1) / (e^(σ²) − 1)

which is strictly less than ρ, unless ρ is exactly 1. In more general cases such as the Gamma/t construction, the linear correlation between X1 and X2 is difficult or impossible to express in terms of ρ, but simulations show that the same effect happens.

That is because the linear correlation coefficient expresses the linear dependence between random variables, and when nonlinear transformations are applied to those random variables, linear correlation is not preserved. Instead, a rank correlation coefficient, such as Kendall's τ or Spearman's ρ, is more appropriate.

Roughly speaking, these rank correlations measure the degree to which large or small values of one random variable associate with large or small values of another. However, unlike the linear correlation coefficient, they measure the association only in terms of ranks. As a consequence, the rank correlation is preserved under any monotonic transformation. In particular, the transformation method just described preserves the rank correlation. Therefore, knowing the rank correlation of the bivariate normal Z exactly determines the rank correlation of the final transformed random variables, X. While the linear correlation coefficient, ρ, is still needed to parameterize the underlying bivariate normal, Kendall's τ or Spearman's ρ are more useful in describing the dependence between random variables, because they are invariant to the choice of marginal distribution.

For the bivariate normal, there is a simple one-to-one mapping between Kendall's τ or Spearman's ρ, and the linear correlation coefficient ρ:

τ = (2/π) arcsin(ρ)    or    ρ = sin(τπ/2)
ρs = (6/π) arcsin(ρ/2)    or    ρ = 2 sin(ρs·π/6)

The following plot shows the relationship.

rho = -1:.01:1;
tau = 2.*asin(rho)./pi;
rho_s = 6.*asin(rho./2)./pi;

plot(rho,tau,'b-','LineWidth',2)
hold on
plot(rho,rho_s,'g-','LineWidth',2)
plot([-1 1],[-1 1],'k:','LineWidth',2)
axis([-1 1 -1 1])
xlabel('rho')
ylabel('Rank correlation coefficient')
legend('Kendall''s {\it\tau}', ...
       'Spearman''s {\it\rho_s}', ...
       'location','NW')


Thus, it is easy to create the desired rank correlation between X1 and X2, regardless of their marginal distributions, by choosing the correct ρ parameter value for the linear correlation between Z1 and Z2. For the multivariate normal distribution, Spearman's rank correlation is almost identical to the linear correlation. However, this is not true once you transform to the final random variables.
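A short numerical sketch (sample size and distributions chosen for illustration) confirms that Kendall's τ survives the two-step transformation while the linear correlation does not:

```matlab
% Sketch: verify that rank correlation is preserved by the two-step
% transformation to Gamma(2,1) and t5 marginals, while linear
% correlation is not.
n = 10000; rho = .7;
rng default  % for reproducibility
Z = mvnrnd([0 0],[1 rho; rho 1],n);        % bivariate normal
U = normcdf(Z);                            % transform to Unif(0,1)
X = [gaminv(U(:,1),2,1) tinv(U(:,2),5)];   % impose the marginals

tauZ = corr(Z(:,1),Z(:,2),'type','Kendall');  % rank correlation of Z
tauX = corr(X(:,1),X(:,2),'type','Kendall');  % rank correlation of X
tauTheory = 2*asin(rho)/pi;                   % (2/pi)*arcsin(rho)

% tauZ and tauX agree with tauTheory up to sampling error, while
% corr(X(:,1),X(:,2)) differs noticeably from rho.
```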

Using Bivariate Copulas

The first step of the construction described in the previous section defines what is known as a bivariate Gaussian copula. A copula is a multivariate probability distribution, where each random variable has a uniform marginal distribution on the unit interval [0,1]. These variables may be completely independent, deterministically related (e.g., U2 = U1), or anything in between. Because of the possibility for dependence among variables, you can use a copula to construct a new multivariate distribution for dependent variables. By transforming each of the variables in the copula separately using the inversion method, possibly using different cdfs, the resulting distribution can have arbitrary marginal distributions. Such multivariate distributions are often useful in simulations, when you know that the different random inputs are not independent of each other.

Statistics and Machine Learning Toolbox functions compute:

• Probability density functions (copulapdf) and the cumulative distribution functions (copulacdf) for Gaussian copulas
• Rank correlations from linear correlations (copulastat) and vice versa (copulaparam)


• Random vectors (copularnd)
• Parameters for copulas fit to data (copulafit)

For example, use the copularnd function to create scatter plots of random values from a bivariate Gaussian copula for various levels of ρ, to illustrate the range of different dependence structures. The family of bivariate Gaussian copulas is parameterized by the linear correlation matrix:

Ρ = [1 ρ; ρ 1]

U1 and U2 approach linear dependence as ρ approaches ±1, and approach complete independence as ρ approaches zero. The dependence between U1 and U2 is completely separate from the marginal distributions of X1 = G(U1) and X2 = G(U2). X1 and X2 can be given any marginal distributions, and still have the same rank correlation. This is one of the main appeals of copulas: they allow this separate specification of dependence and marginal distribution.

You can also compute the pdf (copulapdf) and the cdf (copulacdf) for a copula. For example, these plots show the pdf and cdf for ρ = .8:

u1 = linspace(1e-3,1-1e-3,50);
u2 = linspace(1e-3,1-1e-3,50);
[U1,U2] = meshgrid(u1,u2);
Rho = [1 .8; .8 1];
f = copulapdf('t',[U1(:) U2(:)],Rho,5);
f = reshape(f,size(U1));

figure()
surf(u1,u2,log(f),'FaceColor','interp','EdgeColor','none')
view([-15,20])
xlabel('U1')
ylabel('U2')
zlabel('Probability Density')


u1 = linspace(1e-3,1-1e-3,50);
u2 = linspace(1e-3,1-1e-3,50);
[U1,U2] = meshgrid(u1,u2);
F = copulacdf('t',[U1(:) U2(:)],Rho,5);
F = reshape(F,size(U1));

figure()
surf(u1,u2,F,'FaceColor','interp','EdgeColor','none')
view([-15,20])
xlabel('U1')
ylabel('U2')
zlabel('Cumulative Probability')


A different family of copulas can be constructed by starting from a bivariate t distribution and transforming using the corresponding t cdf. The bivariate t distribution is parameterized with P, the linear correlation matrix, and ν, the degrees of freedom. Thus, for example, you can speak of a t1 or a t5 copula, based on the multivariate t with one and five degrees of freedom, respectively.

Just as for Gaussian copulas, Statistics and Machine Learning Toolbox functions for t copulas compute:

• Probability density functions (copulapdf) and the cumulative distribution functions (copulacdf) for t copulas
• Rank correlations from linear correlations (copulastat) and vice versa (copulaparam)
• Random vectors (copularnd)
• Parameters for copulas fit to data (copulafit)

For example, use the copularnd function to create scatter plots of random values from a bivariate t1 copula for various levels of ρ, to illustrate the range of different dependence structures:

n = 500;
nu = 1;
rng('default') % for reproducibility
U = copularnd('t',[1 .8; .8 1],nu,n);
subplot(2,2,1)
plot(U(:,1),U(:,2),'.')
title('{\it\rho} = 0.8')


xlabel('U1')
ylabel('U2')

U = copularnd('t',[1 .1; .1 1],nu,n);
subplot(2,2,2)
plot(U(:,1),U(:,2),'.')
title('{\it\rho} = 0.1')
xlabel('U1')
ylabel('U2')

U = copularnd('t',[1 -.1; -.1 1],nu,n);
subplot(2,2,3)
plot(U(:,1),U(:,2),'.')
title('{\it\rho} = -0.1')
xlabel('U1')
ylabel('U2')

U = copularnd('t',[1 -.8; -.8 1],nu,n);
subplot(2,2,4)
plot(U(:,1),U(:,2),'.')
title('{\it\rho} = -0.8')
xlabel('U1')
ylabel('U2')

A t copula has uniform marginal distributions for U1 and U2, just as a Gaussian copula does. The rank correlation τ or ρs between components in a t copula is also the same function of ρ as for a Gaussian. However, as these plots demonstrate, a t1 copula differs quite a bit from a Gaussian copula, even when their components have the same rank correlation. The difference is in their dependence structure. Not surprisingly, as the degrees of freedom parameter ν is made larger, a tν copula approaches the corresponding Gaussian copula.

As with a Gaussian copula, any marginal distributions can be imposed over a t copula. For example, using a t copula with 1 degree of freedom, you can again generate random vectors from a bivariate distribution with Gamma(2,1) and t5 marginals using copularnd:

n = 1000;
rho = .7;
nu = 1;
rng('default') % for reproducibility
U = copularnd('t',[1 rho; rho 1],nu,n);
X = [gaminv(U(:,1),2,1) tinv(U(:,2),5)];

figure()
scatterhist(X(:,1),X(:,2),'Direction','out')

Compared to the bivariate Gamma/t distribution constructed earlier, which was based on a Gaussian copula, the distribution constructed here, based on a t1 copula, has the same marginal distributions and the same rank correlation between variables but a very different dependence structure. This illustrates the fact that multivariate distributions are not uniquely defined by their marginal distributions, or by their correlations. The choice of a particular copula in an application may be based on actual observed data, or different copulas may be used as a way of determining the sensitivity of simulation results to the input distribution.


Higher Dimension Copulas

The Gaussian and t copulas are known as elliptical copulas. It is easy to generalize elliptical copulas to a higher number of dimensions. For example, simulate data from a trivariate distribution with Gamma(2,1), Beta(2,2), and t5 marginals using a Gaussian copula and copularnd, as follows:

n = 1000;
Rho = [1 .4 .2; .4 1 -.8; .2 -.8 1];
rng('default') % for reproducibility
U = copularnd('Gaussian',Rho,n);
X = [gaminv(U(:,1),2,1) betainv(U(:,2),2,2) tinv(U(:,3),5)];

Plot the data.

subplot(1,1,1)
plot3(X(:,1),X(:,2),X(:,3),'.')
grid on
view([-55, 15])
xlabel('X1')
ylabel('X2')
zlabel('X3')

Notice that the relationship between the linear correlation parameter ρ and, for example, Kendall's τ, holds for each entry in the correlation matrix P used here. You can verify that the sample rank correlations of the data are approximately equal to the theoretical values:

tauTheoretical = 2.*asin(Rho)./pi

tauTheoretical = 3×3

    1.0000    0.2620    0.1282
    0.2620    1.0000   -0.5903
    0.1282   -0.5903    1.0000

tauSample = corr(X,'type','Kendall')

tauSample = 3×3

    1.0000    0.2581    0.1414
    0.2581    1.0000   -0.5790
    0.1414   -0.5790    1.0000

Archimedean Copulas

Statistics and Machine Learning Toolbox functions are available for three bivariate Archimedean copula families:

• Clayton copulas
• Frank copulas
• Gumbel copulas

These are one-parameter families that are defined directly in terms of their cdfs, rather than being defined constructively using a standard multivariate distribution.

To compare these three Archimedean copulas to the Gaussian and t bivariate copulas, first use the copulastat function to find the rank correlation for a Gaussian or t copula with linear correlation parameter of 0.8, and then use the copulaparam function to find the Clayton copula parameter that corresponds to that rank correlation:

tau = copulastat('Gaussian',.8,'type','kendall')

tau = 0.5903

alpha = copulaparam('Clayton',tau,'type','kendall')

alpha = 2.8820

Finally, plot a random sample from the Clayton copula with copularnd. Repeat the same procedure for the Frank and Gumbel copulas:

n = 500;
U = copularnd('Clayton',alpha,n);
subplot(3,1,1)
plot(U(:,1),U(:,2),'.');
title(['Clayton Copula, {\it\alpha} = ',sprintf('%0.2f',alpha)])
xlabel('U1')
ylabel('U2')

alpha = copulaparam('Frank',tau,'type','kendall');
U = copularnd('Frank',alpha,n);
subplot(3,1,2)
plot(U(:,1),U(:,2),'.')


title(['Frank Copula, {\it\alpha} = ',sprintf('%0.2f',alpha)])
xlabel('U1')
ylabel('U2')

alpha = copulaparam('Gumbel',tau,'type','kendall');
U = copularnd('Gumbel',alpha,n);
subplot(3,1,3)
plot(U(:,1),U(:,2),'.')
title(['Gumbel Copula, {\it\alpha} = ',sprintf('%0.2f',alpha)])
xlabel('U1')
ylabel('U2')

Simulating Dependent Multivariate Data Using Copulas

To simulate dependent multivariate data using a copula, you must specify each of the following:

• The copula family (and any shape parameters)
• The rank correlations among variables
• Marginal distributions for each variable

Suppose you have return data for two stocks and want to run a Monte Carlo simulation with inputs that follow the same distributions as the data:

load stockreturns
nobs = size(stocks,1);


subplot(2,1,1)
histogram(stocks(:,1),10,'FaceColor',[.8 .8 1])
xlim([-3.5 3.5])
xlabel('X1')
ylabel('Frequency')

subplot(2,1,2)
histogram(stocks(:,2),10,'FaceColor',[.8 .8 1])
xlim([-3.5 3.5])
xlabel('X2')
ylabel('Frequency')

You could fit a parametric model separately to each dataset, and use those estimates as the marginal distributions. However, a parametric model may not be sufficiently flexible. Instead, you can use a nonparametric model to transform to the marginal distributions. All that is needed is a way to compute the inverse cdf for the nonparametric model.

The simplest nonparametric model is the empirical cdf, as computed by the ecdf function. For a discrete marginal distribution, this is appropriate. However, for a continuous distribution, use a model that is smoother than the step function computed by ecdf. One way to do that is to estimate the empirical cdf and interpolate between the midpoints of the steps with a piecewise linear function. Another way is to use kernel smoothing with ksdensity. For example, compare the empirical cdf to a kernel smoothed cdf estimate for the first variable:

[Fi,xi] = ecdf(stocks(:,1));

figure()


stairs(xi,Fi,'b','LineWidth',2)
hold on
Fi_sm = ksdensity(stocks(:,1),xi,'function','cdf','width',.15);
plot(xi,Fi_sm,'r-','LineWidth',1.5)
xlabel('X1')
ylabel('Cumulative Probability')
legend('Empirical','Smoothed','Location','NW')
grid on

For the simulation, experiment with different copulas and correlations. Here, you will use a bivariate t copula with a fairly small degrees of freedom parameter. For the correlation parameter, you can compute the rank correlation of the data.

nu = 5;
tau = corr(stocks(:,1),stocks(:,2),'type','kendall')

tau = 0.5180

Find the corresponding linear correlation parameter for the t copula using copulaparam.

rho = copulaparam('t',tau,nu,'type','kendall')

rho = 0.7268


Next, use copularnd to generate random values from the t copula and transform using the nonparametric inverse cdfs. The ksdensity function allows you to make a kernel estimate of the distribution and evaluate the inverse cdf at the copula points all in one step:

n = 1000;
U = copularnd('t',[1 rho; rho 1],nu,n);
X1 = ksdensity(stocks(:,1),U(:,1),...
               'function','icdf','width',.15);
X2 = ksdensity(stocks(:,2),U(:,2),...
               'function','icdf','width',.15);

Alternatively, when you have a large amount of data or need to simulate more than one set of values, it may be more efficient to compute the inverse cdf over a grid of values in the interval (0,1) and use interpolation to evaluate it at the copula points:

p = linspace(0.00001,0.99999,1000);
G1 = ksdensity(stocks(:,1),p,'function','icdf','width',0.15);
X1 = interp1(p,G1,U(:,1),'spline');
G2 = ksdensity(stocks(:,2),p,'function','icdf','width',0.15);
X2 = interp1(p,G2,U(:,2),'spline');

scatterhist(X1,X2,'Direction','out')

The marginal histograms of the simulated data are a smoothed version of the histograms for the original data. The amount of smoothing is controlled by the bandwidth input to ksdensity.


Fitting Copulas to Data

This example shows how to use copulafit to calibrate copulas with data. To generate data Xsim with a distribution "just like" (in terms of marginal distributions and correlations) the distribution of data in the matrix X, you need to:

1 Fit marginal distributions to the columns of X.
2 Use appropriate cdf functions to transform X to U, so that U has values between 0 and 1.
3 Use copulafit to fit a copula to U.
4 Generate new data Usim from the copula.
5 Use appropriate inverse cdf functions to transform Usim to Xsim.

Load and plot the simulated stock return data.

load stockreturns
x = stocks(:,1);
y = stocks(:,2);

scatterhist(x,y,'Direction','out')

Transform the data to the copula scale (unit square) using a kernel estimator of the cumulative distribution function.

u = ksdensity(x,x,'function','cdf');
v = ksdensity(y,y,'function','cdf');

scatterhist(u,v,'Direction','out')
xlabel('u')
ylabel('v')


Fit a t copula.

[Rho,nu] = copulafit('t',[u v],'Method','ApproximateML')

Rho = 2×2

    1.0000    0.7220
    0.7220    1.0000

nu = 3.0531e+06

Generate a random sample from the t copula.

r = copularnd('t',Rho,nu,1000);
u1 = r(:,1);
v1 = r(:,2);

scatterhist(u1,v1,'Direction','out')
xlabel('u')
ylabel('v')
set(get(gca,'children'),'marker','.')


Transform the random sample back to the original scale of the data.

x1 = ksdensity(x,u1,'function','icdf');
y1 = ksdensity(y,v1,'function','icdf');

scatterhist(x1,y1,'Direction','out')
set(get(gca,'children'),'marker','.')


As the example illustrates, copulas integrate naturally with other distribution fitting functions.


6 Gaussian Processes

• “Gaussian Process Regression Models” on page 6-2
• “Kernel (Covariance) Function Options” on page 6-5
• “Exact GPR Method” on page 6-9
• “Subset of Data Approximation for GPR Models” on page 6-13
• “Subset of Regressors Approximation for GPR Models” on page 6-14
• “Fully Independent Conditional Approximation for GPR Models” on page 6-18
• “Block Coordinate Descent Approximation for GPR Models” on page 6-21


Gaussian Process Regression Models

Gaussian process regression (GPR) models are nonparametric kernel-based probabilistic models. You can train a GPR model using the fitrgp function.

Consider the training set {(xi, yi); i = 1, 2, ..., n}, where xi ∈ ℝᵈ and yi ∈ ℝ, drawn from an unknown distribution. A GPR model addresses the question of predicting the value of a response variable ynew, given the new input vector xnew, and the training data. A linear regression model is of the form

y = xᵀβ + ε,

where ε ∼ N(0, σ²). The error variance σ² and the coefficients β are estimated from the data. A GPR model explains the response by introducing latent variables, f(xi), i = 1, 2, ..., n, from a Gaussian process (GP), and explicit basis functions, h. The covariance function of the latent variables captures the smoothness of the response and basis functions project the inputs x into a p-dimensional feature space.

A GP is a set of random variables, such that any finite number of them have a joint Gaussian distribution. If {f(x), x ∈ ℝᵈ} is a GP, then given n observations x1, x2, ..., xn, the joint distribution of the random variables f(x1), f(x2), ..., f(xn) is Gaussian. A GP is defined by its mean function m(x) and covariance function, k(x,x′). That is, if {f(x), x ∈ ℝᵈ} is a Gaussian process, then E[f(x)] = m(x) and

Cov[f(x), f(x′)] = E[(f(x) − m(x))(f(x′) − m(x′))] = k(x, x′).

Now consider the following model:

h(x)ᵀβ + f(x),

where f(x) ∼ GP(0, k(x,x′)), that is, f(x) are from a zero mean GP with covariance function, k(x,x′). h(x) are a set of basis functions that transform the original feature vector x in Rᵈ into a new feature vector h(x) in Rᵖ. β is a p-by-1 vector of basis function coefficients. This model represents a GPR model. An instance of response y can be modeled as

P(yi | f(xi), xi) ∼ N(yi | h(xi)ᵀβ + f(xi), σ²).

Hence, a GPR model is a probabilistic model. There is a latent variable f(xi) introduced for each observation xi, which makes the GPR model nonparametric. In vector form, this model is equivalent to

P(y | f, X) ∼ N(y | Hβ + f, σ²I),

where

X = [x1ᵀ; x2ᵀ; ⋮; xnᵀ],  y = [y1; y2; ⋮; yn],  H = [h(x1ᵀ); h(x2ᵀ); ⋮; h(xnᵀ)],  f = [f(x1); f(x2); ⋮; f(xn)].

The joint distribution of latent variables f(x1), f(x2), ..., f(xn) in the GPR model is as follows:

P(f | X) ∼ N(f | 0, K(X,X)),

close to a linear regression model, where K(X,X) looks as follows:

K(X,X) =
    [ k(x1,x1)  k(x1,x2)  ⋯  k(x1,xn)
      k(x2,x1)  k(x2,x2)  ⋯  k(x2,xn)
         ⋮          ⋮      ⋮      ⋮
      k(xn,x1)  k(xn,x2)  ⋯  k(xn,xn) ].

The covariance function k(x,x′) is usually parameterized by a set of kernel parameters or hyperparameters, θ. Often k(x,x′) is written as k(x,x′|θ) to explicitly indicate the dependence on θ.

fitrgp estimates the basis function coefficients, β, the noise variance, σ², and the hyperparameters, θ, of the kernel function from the data while training the GPR model. You can specify the basis function, the kernel (covariance) function, and the initial values for the parameters.

Because a GPR model is probabilistic, it is possible to compute the prediction intervals using the trained model (see predict and resubPredict).

Consider some data observed from the function g(x) = x*sin(x), and assume that they are noise free. The subplot on the left in the following figure illustrates the observations, the GPR fit, and the actual function. It is more realistic that the observed values are not the exact function values, but a noisy realization of them. The subplot on the right illustrates this case. When observations are noise free (as in the subplot on the left), the GPR fit crosses the observations, and the standard deviation of the predicted response is zero. Hence, you do not see prediction intervals around these values.

You can also compute the regression error using the trained GPR model (see loss and resubLoss).
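The noise-free versus noisy comparison described above can be reproduced with a short script. This is a sketch rather than the code used to produce the figure; the grid, noise level, and plotting choices are illustrative assumptions:

```matlab
% Sketch: fit GPR models to noise-free and noisy samples of
% g(x) = x.*sin(x), then compare predictions (illustrative values).
rng default  % for reproducibility
x = linspace(0,10,21)';
y = x.*sin(x);                    % noise-free observations
ynoisy = y + 0.5*randn(size(y));  % noisy realization (assumed sigma)

gpr1 = fitrgp(x,y);               % fit to noise-free data
gpr2 = fitrgp(x,ynoisy);          % fit to noisy data

xtest = linspace(0,10,201)';
[pred1,~,ci1] = predict(gpr1,xtest);  % predictions, 95% intervals
[pred2,~,ci2] = predict(gpr2,xtest);

plot(xtest,pred1,'b-',xtest,pred2,'r-',x,ynoisy,'ko')
legend('Noise-free fit','Noisy fit','Noisy data','Location','best')
```

With noise-free data, the fitted curve pred1 passes through the observations and the intervals ci1 collapse at the training points; with noisy data, the intervals ci2 remain visibly wide.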


References [1] Rasmussen, C. E. and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press. Cambridge, Massachusetts, 2006.

See Also RegressionGP | fitrgp

More About

• “Exact GPR Method” on page 6-9
• “Subset of Data Approximation for GPR Models” on page 6-13
• “Subset of Regressors Approximation for GPR Models” on page 6-14
• “Fully Independent Conditional Approximation for GPR Models” on page 6-18
• “Block Coordinate Descent Approximation for GPR Models” on page 6-21


Kernel (Covariance) Function Options In supervised learning, it is expected that the points with similar predictor values xi, naturally have close response (target) values yi. In Gaussian processes, the covariance function expresses this similarity [1]. It specifies the covariance between the two latent variables f xi and f x j , where both xi and x j are d-by-1 vectors. In other words, it determines how the response at one point xi is affected by responses at other points x j, i ≠ j, i = 1, 2, ..., n. The covariance function k xi, x j can be defined by various kernel functions. It can be parameterized in terms of the kernel parameters in vector θ. Hence, it is possible to express the covariance function as k xi, x j θ . For many standard kernel functions, the kernel parameters are based on the signal standard deviation σf and the characteristic length scale σl. The characteristic length scales briefly define how far apart the input values xi can be for the response values to become uncorrelated. Both σl and σf need to be greater than 0, and this can be enforced by the unconstrained parametrization vector θ, such that θ1 = logσl,

θ2 = logσf .

The built-in kernel (covariance) functions with same length scale for each predictor are: • Squared Exponential Kernel This is one of the most commonly used covariance functions and is the default option for fitrgp. The squared exponential kernel function is defined as k xi, x j θ = σ2f exp −

T

1 (xi −  x j) (xi −  x j) . 2 σ2 l

where σl is the characteristic length scale, and σf is the signal standard deviation. • Exponential Kernel You can specify the exponential kernel function using the 'KernelFunction','exponential' name-value pair argument. This covariance function is defined by k(xi, x j θ) = σ2f exp −

r , σl

where σl is the characteristic length scale and T

r = (xi −  x j) (xi −  x j) is the Euclidean distance between xi and x j. • Matern 3/2 You can specify the Matern 3/2 kernel function using the 'KernelFunction','matern32' name-value pair argument. This covariance function is defined by k(xi, x j θ) = σ2f 1 +

3r 3r exp − , σl σl

where 6-5

6

Gaussian Processes

T

r = (xi −  x j) (xi −  x j) is the Euclidean distance between xi and x j. • Matern 5/2 You can specify the Matern 5/2 kernel function using the 'KernelFunction','matern52' name-value pair argument. The Matern 5/2 covariance function is defined as k(xi, x j) = σ2f 1 +

5r 5r 2 5r exp − + , 2 σl σ l 3σl

where T

r = (xi −  x j) (xi −  x j) is the Euclidean distance between xi and x j. • Rational Quadratic Kernel You can specify the rational quadratic kernel function using the 'KernelFunction','rationalquadratic' name-value pair argument. This covariance function is defined by k(xi, x j θ) = σ2f 1 +

r2 2ασl2

−α

,

where σl is the characteristic length scale, α is a positive-valued scale-mixture parameter, and T

r = (xi −  x j) (xi −  x j) is the Euclidean distance between xi and x j. It is possible to use a separate length scale σm for each predictor m, m = 1, 2, ...,d. The built-in kernel (covariance) functions with a separate length scale for each predictor implement automatic relevance determination (ARD) [2]. The unconstrained parametrization θ in this case is θm = logσm,

for m = 1, 2, ..., d

θd + 1 = logσf . The built-in kernel (covariance) functions with separate length scale for each predictor are: • ARD Squared Exponential Kernel You can specify this kernel function using the 'KernelFunction','ardsquaredexponential' name-value pair argument. This covariance function is the squared exponential kernel function, with a separate length scale for each predictor. It is defined as k(xi, x j θ) = σ2f exp −

• ARD Exponential Kernel 6-6

2

d (x − x ) 1 im jm ∑ 2 2m = 1 σm

.

Kernel (Covariance) Function Options

You can specify this kernel function using the 'KernelFunction','ardexponential' namevalue pair argument. This covariance function is the exponential kernel function, with a separate length scale for each predictor. It is defined as k(xi, x j θ) = σ2f exp −r , where r=

2

d

(xim − x jm)

m=1

2 σm



.

• ARD Matern 3/2 You can specify this kernel function using the 'KernelFunction','ardmatern32' name-value pair argument. This covariance function is the Matern 3/2 kernel function, with a different length scale for each predictor. It is defined as k(xi, x j θ) = σ2f 1 + 3 r exp − 3 r , where r=

2

d

(xim − x jm)

m=1

2 σm



.

• ARD Matern 5/2

You can specify this kernel function using the 'KernelFunction','ardmatern52' name-value pair argument. This covariance function is the Matern 5/2 kernel function, with a different length scale for each predictor. It is defined as

    k(xi, xj | θ) = σf² (1 + √5 r + (5/3) r²) exp(−√5 r),

where

    r = √(Σm=1..d (xim − xjm)²/σm²).

• ARD Rational Quadratic Kernel

You can specify this kernel function using the 'KernelFunction','ardrationalquadratic' name-value pair argument. This covariance function is the rational quadratic kernel function, with a separate length scale for each predictor. It is defined as

    k(xi, xj | θ) = σf² (1 + (1/(2α)) Σm=1..d (xim − xjm)²/σm²)^(−α).

You can specify the kernel function using the KernelFunction name-value pair argument in a call to fitrgp. You can either specify one of the built-in kernel options or specify a custom function. When providing the initial kernel parameter values for a built-in kernel function, input the initial values for the signal standard deviation and the characteristic length scale(s) as a numeric vector. When providing the initial kernel parameter values for a custom kernel function, input the initial values for the unconstrained parametrization vector θ. fitrgp uses analytical derivatives to estimate parameters when using a built-in kernel function, whereas it uses numerical derivatives when using a custom kernel function.
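As a concrete illustration, the following sketch fits GPR models with a built-in ARD kernel and with an equivalent custom kernel. The data, initial values, and variable names are illustrative, not from this guide; the custom kernel here simply reimplements the (non-ARD) squared exponential in terms of the unconstrained vector θ = [log σl; log σf].

```matlab
% Illustrative data (not from the text)
rng(0);
X = rand(100,3);
y = sin(X*[2;-1;0.5]) + 0.1*randn(100,1);

% Built-in ARD kernel: initial values are [length scales; signal std]
sigma0 = std(y);
kparams0 = [ones(3,1); sigma0];   % one length scale per predictor, then sigma_f
gprArd = fitrgp(X,y,'KernelFunction','ardsquaredexponential', ...
    'KernelParameters',kparams0);

% Custom kernel: a function of XN, XM, and the unconstrained vector theta.
% Because the kernel is custom, fitrgp uses numerical derivatives.
kfcn = @(XN,XM,theta) (exp(theta(2))^2)* ...
    exp(-(pdist2(XN,XM).^2)/(2*exp(theta(1))^2));
theta0 = [0; log(sigma0)];        % [log(sigma_l); log(sigma_f)]
gprCustom = fitrgp(X,y,'KernelFunction',kfcn,'KernelParameters',theta0);
```

With a built-in kernel, KernelParameters takes the constrained values directly; with a custom kernel, it takes whatever parametrization the kernel function expects.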

References

[1] Rasmussen, C. E. and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press. Cambridge, Massachusetts, 2006.

[2] Neal, R. M. Bayesian Learning for Neural Networks. Springer, New York. Lecture Notes in Statistics, 118, 1996.

See Also RegressionGP | fitrgp

Related Examples

• “Use Separate Length Scales for Predictors” on page 32-1908
• “Fit GPR Model Using Custom Kernel Function” on page 32-1918
• “Impact of Specifying Initial Kernel Parameter Values” on page 32-1906


Exact GPR Method

In this section...
“Parameter Estimation” on page 6-9
“Prediction” on page 6-10
“Computational Complexity of Exact Parameter Estimation and Prediction” on page 6-12

An instance of response y from a Gaussian process regression (GPR) model on page 6-2 can be modeled as

    P(yi | f(xi), xi) = N(yi | h(xi)ᵀβ + f(xi), σ²).

Hence, making predictions for new data from a GPR model requires:

• Knowledge of the coefficient vector, β, of fixed basis functions
• Ability to evaluate the covariance function k(x, x′ | θ) for arbitrary x and x′, given the kernel parameters or hyperparameters, θ
• Knowledge of the noise variance σ² that appears in the density P(yi | f(xi), xi)

That is, one needs first to estimate β, θ, and σ² from the data (X, y).

Parameter Estimation

One approach for estimating the parameters β, θ, and σ² of a GPR model is by maximizing the likelihood P(y|X) as a function of β, θ, and σ² [1]. That is, if β̂, θ̂, and σ̂² are the estimates of β, θ, and σ², respectively, then:

    (β̂, θ̂, σ̂²) = arg max over β, θ, σ² of log P(y | X, β, θ, σ²).

Because

    P(y|X) = P(y | X, β, θ, σ²) = N(y | Hβ, K(X,X|θ) + σ²Iₙ),

the marginal log likelihood function is as follows:

    log P(y | X, β, θ, σ²) = −(1/2)(y − Hβ)ᵀ[K(X,X|θ) + σ²Iₙ]⁻¹(y − Hβ) − (n/2) log 2π − (1/2) log |K(X,X|θ) + σ²Iₙ|,

where H is the matrix of explicit basis functions, and K(X,X|θ) is the covariance function matrix (for more information, see “Gaussian Process Regression Models” on page 6-2). To estimate the parameters, the software first computes β̂(θ, σ²), which maximizes the log likelihood function with respect to β for given θ and σ². It then uses this estimate to compute the β-profiled likelihood:

    log P(y | X, β̂(θ, σ²), θ, σ²).


The estimate of β for given θ and σ² is

    β̂(θ, σ²) = [Hᵀ(K(X,X|θ) + σ²Iₙ)⁻¹H]⁻¹ Hᵀ(K(X,X|θ) + σ²Iₙ)⁻¹ y.

Then, the β-profiled log likelihood is given by

    log P(y | X, β̂(θ, σ²), θ, σ²) = −(1/2)(y − Hβ̂(θ, σ²))ᵀ[K(X,X|θ) + σ²Iₙ]⁻¹(y − Hβ̂(θ, σ²)) − (n/2) log 2π − (1/2) log |K(X,X|θ) + σ²Iₙ|.

The software then maximizes the β-profiled log likelihood over θ and σ² to find their estimates.
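After this maximization, the resulting estimates are exposed as properties of the fitted model object. The following minimal sketch shows where each estimate lives; the data is made up for illustration:

```matlab
% Illustrative data (not from the text)
rng(5);
x = linspace(0,10,200)';
y = x.*sin(x) + 0.5*randn(200,1);
gpr = fitrgp(x,y,'KernelFunction','squaredexponential');

betaHat  = gpr.Beta;                                % estimated coefficient vector beta
sigmaHat = gpr.Sigma;                               % estimated noise standard deviation
thetaHat = gpr.KernelInformation.KernelParameters;  % [length scale; signal std]
```

With the default 'Basis','Constant', Beta is a scalar intercept; the kernel parameters are reported in their constrained (not log) form.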

Prediction

Making probabilistic predictions from a GPR model with known parameters requires the density P(ynew | y, X, xnew). Using the definition of conditional probabilities, one can write:

    P(ynew | y, X, xnew) = P(ynew, y | X, xnew) / P(y | X, xnew).

To find the joint density in the numerator, it is necessary to introduce the latent variables fnew and f corresponding to ynew and y, respectively. Then, it is possible to use the joint distribution for ynew, y, fnew, and f to compute P(ynew, y | X, xnew):

    P(ynew, y | X, xnew) = ∫∫ P(ynew, y, fnew, f | X, xnew) df dfnew
                         = ∫∫ P(ynew, y | fnew, f, X, xnew) P(fnew, f | X, xnew) df dfnew.

Gaussian process models assume that each response yi only depends on the corresponding latent variable f(xi) and the feature vector xi. Writing P(ynew, y | fnew, f, X, xnew) as a product of conditional densities and based on this assumption produces:

    P(ynew, y | fnew, f, X, xnew) = P(ynew | fnew, xnew) ∏i=1..n P(yi | f(xi), xi).

After integrating with respect to ynew, the result only depends on f and X:

    P(y | f, X) = ∏i=1..n P(yi | fi, xi) = ∏i=1..n N(yi | h(xi)ᵀβ + fi, σ²).

Hence,

    P(ynew, y | fnew, f, X, xnew) = P(ynew | fnew, xnew) P(y | f, X).

Again using the definition of conditional probabilities,

    P(fnew, f | X, xnew) = P(fnew | f, X, xnew) · P(f | X, xnew),


it is possible to write P(ynew, y | X, xnew) as follows:

    P(ynew, y | X, xnew) = ∫∫ P(ynew | fnew, xnew) P(y | f, X) P(fnew | f, X, xnew) P(f | X, xnew) df dfnew.

Using the facts that P(f | X, xnew) = P(f | X) and P(y | f, X) P(f | X) = P(y, f | X) = P(f | y, X) P(y | X), one can rewrite P(ynew, y | X, xnew) as follows:

    P(ynew, y | X, xnew) = P(y | X) ∫∫ P(ynew | fnew, xnew) P(f | y, X) P(fnew | f, X, xnew) df dfnew.

It is also possible to show that P(y | X, xnew) = P(y | X). Hence, the required density P(ynew | y, X, xnew) is:

    P(ynew | y, X, xnew) = P(ynew, y | X, xnew) / P(y | X, xnew)
                         = P(ynew, y | X, xnew) / P(y | X)
                         = ∫∫ P(ynew | fnew, xnew) P(f | y, X) P(fnew | f, X, xnew) df dfnew,
                                   (1)                 (2)           (3)

It can be shown that

    (1)  P(ynew | fnew, xnew) = N(ynew | h(xnew)ᵀβ + fnew, σ²new),

    (2)  P(f | y, X) = N(f | (1/σ²)(Iₙ/σ² + K(X,X)⁻¹)⁻¹(y − Hβ), (Iₙ/σ² + K(X,X)⁻¹)⁻¹),

    (3)  P(fnew | f, X, xnew) = N(fnew | K(xnewᵀ, X) K(X,X)⁻¹ f, Δ),

where

    Δ = k(xnew, xnew) − K(xnewᵀ, X) K(X,X)⁻¹ K(X, xnewᵀ).

After the integration and required algebra, the density of the new response ynew at a new point xnew, given y, X, is found as

    P(ynew | y, X, xnew) = N(ynew | h(xnew)ᵀβ + μ, σ²new + Σ),

where

    μ = K(xnewᵀ, X) [K(X,X) + σ²Iₙ]⁻¹ (y − Hβ)
      = K(xnewᵀ, X) α

and

    Σ = k(xnew, xnew) − K(xnewᵀ, X) [K(X,X) + σ²Iₙ]⁻¹ K(X, xnewᵀ).


The expected value of the prediction ynew at a new point xnew, given y, X, and parameters β, θ, and σ², is

    E(ynew | y, X, xnew, β, θ, σ²) = h(xnew)ᵀβ + K(xnewᵀ, X | θ) α
                                   = h(xnew)ᵀβ + Σi=1..n αi k(xnew, xi | θ),

where

    α = [K(X,X|θ) + σ²Iₙ]⁻¹ (y − Hβ).

Computational Complexity of Exact Parameter Estimation and Prediction

Training a GPR model with the exact method (when FitMethod is 'Exact') requires the inversion of an n-by-n kernel matrix K(X,X). The memory requirement for this step scales as O(n^2) since K(X,X) must be stored in memory. One evaluation of log P(y|X) scales as O(n^3). Therefore, the computational complexity is O(k*n^3), where k is the number of function evaluations needed for maximization and n is the number of observations.

Making predictions on new data involves the computation of α̂. If prediction intervals are desired, this step could also involve the computation and storage of the Cholesky factor of K(X,X) + σ²Iₙ for later use. The computational complexity of this step using the direct computation of α̂ is O(n^3) and the memory requirement is O(n^2).

Hence, for large n, estimation of parameters or computing predictions can be very expensive. The approximation methods usually involve rearranging the computation so as to avoid the inversion of an n-by-n matrix. For the available approximation methods, see the related links at the bottom of the page.
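As a point of reference, the exact method can be requested explicitly in fitrgp; the dataset below is illustrative and small enough that storing the n-by-n kernel matrix is cheap:

```matlab
% Illustrative small dataset; exact fitting stores an n-by-n kernel matrix
rng(1);
n = 500;
x = linspace(0,10,n)';
y = x.*sin(x) + 0.5*randn(n,1);

gpr = fitrgp(x,y,'KernelFunction','squaredexponential', ...
    'FitMethod','exact','PredictMethod','exact');

% Predictions with 95% intervals; intervals are available because the
% exact method can reuse the Cholesky factor of K + sigma^2*I
[ypred,~,yint] = predict(gpr,x);
```

For n in the tens of thousands, the same call becomes impractical, which motivates the approximation methods discussed next.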

References

[1] Rasmussen, C. E. and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press. Cambridge, Massachusetts, 2006.

See Also fitrgp | predict

More About

• “Gaussian Process Regression Models” on page 6-2
• “Subset of Data Approximation for GPR Models” on page 6-13
• “Subset of Regressors Approximation for GPR Models” on page 6-14
• “Fully Independent Conditional Approximation for GPR Models” on page 6-18
• “Block Coordinate Descent Approximation for GPR Models” on page 6-21


Subset of Data Approximation for GPR Models

Training a GPR model with the exact method (when FitMethod is 'Exact') requires the inversion of an n-by-n matrix. Therefore, the computational complexity is O(k*n^3), where k is the number of function evaluations required for estimating β, θ, and σ², and n is the number of observations. For large n, estimating the parameters or computing predictions can be very expensive.

One simple way to reduce the computational cost for large data sets is to select m < n observations out of n and then apply the exact GPR model on page 6-9 to these m points to estimate β, θ, and σ² while ignoring the other (n – m) points. This smaller subset is known as the active set, and this approximation method is called the Subset of Data (SD) method. The computational complexity when using the SD method is O(k*m^3), where k is the number of function evaluations and m is the active set size. The storage requirement is O(m^2), since only a part of the full kernel matrix K(X,X|θ) needs to be stored in memory.

You can specify the SD method for parameter estimation by using the 'FitMethod','sd' name-value pair argument in the call to fitrgp. To specify the SD method for prediction, use the 'PredictMethod','sd' name-value pair argument.

For estimating parameters using the exact GPR model, see parameter estimation using the exact GPR method on page 6-9. For making predictions using the exact GPR model, see prediction using the exact GPR method on page 6-10.
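A minimal sketch of the SD method follows; the dataset and active set size are illustrative, not from this guide:

```matlab
% Illustrative: n = 10000 observations, active set of m = 500 points
rng(2);
n = 10000;
x = linspace(0,10,n)';
y = x.*sin(x) + 0.5*randn(n,1);

% SD for both fitting and prediction; only an m-by-m problem is solved
gprsd = fitrgp(x,y,'KernelFunction','squaredexponential', ...
    'FitMethod','sd','PredictMethod','sd','ActiveSetSize',500);
```

By default the active set is chosen randomly; the ActiveSetMethod name-value pair controls how it is selected.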

See Also fitrgp | predict

More About

• “Gaussian Process Regression Models” on page 6-2
• “Exact GPR Method” on page 6-9
• “Subset of Regressors Approximation for GPR Models” on page 6-14
• “Fully Independent Conditional Approximation for GPR Models” on page 6-18
• “Block Coordinate Descent Approximation for GPR Models” on page 6-21


Subset of Regressors Approximation for GPR Models

In this section...
“Approximating the Kernel Function” on page 6-14
“Parameter Estimation” on page 6-15
“Prediction” on page 6-15
“Predictive Variance Problem” on page 6-16

The subset of regressors (SR) approximation method consists of replacing the kernel function k(x, xr | θ) in the exact GPR method on page 6-9 by its approximation k_SR(x, xr | θ, A), given the active set A ⊂ N = {1, 2, ..., n}. You can specify the SR method for parameter estimation by using the 'FitMethod','sr' name-value pair argument in the call to fitrgp. For prediction using SR, you can use the 'PredictMethod','sr' name-value pair argument in the call to fitrgp.
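As a minimal sketch (the dataset and active set size are illustrative):

```matlab
% Illustrative: SR approximation for both fitting and prediction
rng(3);
n = 10000;
x = linspace(0,10,n)';
y = x.*sin(x) + 0.5*randn(n,1);

gprsr = fitrgp(x,y,'KernelFunction','squaredexponential', ...
    'FitMethod','sr','PredictMethod','sr','ActiveSetSize',500);
```

The same ActiveSetSize argument used for the SD method controls the size of A here.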

Approximating the Kernel Function

For the exact GPR model on page 6-10, the expected prediction in GPR depends on the set of N functions S_N = {k(x, xi | θ), i = 1, 2, ..., n}, where N = {1, 2, ..., n} is the set of indices of all observations, and n is the total number of observations. The idea is to approximate the span of these functions by a smaller set of functions, S_A, where A ⊂ N = {1, 2, ..., n} is the subset of indices of points selected to be in the active set. Consider S_A = {k(x, xj | θ), j ∈ A}. The aim is to approximate the elements of S_N as linear combinations of the elements of S_A.

Suppose the approximation to k(x, xr | θ) using the functions in S_A is as follows:

    k̂(x, xr | θ) = Σj∈A cjr k(x, xj | θ),

where cjr ∈ ℝ are the coefficients of the linear combination for approximating k(x, xr | θ). Suppose C is the matrix that contains all the coefficients cjr. Then, C is a |A|-by-n matrix such that C(j, r) = cjr. The software finds the best approximation to the elements of S_N using the active set A ⊂ N = {1, 2, ..., n} by minimizing the error function

    E(A, C) = Σr=1..n ‖k(x, xr | θ) − k̂(x, xr | θ)‖²_ℋ,

where ℋ is the Reproducing Kernel Hilbert Space (RKHS) associated with the kernel function k [1], [2]. The coefficient matrix that minimizes E(A, C) is

    Ĉ_A = K(X_A, X_A | θ)⁻¹ K(X_A, X | θ),

and an approximation to the kernel function using the elements in the active set A ⊂ N = {1, 2, ..., n} is

    k̂(x, xr | θ) = Σj∈A cjr k(x, xj | θ) = K(xᵀ, X_A | θ) Ĉ_A(:, r).


The SR approximation to the kernel function using the active set A ⊂ N = {1, 2, ..., n} is defined as:

    k_SR(x, xr | θ, A) = K(xᵀ, X_A | θ) Ĉ_A(:, r) = K(xᵀ, X_A | θ) K(X_A, X_A | θ)⁻¹ K(X_A, xrᵀ | θ),

and the SR approximation to K(X, X | θ) is:

    K_SR(X, X | θ, A) = K(X, X_A | θ) K(X_A, X_A | θ)⁻¹ K(X_A, X | θ).

Parameter Estimation

Replacing K(X, X | θ) by K_SR(X, X | θ, A) in the marginal log likelihood function produces its SR approximation:

    log P_SR(y | X, β, θ, σ², A) = −(1/2)(y − Hβ)ᵀ[K_SR(X, X | θ, A) + σ²I_N]⁻¹(y − Hβ) − (N/2) log 2π − (1/2) log |K_SR(X, X | θ, A) + σ²I_N|.

As in the exact method on page 6-9, the software estimates the parameters by first computing β̂(θ, σ²), the optimal estimate of β given θ and σ². Then it estimates θ and σ² using the β-profiled marginal log likelihood. The SR estimate of β for given θ and σ² is:

    β̂_SR(θ, σ², A) = [Hᵀ(K_SR(X, X | θ, A) + σ²I_N)⁻¹H]⁻¹ Hᵀ(K_SR(X, X | θ, A) + σ²I_N)⁻¹ y,

where

    [K_SR(X, X | θ, A) + σ²I_N]⁻¹ = I_N/σ² − (K(X, X_A | θ)/σ²) A_A⁻¹ (K(X_A, X | θ)/σ²),

    A_A = K(X_A, X_A | θ) + K(X_A, X | θ) K(X, X_A | θ)/σ²,

    Hᵀ(K_SR(X, X | θ, A) + σ²I_N)⁻¹H = HᵀH/σ² − (HᵀK(X, X_A | θ)/σ²) A_A⁻¹ (K(X_A, X | θ)H/σ²),

    Hᵀ(K_SR(X, X | θ, A) + σ²I_N)⁻¹y = Hᵀy/σ² − (HᵀK(X, X_A | θ)/σ²) A_A⁻¹ (K(X_A, X | θ)y/σ²).

And the SR approximation to the β-profiled marginal log likelihood is:

    log P_SR(y | X, β̂_SR(θ, σ², A), θ, σ², A) = −(1/2)(y − Hβ̂_SR(θ, σ², A))ᵀ[K_SR(X, X | θ, A) + σ²I_N]⁻¹(y − Hβ̂_SR(θ, σ², A)) − (N/2) log 2π − (1/2) log |K_SR(X, X | θ, A) + σ²I_N|.

Prediction

The SR approximation to the distribution of ynew given y, X, xnew is

    P(ynew | y, X, xnew) = N(ynew | h(xnew)ᵀβ + μ_SR, σ²new + Σ_SR),

where μ_SR and Σ_SR are the SR approximations to μ and Σ shown in prediction using the exact GPR method on page 6-10.

μ_SR and Σ_SR are obtained by replacing k(x, xr | θ) by its SR approximation k_SR(x, xr | θ, A) in μ and Σ, respectively. That is,

    μ_SR = K_SR(xnewᵀ, X | θ, A) [K_SR(X, X | θ, A) + σ²I_N]⁻¹ (y − Hβ).

Since

    K_SR(xnewᵀ, X | θ, A) = K(xnewᵀ, X_A | θ) K(X_A, X_A | θ)⁻¹ K(X_A, X | θ),

    [K_SR(X, X | θ, A) + σ²I_N]⁻¹ = I_N/σ² − (K(X, X_A | θ)/σ²) [K(X_A, X_A | θ) + K(X_A, X | θ) K(X, X_A | θ)/σ²]⁻¹ (K(X_A, X | θ)/σ²),

and from the fact that I_N − B(A + B)⁻¹ = A(A + B)⁻¹, μ_SR can be written as

    μ_SR = K(xnewᵀ, X_A | θ) [K(X_A, X_A | θ) + K(X_A, X | θ) K(X, X_A | θ)/σ²]⁻¹ (K(X_A, X | θ)/σ²) (y − Hβ).

Similarly, Σ_SR is derived as follows:

    Σ_SR = k_SR(xnew, xnew | θ, A) − K_SR(xnewᵀ, X | θ, A) [K_SR(X, X | θ, A) + σ²I_N]⁻¹ K_SR(X, xnewᵀ | θ, A).

Because

    k_SR(xnew, xnew | θ, A) = K(xnewᵀ, X_A | θ) K(X_A, X_A | θ)⁻¹ K(X_A, xnewᵀ | θ),

    K_SR(xnewᵀ, X | θ, A) = K(xnewᵀ, X_A | θ) K(X_A, X_A | θ)⁻¹ K(X_A, X | θ),

    K_SR(X, xnewᵀ | θ, A) = K(X, X_A | θ) K(X_A, X_A | θ)⁻¹ K(X_A, xnewᵀ | θ),

and [K_SR(X, X | θ, A) + σ²I_N]⁻¹ is as given in the equation for μ_SR, Σ_SR is found as follows:

    Σ_SR = K(xnewᵀ, X_A | θ) [K(X_A, X_A | θ) + K(X_A, X | θ) K(X, X_A | θ)/σ²]⁻¹ K(X_A, xnewᵀ | θ).

Predictive Variance Problem

One of the disadvantages of the SR method is that it can give unreasonably small predictive variances when making predictions in a region far away from the chosen active set A ⊂ N = {1, 2, ..., n}.

Consider making a prediction at a new point xnew that is far away from the training set X. In other words, assume that K(xnewᵀ, X | θ) ≈ 0.

For exact GPR, the posterior distribution of fnew given y, X, and xnew would be Normal with mean μ = 0 and variance Σ = k(xnew, xnew | θ). This value is correct in the sense that, if xnew is far from X, then the data (X, y) does not supply any new information about fnew, and so the posterior distribution of fnew given y, X, and xnew should reduce to the prior distribution of fnew given xnew, which is a Normal distribution with mean 0 and variance k(xnew, xnew | θ).

For the SR approximation, if xnew is far away from X (and hence also far away from X_A), then μ_SR = 0 and Σ_SR = 0. Thus, in this extreme case, μ_SR agrees with μ from exact GPR, but Σ_SR is unreasonably small compared to Σ from exact GPR. The fully independent conditional approximation method on page 6-18 can help avoid this problem.

References

[1] Rasmussen, C. E. and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press. Cambridge, Massachusetts, 2006.

[2] Smola, A. J. and B. Schölkopf. Sparse greedy matrix approximation for machine learning. In Proceedings of the Seventeenth International Conference on Machine Learning, 2000.

See Also fitrgp | predict

More About

• “Gaussian Process Regression Models” on page 6-2
• “Exact GPR Method” on page 6-9
• “Subset of Data Approximation for GPR Models” on page 6-13
• “Fully Independent Conditional Approximation for GPR Models” on page 6-18
• “Block Coordinate Descent Approximation for GPR Models” on page 6-21


Fully Independent Conditional Approximation for GPR Models

In this section...
“Approximating the Kernel Function” on page 6-18
“Parameter Estimation” on page 6-18
“Prediction” on page 6-19

The fully independent conditional (FIC) approximation [1] is a way of systematically approximating the true GPR kernel function in a way that avoids the predictive variance problem of the SR approximation on page 6-16 while still maintaining a valid Gaussian process. You can specify the FIC method for parameter estimation by using the 'FitMethod','fic' name-value pair argument in the call to fitrgp. For prediction using FIC, you can use the 'PredictMethod','fic' name-value pair argument in the call to fitrgp.
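As a minimal sketch (the dataset and active set size are illustrative):

```matlab
% Illustrative: FIC approximation for both fitting and prediction
rng(4);
n = 10000;
x = linspace(0,10,n)';
y = x.*sin(x) + 0.5*randn(n,1);

gprfic = fitrgp(x,y,'KernelFunction','squaredexponential', ...
    'FitMethod','fic','PredictMethod','fic','ActiveSetSize',500);
```

Unlike SR, the predictive variance returned here does not collapse to zero far from the active set.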

Approximating the Kernel Function

The FIC approximation to k(xi, xj | θ) for active set A ⊂ N = {1, 2, ..., n} is given by:

    k_FIC(xi, xj | θ, A) = k_SR(xi, xj | θ, A) + δij (k(xi, xj | θ) − k_SR(xi, xj | θ, A)),

    δij = 1 if i = j, and 0 if i ≠ j.

That is, the FIC approximation is equal to the SR approximation if i ≠ j. For i = j, the software uses the exact kernel value rather than an approximation. Define an n-by-n diagonal matrix Ω(X | θ, A) as follows:

    [Ω(X | θ, A)]ij = δij (k(xi, xj | θ) − k_SR(xi, xj | θ, A))
                    = k(xi, xj | θ) − k_SR(xi, xj | θ, A) if i = j, and 0 if i ≠ j.

The FIC approximation to K(X, X | θ) is then given by:

    K_FIC(X, X | θ, A) = K_SR(X, X | θ, A) + Ω(X | θ, A)
                       = K(X, X_A | θ) K(X_A, X_A | θ)⁻¹ K(X_A, X | θ) + Ω(X | θ, A).

Parameter Estimation

Replacing K(X, X | θ) by K_FIC(X, X | θ, A) in the marginal log likelihood function produces its FIC approximation:

    log P_FIC(y | X, β, θ, σ², A) = −(1/2)(y − Hβ)ᵀ[K_FIC(X, X | θ, A) + σ²I_N]⁻¹(y − Hβ) − (N/2) log 2π − (1/2) log |K_FIC(X, X | θ, A) + σ²I_N|.

As in the exact method on page 6-9, the software estimates the parameters by first computing β̂(θ, σ²), the optimal estimate of β given θ and σ². Then it estimates θ and σ² using the β-profiled marginal log likelihood. The FIC estimate of β for given θ and σ² is

    β̂_FIC(θ, σ², A) = [Hᵀ(K_FIC(X, X | θ, A) + σ²I_N)⁻¹H]⁻¹ Hᵀ(K_FIC(X, X | θ, A) + σ²I_N)⁻¹ y,

where

    Hᵀ(K_FIC(X, X | θ, A) + σ²I_N)⁻¹H = HᵀΛ(θ, σ², A)⁻¹H − HᵀΛ(θ, σ², A)⁻¹K(X, X_A | θ) B_A⁻¹ K(X_A, X | θ)Λ(θ, σ², A)⁻¹H,

    Hᵀ(K_FIC(X, X | θ, A) + σ²I_N)⁻¹y = HᵀΛ(θ, σ², A)⁻¹y − HᵀΛ(θ, σ², A)⁻¹K(X, X_A | θ) B_A⁻¹ K(X_A, X | θ)Λ(θ, σ², A)⁻¹y,

    B_A = K(X_A, X_A | θ) + K(X_A, X | θ) Λ(θ, σ², A)⁻¹ K(X, X_A | θ),

    Λ(θ, σ², A) = Ω(X | θ, A) + σ²Iₙ.

Using β̂_FIC(θ, σ², A), the β-profiled marginal log likelihood for the FIC approximation is:

    log P_FIC(y | X, β̂_FIC(θ, σ², A), θ, σ², A) = −(1/2)(y − Hβ̂_FIC(θ, σ², A))ᵀ[K_FIC(X, X | θ, A) + σ²I_N]⁻¹(y − Hβ̂_FIC(θ, σ², A)) − (N/2) log 2π − (1/2) log |K_FIC(X, X | θ, A) + σ²I_N|,

where

    [K_FIC(X, X | θ, A) + σ²I_N]⁻¹ = Λ(θ, σ², A)⁻¹ − Λ(θ, σ², A)⁻¹ K(X, X_A | θ) B_A⁻¹ K(X_A, X | θ) Λ(θ, σ², A)⁻¹,

    log |K_FIC(X, X | θ, A) + σ²I_N| = log |Λ(θ, σ², A)| + log |B_A| − log |K(X_A, X_A | θ)|.

Prediction

The FIC approximation to the distribution of ynew given y, X, xnew is

    P(ynew | y, X, xnew) = N(ynew | h(xnew)ᵀβ + μ_FIC, σ²new + Σ_FIC),

where μ_FIC and Σ_FIC are the FIC approximations to μ and Σ given in prediction using the exact GPR method on page 6-10. As in the SR case, μ_FIC and Σ_FIC are obtained by replacing all occurrences of the true kernel with its FIC approximation. The final forms of μ_FIC and Σ_FIC are as follows:

    μ_FIC = K(xnewᵀ, X_A | θ) B_A⁻¹ K(X_A, X | θ) Λ(θ, σ², A)⁻¹ (y − Hβ),

    Σ_FIC = k(xnew, xnew | θ) − K(xnewᵀ, X_A | θ) K(X_A, X_A | θ)⁻¹ K(X_A, xnewᵀ | θ) + K(xnewᵀ, X_A | θ) B_A⁻¹ K(X_A, xnewᵀ | θ),

where

    B_A = K(X_A, X_A | θ) + K(X_A, X | θ) Λ(θ, σ², A)⁻¹ K(X, X_A | θ),

    Λ(θ, σ², A) = Ω(X | θ, A) + σ²Iₙ.

References

[1] Quiñonero-Candela, J. and C. E. Rasmussen. A Unifying View of Sparse Approximate Gaussian Process Regression. Journal of Machine Learning Research. Vol. 6, pp. 1939–1959, 2005.

See Also fitrgp | predict

More About

• “Gaussian Process Regression Models” on page 6-2
• “Exact GPR Method” on page 6-9
• “Subset of Data Approximation for GPR Models” on page 6-13
• “Subset of Regressors Approximation for GPR Models” on page 6-14
• “Block Coordinate Descent Approximation for GPR Models” on page 6-21


Block Coordinate Descent Approximation for GPR Models

For a large number of observations, using the exact method for parameter estimation and making predictions on new data can be expensive (see “Exact GPR Method” on page 6-9). One of the approximation methods that help deal with this issue for prediction is the Block Coordinate Descent (BCD) method. You can make predictions using the BCD method by first specifying the predict method using the 'PredictMethod','bcd' name-value pair argument in the call to fitrgp, and then using the predict function.

The idea of the BCD method is to compute

    α̂ = [K(X, X) + σ²I_N]⁻¹ (y − Hβ)

in a different way than the exact method. BCD estimates α̂ by solving the following optimization problem:

    α̂ = arg min over α of f(α), where f(α) = (1/2) αᵀ[K(X, X) + σ²I_N]α − αᵀ(y − Hβ).

fitrgp uses block coordinate descent (BCD) to solve the above optimization problem. For users familiar with support vector machines (SVMs), sequential minimal optimization (SMO) and the iterative single data algorithm (ISDA) used to fit SVMs are special cases of BCD. fitrgp performs two steps for each BCD update: block selection and block update. In the block selection phase, fitrgp uses a combination of random and greedy strategies for selecting the set of α coefficients to optimize next. In the block update phase, the selected α coefficients are optimized while keeping the other α coefficients fixed to their previous values. These two steps are repeated until convergence. BCD does not store the n-by-n matrix K(X, X) in memory; it computes chunks of the K(X, X) matrix only as needed. As a result, BCD is more memory efficient than 'PredictMethod','exact'.

Practical experience indicates that it is not necessary to solve the optimization problem for computing α to very high precision. For example, it is reasonable to stop the iterations when ‖∇f(α)‖ drops below η‖∇f(α₀)‖, where α₀ is the initial value of α and η is a small constant (say η = 10⁻³). The results from 'PredictMethod','exact' and 'PredictMethod','bcd' are equivalent, except that BCD computes α in a different way. Hence, 'PredictMethod','bcd' can be thought of as a memory-efficient way of doing 'PredictMethod','exact'. BCD is often faster than 'PredictMethod','exact' for large n. However, you cannot compute the prediction intervals using the 'PredictMethod','bcd' option, because computing prediction intervals requires solving a problem like the one solved for computing α for every test point, which again is expensive.

Fit GPR Models Using BCD Approximation

This example shows fitting a Gaussian Process Regression (GPR) model to data with a large number of observations, using the Block Coordinate Descent (BCD) Approximation.

Small n - PredictMethod 'exact' and 'bcd' Produce the Same Results

Generate a small sample dataset.


rng(0,'twister');
n = 1000;
X = linspace(0,1,n)';
X = [X,X.^2];
y = 1 + X*[1;2] + sin(20*X*[1;-2])./(X(:,1)+1) + 0.2*randn(n,1);

Create a GPR model using 'FitMethod','exact' and 'PredictMethod','exact'.

gpr = fitrgp(X,y,'KernelFunction','squaredexponential',...
    'FitMethod','exact','PredictMethod','exact');

Create another GPR model using 'FitMethod','exact' and 'PredictMethod','bcd'.

gprbcd = fitrgp(X,y,'KernelFunction','squaredexponential',...
    'FitMethod','exact','PredictMethod','bcd','BlockSize',200);

'PredictMethod','exact' and 'PredictMethod','bcd' should be equivalent; they compute the same α, just in different ways. You can also see the iterations by using the 'Verbose',1 name-value pair argument in the call to fitrgp. Compute the fitted values using the two methods and compare them:

ypred = resubPredict(gpr);
ypredbcd = resubPredict(gprbcd);
max(abs(ypred-ypredbcd))

ans = 5.8853e-04

The fitted values are close to each other. Now, plot the data along with the fitted values for 'PredictMethod','exact' and 'PredictMethod','bcd'.

figure;
plot(y,'k.');
hold on;
plot(ypred,'b-','LineWidth',2);
plot(ypredbcd,'m--','LineWidth',2);
legend('Data','PredictMethod Exact','PredictMethod BCD','Location','Best');
xlabel('Observation index');
ylabel('Response value');



It can be seen that 'PredictMethod','exact' and 'PredictMethod','bcd' produce nearly identical fits.

Large n - Use 'FitMethod','sd' and 'PredictMethod','bcd'

Generate a bigger dataset similar to the previous one.

rng(0,'twister');
n = 50000;
X = linspace(0,1,n)';
X = [X,X.^2];
y = 1 + X*[1;2] + sin(20*X*[1;-2])./(X(:,1)+1) + 0.2*randn(n,1);

For n = 50000, the matrix K(X,X) would be 50000-by-50000. Storing K(X,X) in memory would require around 20 GB of RAM. To fit a GPR model to this dataset, use 'FitMethod','sd' with a random subset of m = 2000 points. The 'ActiveSetSize' name-value pair argument in the call to fitrgp specifies the active set size m. For computing α, use 'PredictMethod','bcd' with a 'BlockSize' of 5000. The 'BlockSize' name-value pair argument in fitrgp specifies the number of elements of the vector α that the software optimizes at each iteration. A 'BlockSize' of 5000 assumes that the computer you use can store a 5000-by-5000 matrix in memory.

gprbcd = fitrgp(X,y,'KernelFunction','squaredexponential',...
    'FitMethod','sd','ActiveSetSize',2000,'PredictMethod','bcd','BlockSize',5000);

Plot the data and the fitted values.

figure;
plot(y,'k.');



hold on;
plot(resubPredict(gprbcd),'m-','LineWidth',2);
legend('Data','PredictMethod BCD','Location','Best');
xlabel('Observation index');
ylabel('Response value');

The plot is similar to the one for smaller n.

References

[1] Grippo, L. and M. Sciandrone. On the convergence of the block nonlinear Gauss-Seidel method under convex constraints. Operations Research Letters. Vol. 26, pp. 127–136, 2000.

[2] Bo, L. and C. Sminchisescu. Greedy block coordinate descent for large scale Gaussian process regression. In Proceedings of the Twenty Fourth Conference on Uncertainty in Artificial Intelligence (UAI2008): http://arxiv.org/abs/1206.3238, 2012.

See Also fitrgp | predict

More About

• “Gaussian Process Regression Models” on page 6-2
• “Exact GPR Method” on page 6-9
• “Subset of Data Approximation for GPR Models” on page 6-13
• “Subset of Regressors Approximation for GPR Models” on page 6-14
• “Fully Independent Conditional Approximation for GPR Models” on page 6-18

7 Random Number Generation

• “Generating Pseudorandom Numbers” on page 7-2
• “Representing Sampling Distributions Using Markov Chain Samplers” on page 7-10
• “Generating Quasi-Random Numbers” on page 7-13
• “Generating Data Using Flexible Families of Distributions” on page 7-21
• “Bayesian Linear Regression Using Hamiltonian Monte Carlo” on page 7-27


Generating Pseudorandom Numbers

Pseudorandom numbers are generated by deterministic algorithms. They are "random" in the sense that, on average, they pass statistical tests regarding their distribution and correlation. They differ from true random numbers in that they are generated by an algorithm, rather than a truly random process. Random number generators (RNGs) like those in MATLAB are algorithms for generating pseudorandom numbers with a specified distribution.

For more information on the GUI for generating random numbers from supported distributions, see “Explore the Random Number Generation UI” on page 5-82.

Common Pseudorandom Number Generation Methods

• “Direct Methods” on page 7-2
• “Inversion Methods” on page 7-3
• “Acceptance-Rejection Methods” on page 7-6

Methods for generating pseudorandom numbers usually start with uniform random numbers, like the MATLAB rand function produces. The methods described in this section detail how to produce random numbers from other distributions.

Direct Methods

Direct methods directly use the definition of the distribution. For example, consider binomial random numbers. A binomial random number is the number of heads in N tosses of a coin with probability p of a heads on any single toss. If you generate N uniform random numbers on the interval (0,1) and count the number less than p, then the count is a binomial random number with parameters N and p.

This function is a simple implementation of a binomial RNG using the direct approach:

function X = directbinornd(N,p,m,n)
X = zeros(m,n); % Preallocate memory
for i = 1:m*n
    u = rand(N,1);
    X(i) = sum(u < p);
end
end

For example:

rng('default') % For reproducibility
X = directbinornd(100,0.3,1e4,1);
histogram(X,101)



The binornd function uses a modified direct method, based on the definition of a binomial random variable as the sum of Bernoulli random variables.

You can easily convert the previous method to a random number generator for the Poisson distribution with parameter λ. The “Poisson Distribution” on page B-131 is the limiting case of the binomial distribution as N approaches infinity, p approaches zero, and Np is held fixed at λ. To generate Poisson random numbers, create a version of the previous generator that inputs λ rather than N and p, and internally sets N to some large number and p to λ/N.

The poissrnd function actually uses two direct methods:

• A waiting time method for small values of λ
• A method due to Ahrens and Dieter for larger values of λ

Inversion Methods

Inversion methods are based on the observation that continuous cumulative distribution functions (cdfs) range uniformly over the interval (0,1). If u is a uniform random number on (0,1), then using X = F⁻¹(u) generates a random number X from a continuous distribution with specified cdf F.

For example, the following code generates random numbers from a specific “Exponential Distribution” on page B-33 using the inverse cdf and the MATLAB® uniform random number generator rand:



rng('default') % For reproducibility
mu = 1;
X = expinv(rand(1e4,1),mu);

Compare the distribution of the generated random numbers to the pdf of the specified exponential by scaling the pdf to the area of the histogram used to display the distribution:

numbins = 50;
h = histogram(X,numbins);
hold on
histarea = h.BinWidth*sum(h.Values);
x = h.BinEdges(1):0.001:h.BinEdges(end);
y = exppdf(x,mu);
plot(x,histarea*y,'r','LineWidth',2)
hold off

Inversion methods also work for discrete distributions. To generate a random number X from a discrete distribution with probability mass vector P(X = xi) = pi, where x0 < x1 < x2 < ..., generate a uniform random number u on (0,1) and then set X = xi if F(xi−1) < u < F(xi).

For example, the following function implements an inversion method for a discrete distribution with probability mass vector p:

function X = discreteinvrnd(p,m,n)
X = zeros(m,n); % Preallocate memory
for i = 1:m*n
    u = rand;



I = find(u < cumsum(p)); X(i) = min(I); end end

Use the function to generate random numbers from any discrete distribution:

p = [0.1 0.2 0.3 0.2 0.1 0.1]; % Probability mass vector
X = discreteinvrnd(p,1e4,1);
h = histogram(X,length(p));

bar(1:length(p),h.Values)
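The same discrete inversion scheme as discreteinvrnd can be sketched in Python/NumPy for comparison (the function name and interface below are our own, not from the toolbox):

```python
import numpy as np

def discrete_inv_rnd(p, size, rng=None):
    """Draw samples from the discrete distribution with mass vector p
    by inverting its cdf (0-based category indices)."""
    rng = np.random.default_rng() if rng is None else rng
    cdf = np.cumsum(p)
    u = rng.random(size)
    # First index i with u < F(x_i) is the inverted sample.
    return np.searchsorted(cdf, u, side='right')

p = [0.1, 0.2, 0.3, 0.2, 0.1, 0.1]
x = discrete_inv_rnd(p, 10_000, np.random.default_rng(1))
freq = np.bincount(x, minlength=len(p)) / len(x)
print(np.round(freq, 2))   # empirical frequencies approximate p
```

searchsorted does the same job as find(u < cumsum(p)) followed by min, but vectorized over all draws at once.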


Acceptance-Rejection Methods

The functional form of some distributions makes it difficult or time-consuming to generate random numbers using direct or inversion methods. Acceptance-rejection methods provide an alternative in these cases.

Acceptance-rejection methods begin with uniform random numbers, but require an additional random number generator. If your goal is to generate a random number from a continuous distribution with pdf f, acceptance-rejection methods first generate a random number from a continuous distribution with pdf g satisfying f(x) ≤ cg(x) for some constant c and all x.


A continuous acceptance-rejection RNG proceeds as follows:

1. Chooses a density g.
2. Finds a constant c such that f(x)/g(x) ≤ c for all x.
3. Generates a uniform random number u.
4. Generates a random number v from g.
5. If c*u ≤ f(v)/g(v), accepts and returns v. Otherwise, rejects v and goes to step 3.
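As a sketch of these steps outside MATLAB (ours, not toolbox code), the following Python/NumPy example samples from the Beta(2,2) density f(x) = 6x(1 − x) on (0,1) using a uniform proposal g(x) = 1 and the bound c = 1.5 = max f(x)/g(x):

```python
import numpy as np

def acc_rej(f, g, grnd, c, n, rng):
    """Acceptance-rejection sampler: draw n values from pdf f using
    proposal pdf g, proposal sampler grnd, and bound c >= f/g."""
    out = np.empty(n)
    for i in range(n):
        while True:
            u = rng.random()         # step 3: uniform u
            v = grnd(rng)            # step 4: proposal draw v
            if c * u <= f(v) / g(v): # step 5: accept or retry
                out[i] = v
                break
    return out

rng = np.random.default_rng(2)
f = lambda x: 6.0 * x * (1.0 - x)    # Beta(2,2) pdf
g = lambda x: 1.0                    # uniform proposal pdf
x = acc_rej(f, g, lambda r: r.random(), 1.5, 5_000, rng)
print(round(x.mean(), 2))            # Beta(2,2) has mean 0.5
```

On average 1/c = 2/3 of the proposals are accepted, consistent with the expected c iterations per returned value noted below.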

For efficiency, a "cheap" method is necessary for generating random numbers from g, and the scalar c should be small. The expected number of iterations to produce a single random number is c.

The following function implements an acceptance-rejection method for generating random numbers from pdf f given f, g, the RNG grnd for g, and the constant c:

function X = accrejrnd(f,g,grnd,c,m,n)
X = zeros(m,n); % Preallocate memory
for i = 1:m*n
    accept = false;
    while accept == false
        u = rand();
        v = grnd();
        if c*u <= f(v)/g(v)
            X(i) = v;
            accept = true;
        end
    end
end

Hypothesis Test Terminology

• The alternative hypothesis is a contrasting assertion about the population that can be tested against the null hypothesis. For example:
H1: µ ≠ 1.15 — State average was different from $1.15 (two-tail test)
H1: µ > 1.15 — State average was greater than $1.15 (right-tail test)
H1: µ < 1.15 — State average was less than $1.15 (left-tail test)
• To conduct a hypothesis test, a random sample from the population is collected and a relevant test statistic is computed to summarize the sample. This statistic varies with the type of test, but its distribution under the null hypothesis must be known (or assumed).
• The p value of a test is the probability, under the null hypothesis, of obtaining a value of the test statistic as extreme as or more extreme than the value computed from the sample.
• The significance level of a test is a threshold of probability α agreed to before the test is conducted. A typical value of α is 0.05. If the p value of a test is less than α, the test rejects the null hypothesis. If the p value is greater than α, there is insufficient evidence to reject the null hypothesis. Note that lack of evidence for rejecting the null hypothesis is not evidence for accepting the null hypothesis. Also note that substantive “significance” of an alternative cannot be inferred from the statistical significance of a test.
• The significance level α can be interpreted as the probability of rejecting the null hypothesis when it is actually true—a type I error. The distribution of the test statistic under the null hypothesis determines the probability α of a type I error. Even if the null hypothesis is not rejected, it may still be false—a type II error.
The distribution of the test statistic under the alternative hypothesis determines the probability β of a type II error. Type II errors are often due to small sample sizes. The power of a test, 1 – β, is the probability of correctly rejecting a false null hypothesis.
• Results of hypothesis tests are often communicated with a confidence interval. A confidence interval is an estimated range of values with a specified probability of containing the true population value of a parameter. Upper and lower bounds for confidence intervals are computed from the sample estimate of the parameter and the known (or assumed) sampling distribution of the estimator. A typical assumption is that estimates will be normally distributed with repeated sampling (as dictated by the Central Limit Theorem). Wider confidence intervals correspond to poorer estimates (smaller samples); narrower intervals correspond to better estimates (larger samples). If the null hypothesis asserts the value of a population parameter, the test rejects the null hypothesis when the hypothesized value lies outside the computed confidence interval for the parameter.
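The relationships among α, β, and power can be made concrete with a small numeric sketch (ours, not toolbox code). The numbers below echo the gas-price example but are assumptions: a right-tailed z-test of H0: µ = 1.15 with σ = 0.04, n = 20, and a hypothetical true mean of 1.18.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()          # standard normal N(0,1)
alpha = 0.05
sigma, n = 0.04, 20
mu0, mu_true = 1.15, 1.18

se = sigma / sqrt(n)
crit = mu0 + Z.inv_cdf(1 - alpha) * se   # rejection threshold for the mean
beta = Z.cdf((crit - mu_true) / se)      # P(fail to reject | mu = mu_true)
power = 1 - beta

print(round(power, 3))                   # ≈ 0.956 for these assumed values
```

A larger n shrinks the standard error, pushing the threshold toward µ0 and increasing power; this is why type II errors are typically a small-sample problem.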


See Also

More About
• “Hypothesis Testing” on page 8-5
• “Hypothesis Test Assumptions” on page 8-4
• “Available Hypothesis Tests” on page 8-10


8 Hypothesis Tests

Hypothesis Test Assumptions

Different hypothesis tests make different assumptions about the distribution of the random variable being sampled in the data. These assumptions must be considered when choosing a test and when interpreting the results.

For example, the z-test (ztest) and the t-test (ttest) both assume that the data are independently sampled from a normal distribution. Statistics and Machine Learning Toolbox functions are available for testing this assumption, such as chi2gof, jbtest, lillietest, and normplot.

Both the z-test and the t-test are relatively robust with respect to departures from this assumption, so long as the sample size n is large enough. Both tests compute a sample mean x̄, which, by the Central Limit Theorem, has an approximately normal sampling distribution with mean equal to the population mean μ, regardless of the population distribution being sampled.

The difference between the z-test and the t-test is in the assumption of the standard deviation σ of the underlying normal distribution. A z-test assumes that σ is known; a t-test does not. As a result, a t-test must compute an estimate s of the standard deviation from the sample.

Test statistics for the z-test and the t-test are, respectively,

z = (x̄ − μ) / (σ/√n)
t = (x̄ − μ) / (s/√n)

Under the null hypothesis that the population is distributed with mean μ, the z-statistic has a standard normal distribution, N(0,1). Under the same null hypothesis, the t-statistic has Student's t distribution with n – 1 degrees of freedom. For small sample sizes, Student's t distribution is flatter and wider than N(0,1), compensating for the decreased confidence in the estimate s. As sample size increases, however, Student's t distribution approaches the standard normal distribution, and the two tests become essentially equivalent. Knowing the distribution of the test statistic under the null hypothesis allows for accurate calculation of p-values. Interpreting p-values in the context of the test assumptions allows for critical analysis of test results. Assumptions underlying Statistics and Machine Learning Toolbox hypothesis tests are given in the reference pages for implementing functions.
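The two statistics can be computed directly from their formulas. The following Python sketch (ours, not toolbox code) uses a small hypothetical sample and tests H0: µ = 1.15 with an assumed known σ = 0.04 for the z-statistic:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of prices (dollars); not the gas.mat data.
sample = [1.12, 1.16, 1.14, 1.18, 1.15, 1.13, 1.17, 1.19]
n, xbar, s = len(sample), mean(sample), stdev(sample)
mu0, sigma = 1.15, 0.04

z = (xbar - mu0) / (sigma / sqrt(n))  # known sigma
t = (xbar - mu0) / (s / sqrt(n))      # estimated s

print(round(z, 3), round(t, 3))       # ≈ 0.354 0.577 for this sample
```

Here the estimate s (≈ 0.0245) is smaller than the assumed σ, so the t-statistic is larger than the z-statistic; its p-value would nonetheless be computed from the wider Student's t distribution with n − 1 degrees of freedom.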

See Also

More About
• “Hypothesis Testing” on page 8-5
• “Hypothesis Test Terminology” on page 8-2
• “Available Hypothesis Tests” on page 8-10


Hypothesis Testing

Hypothesis testing is a common method of drawing inferences about a population based on statistical evidence from a sample.

As an example, suppose someone says that at a certain time in the state of Massachusetts the average price of a gallon of regular unleaded gas was $1.15. How could you determine the truth of the statement? You could try to find prices at every gas station in the state at the time. That approach would be definitive, but it could be time-consuming, costly, or even impossible.

A simpler approach would be to find prices at a small number of randomly selected gas stations around the state, and then compute the sample average. Sample averages differ from one another due to chance variability in the selection process. Suppose your sample average comes out to be $1.18. Is the $0.03 difference an artifact of random sampling or significant evidence that the average price of a gallon of gas was in fact greater than $1.15? Hypothesis testing is a statistical method for making such decisions.

This example shows how to use hypothesis testing to analyze gas prices measured across the state of Massachusetts during two separate months. This example uses the gas price data in the file gas.mat. The file contains two random samples of prices for a gallon of gas around the state of Massachusetts in 1993. The first sample, price1, contains 20 random observations around the state on a single day in January. The second sample, price2, contains 20 random observations around the state one month later.

load gas
prices = [price1 price2];

As a first step, you might want to test the assumption that the samples come from normal distributions. A normal probability plot gives a quick idea.

normplot(prices)


Both scatters approximately follow straight lines through the first and third quartiles of the samples, indicating approximate normal distributions. The February sample (the right-hand line) shows a slight departure from normality in the lower tail. A shift in the mean from January to February is evident.

A hypothesis test is used to quantify the test of normality. Since each sample is relatively small, a Lilliefors test is recommended.

lillietest(price1)

ans = 0

lillietest(price2)

ans = 0

The default significance level of lillietest is 5%. The logical 0 returned by each test indicates a failure to reject the null hypothesis that the samples are normally distributed. This failure may reflect normality in the population or it may reflect a lack of strong evidence against the null hypothesis due to the small sample size.

Now compute the sample means.

sample_means = mean(prices)

sample_means = 1×2

  115.1500  118.5000


You might want to test the null hypothesis that the mean price across the state on the day of the January sample was $1.15. If you know that the standard deviation in prices across the state has historically, and consistently, been $0.04, then a z-test is appropriate.

[h,pvalue,ci] = ztest(price1/100,1.15,0.04)

h = 0
pvalue = 0.8668
ci = 2×1

    1.1340
    1.1690

The logical output h = 0 indicates a failure to reject the null hypothesis at the default significance level of 5%. This is a consequence of the high probability under the null hypothesis, indicated by the p value, of observing a value as extreme as or more extreme than the z-statistic computed from the sample. The 95% confidence interval on the mean [1.1340 1.1690] includes the hypothesized population mean of $1.15.

Does the later sample offer stronger evidence for rejecting a null hypothesis of a state-wide average price of $1.15 in February? The shift shown in the probability plot and the difference in the computed sample means suggest this. The shift might indicate a significant fluctuation in the market, raising questions about the validity of using the historical standard deviation. If a known standard deviation cannot be assumed, a t-test is more appropriate.

[h,pvalue,ci] = ttest(price2/100,1.15)

h = 1
pvalue = 4.9517e-04
ci = 2×1

    1.1675
    1.2025

The logical output h = 1 indicates a rejection of the null hypothesis at the default significance level of 5%. In this case, the 95% confidence interval on the mean does not include the hypothesized population mean of $1.15.

You might want to investigate the shift in prices a little more closely. The function ttest2 tests if two independent samples come from normal distributions with equal but unknown standard deviations and the same mean, against the alternative that the means are unequal.

[h,sig,ci] = ttest2(price1,price2)

h = 1
sig = 0.0083
ci = 2×1

   -5.7845
   -0.9155
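The pooled two-sample t-statistic that ttest2 computes can be written out by hand. The following Python sketch (ours, not toolbox code; the price arrays below are made up, not the gas.mat samples) shows the computation:

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(x, y):
    """Pooled two-sample t-statistic, as used by an equal-variance t-test."""
    nx, ny = len(x), len(y)
    # Pooled variance estimate with nx + ny - 2 degrees of freedom.
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))

jan = [114, 116, 115, 117, 113, 115]   # hypothetical January prices (cents)
feb = [118, 119, 117, 120, 118, 119]   # hypothetical February prices (cents)

print(round(two_sample_t(jan, feb), 3))   # ≈ -4.869 for these made-up data
```

A large negative statistic like this one corresponds to a small two-sided p-value, analogous to the sig = 0.0083 result above.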


The null hypothesis is rejected at the default 5% significance level, and the confidence interval on the difference of means does not include the hypothesized value of 0.

A notched box plot is another way to visualize the shift.

boxplot(prices,1)
h = gca;
h.XTick = [1 2];
h.XTickLabel = {'January','February'};
xlabel('Month')
ylabel('Prices ($0.01)')

The plot displays the distribution of the samples around their medians. The heights of the notches in each box are computed so that the side-by-side boxes have nonoverlapping notches when their medians are different at a default 5% significance level. The computation is based on an assumption of normality in the data, but the comparison is reasonably robust for other distributions. The side-by-side plots provide a kind of visual hypothesis test, comparing medians rather than means. The plot above appears to barely reject the null hypothesis of equal medians.

The nonparametric Wilcoxon rank sum test, implemented by the function ranksum, can be used to quantify the test of equal medians. It tests if two independent samples come from identical continuous (not necessarily normal) distributions with equal medians, against the alternative that they do not have equal medians.

[p,h] = ranksum(price1,price2)

p = 0.0095
h = logical
   1

The test rejects the null hypothesis of equal medians at the default 5% significance level.
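The rank sum statistic behind ranksum is simple to compute by hand. The following Python sketch (ours, not toolbox code; the samples are made up, and ties are ignored for simplicity) sums the ranks of the first sample within the pooled, sorted data; ranksum additionally converts this statistic into an exact or approximate p-value, which we omit here:

```python
def rank_sum(x, y):
    """Wilcoxon rank sum statistic for sample x against sample y
    (assumes no tied values)."""
    pooled = sorted(x + y)
    ranks = {v: i + 1 for i, v in enumerate(pooled)}
    return sum(ranks[v] for v in x)

low = [1.1, 1.3, 1.2, 1.4, 1.0]    # hypothetical cheaper sample
high = [1.8, 1.9, 1.7, 2.0, 1.6]   # hypothetical pricier sample

print(rank_sum(low, high))   # 15: the minimum possible, 1+2+3+4+5
```

When one sample sits entirely below the other, as here, the rank sum hits its extreme value and the test yields its smallest possible p-value.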

See Also
boxplot | lillietest | normplot | ttest | ttest2 | ztest

More About
• “Hypothesis Test Terminology” on page 8-2
• “Hypothesis Test Assumptions” on page 8-4
• “Available Hypothesis Tests” on page 8-10
• “Distribution Plots” on page 4-6


Available Hypothesis Tests

Function        Description
ansaribradley   Ansari-Bradley test. Tests if two independent samples come from the same distribution, against the alternative that they come from distributions that have the same median and shape but different variances.
barttest        Bartlett's test. Tests if the variances of the data values along each principal component are equal, against the alternative that the variances are not all equal.
chi2gof         Chi-square goodness-of-fit test. Tests if a sample comes from a specified distribution, against the alternative that it does not come from that distribution.
dwtest          Durbin-Watson test. Tests if the residuals from a linear regression are uncorrelated, against the alternative that there is autocorrelation among them.
friedman        Friedman's test. Tests if the column effects in a two-way layout are all the same, against the alternative that the column effects are not all the same.
jbtest          Jarque-Bera test. Tests if a sample comes from a normal distribution with unknown mean and variance, against the alternative that it does not come from a normal distribution.
kruskalwallis   Kruskal-Wallis test. Tests if multiple samples are all drawn from the same population (or equivalently, from different populations with the same distribution), against the alternative that they are not all drawn from the same population.
kstest          One-sample Kolmogorov-Smirnov test. Tests if a sample comes from a continuous distribution with specified parameters, against the alternative that it does not come from that distribution.
kstest2         Two-sample Kolmogorov-Smirnov test. Tests if two samples come from the same continuous distribution, against the alternative that they do not come from the same distribution.
lillietest      Lilliefors test. Tests if a sample comes from a distribution in the normal family, against the alternative that it does not come from a normal distribution.
linhyptest      Linear hypothesis test. Tests if H*b = c for parameter estimates b with estimated covariance H and specified c, against the alternative that H*b ≠ c.
ranksum         Wilcoxon rank sum test. Tests if two independent samples come from identical continuous distributions with equal medians, against the alternative that they do not have equal medians.
runstest        Runs test. Tests if a sequence of values comes in random order, against the alternative that the ordering is not random.
signrank        One-sample or paired-sample Wilcoxon signed rank test. Tests if a sample comes from a continuous distribution symmetric about a specified median, against the alternative that it does not have that median.
signtest        One-sample or paired-sample sign test. Tests if a sample comes from an arbitrary continuous distribution with a specified median, against the alternative that it does not have that median.
ttest           One-sample or paired-sample t-test. Tests if a sample comes from a normal distribution with unknown variance and a specified mean, against the alternative that it does not have that mean.
ttest2          Two-sample t-test. Tests if two independent samples come from normal distributions with unknown but equal (or, optionally, unequal) variances and the same mean, against the alternative that the means are unequal.
vartest         One-sample chi-square variance test. Tests if a sample comes from a normal distribution with specified variance, against the alternative that it comes from a normal distribution with a different variance.
vartest2        Two-sample F-test for equal variances. Tests if two independent samples come from normal distributions with the same variance, against the alternative that they come from normal distributions with different variances.
vartestn        Bartlett multiple-sample test for equal variances. Tests if multiple samples come from normal distributions with the same variance, against the alternative that they come from normal distributions with different variances.
ztest           One-sample z-test. Tests if a sample comes from a normal distribution with known variance and specified mean, against the alternative that it does not have that mean.

Note In addition to the previous functions, Statistics and Machine Learning Toolbox functions are available for analysis of variance (ANOVA), which perform hypothesis tests in the context of linear modeling.

See Also

More About
• “Hypothesis Testing” on page 8-5
• “Hypothesis Test Terminology” on page 8-2
• “Hypothesis Test Assumptions” on page 8-4


9 Analysis of Variance

• “Introduction to Analysis of Variance” on page 9-2
• “One-Way ANOVA” on page 9-3
• “Two-Way ANOVA” on page 9-11
• “Multiple Comparisons” on page 9-18
• “N-Way ANOVA” on page 9-25
• “ANOVA with Random Effects” on page 9-33
• “Other ANOVA Models” on page 9-38
• “Analysis of Covariance” on page 9-39
• “Nonparametric Methods” on page 9-47
• “MANOVA” on page 9-49
• “Model Specification for Repeated Measures Models” on page 9-54
• “Compound Symmetry Assumption and Epsilon Corrections” on page 9-55
• “Mauchly’s Test of Sphericity” on page 9-57
• “Multivariate Analysis of Variance for Repeated Measures” on page 9-59


Introduction to Analysis of Variance

Analysis of variance (ANOVA) is a procedure for assigning sample variance to different sources and deciding whether the variation arises within or among different population groups. Samples are described in terms of variation around group means and variation of group means around an overall mean. If variations within groups are small relative to variations between groups, a difference in group means may be inferred. Hypothesis tests are used to quantify decisions.


One-Way ANOVA

In this section...
“Introduction to One-Way ANOVA” on page 9-3
“Prepare Data for One-Way ANOVA” on page 9-4
“Perform One-Way ANOVA” on page 9-5
“Mathematical Details” on page 9-9

Introduction to One-Way ANOVA

You can use the Statistics and Machine Learning Toolbox function anova1 to perform one-way analysis of variance (ANOVA). The purpose of one-way ANOVA is to determine whether data from several groups (levels) of a factor have a common mean. That is, one-way ANOVA enables you to find out whether different groups of an independent variable have different effects on the response variable y. Suppose a hospital wants to determine whether two new proposed scheduling methods reduce patient wait times more than the old way of scheduling appointments. In this case, the independent variable is the scheduling method, and the response variable is the waiting time of the patients.

One-way ANOVA is a simple special case of the linear model on page 11-5. The one-way ANOVA form of the model is

y_ij = α_j + ε_ij

with the following assumptions:

• y_ij is an observation, in which i represents the observation number and j represents a different group (level) of the predictor variable. All y_ij are independent.
• α_j represents the population mean for the jth group (level or treatment).
• ε_ij is the random error, independent and normally distributed, with zero mean and constant variance, i.e., ε_ij ~ N(0,σ²).

This model is also called the means model. The model assumes that the columns of y are the constant α_j plus the error component ε_ij. ANOVA helps determine if the constants are all the same.

ANOVA tests the hypothesis that all group means are equal versus the alternative hypothesis that at least one group is different from the others.

H0: α_1 = α_2 = ... = α_k
H1: not all group means are equal

anova1(y) tests the equality of column means for the data in matrix y, where each column is a different group and has the same number of observations (i.e., a balanced design). anova1(y,group) tests the equality of group means, specified in group, for the data in vector or matrix y.
In this case, each group or column can have a different number of observations (i.e., an unbalanced design).

ANOVA is based on the assumption that all sample populations are normally distributed. It is known to be robust to modest violations of this assumption. You can check the normality assumption visually by using a normality plot (normplot). Alternatively, you can use one of the Statistics and Machine


Learning Toolbox functions that check for normality: the Anderson-Darling test (adtest), the chi-squared goodness-of-fit test (chi2gof), the Jarque-Bera test (jbtest), or the Lilliefors test (lillietest).

Prepare Data for One-Way ANOVA

You can provide sample data as a vector or a matrix.

• If the sample data is in a vector, y, then you must provide grouping information using the group input variable: anova1(y,group). group must be a numeric vector, logical vector, categorical vector, character array, string array, or cell array of character vectors, with one name for each element of y. The anova1 function treats the y values corresponding to the same value of group as part of the same group. Use this design when groups have different numbers of elements (unbalanced ANOVA).
• If the sample data is in a matrix, y, providing the group information is optional.
  • If you do not specify the input variable group, then anova1 treats each column of y as a separate group, and evaluates whether the population means of the columns are equal. Use this form of design when each group has the same number of elements (balanced ANOVA).
  • If you specify the input variable group, then each element in group represents a group name for the corresponding column in y. The anova1 function treats the columns with the same group name as part of the same group.


anova1 ignores any NaN values in y. Also, if group contains empty or NaN values, anova1 ignores the corresponding observations in y. The anova1 function performs balanced ANOVA if each group has the same number of observations after the function disregards empty or NaN values. Otherwise, anova1 performs unbalanced ANOVA.

Perform One-Way ANOVA

This example shows how to perform one-way ANOVA to determine whether data from several groups have a common mean.

Load and display the sample data.

load hogg
hogg

hogg = 6×5

    24    14    11     7    19
    15     7     9     7    24
    21    12     7     4    19
    27    17    13     7    15
    33    14    12    12    10
    23    16    18    18    20

The data comes from a Hogg and Ledolter (1987) study on bacteria counts in shipments of milk. The columns of the matrix hogg represent different shipments. The rows are bacteria counts from cartons of milk chosen randomly from each shipment.

Test if some shipments have higher counts than others. By default, anova1 returns two figures. One is the standard ANOVA table, and the other one is the box plots of data by group.

[p,tbl,stats] = anova1(hogg);


p

p = 1.1971e-04

The small p-value of about 0.0001 indicates that the bacteria counts from the different shipments are not the same. You can get some graphical assurance that the means are different by looking at the box plots. The notches, however, compare the medians, not the means. For more information on this display, see boxplot.

View the standard ANOVA table. anova1 saves the standard ANOVA table as a cell array in the output argument tbl.

tbl

tbl=4×6 cell array
  Columns 1 through 5

    {'Source' }    {'SS'        }    {'df'}    {'MS'       }    {'F'       }
    {'Columns'}    {[ 803.0000] }    {[ 4]}    {[200.7500] }    {[ 9.0076] }
    {'Error'  }    {[ 557.1667] }    {[25]}    {[ 22.2867] }    {0x0 double}
    {'Total'  }    {[1.3602e+03]}    {[29]}    {0x0 double }    {0x0 double}

  Column 6

    {'Prob>F'     }
    {[1.1971e-04] }
    {0x0 double   }
    {0x0 double   }

Save the F-statistic value in the variable Fstat.

Fstat = tbl{2,5}

Fstat = 9.0076

View the statistics necessary to make a multiple pairwise comparison of group means. anova1 saves these statistics in the structure stats.

stats

stats = struct with fields:
    gnames: [5x1 char]
         n: [6 6 6 6 6]
    source: 'anova1'
     means: [23.8333 13.3333 11.6667 9.1667 17.8333]
        df: 25
         s: 4.7209

ANOVA rejects the null hypothesis that all group means are equal, so you can use the multiple comparisons to determine which group means are different from others. To conduct multiple comparison tests, use the function multcompare, which accepts stats as an input argument. In this example, anova1 rejects the null hypothesis that the mean bacteria counts from all five shipments are equal to each other, i.e., H0: μ1 = μ2 = μ3 = μ4 = μ5.

Perform a multiple comparison test to determine which shipments are different from the others in terms of mean bacteria counts.

multcompare(stats)


ans = 10×6

    1.0000    2.0000    2.4953   10.5000   18.5047    0.0059
    1.0000    3.0000    4.1619   12.1667   20.1714    0.0013
    1.0000    4.0000    6.6619   14.6667   22.6714    0.0001
    1.0000    5.0000   -2.0047    6.0000   14.0047    0.2119
    2.0000    3.0000   -6.3381    1.6667    9.6714    0.9719
    2.0000    4.0000   -3.8381    4.1667   12.1714    0.5544
    2.0000    5.0000  -12.5047   -4.5000    3.5047    0.4806
    3.0000    4.0000   -5.5047    2.5000   10.5047    0.8876
    3.0000    5.0000  -14.1714   -6.1667    1.8381    0.1905
    4.0000    5.0000  -16.6714   -8.6667   -0.6619    0.0292

The first two columns show which group means are compared with each other. For example, the first row compares the means for groups 1 and 2. The last column shows the p-values for the tests. The p-values 0.0059, 0.0013, and 0.0001 indicate that the mean bacteria count in the milk from the first shipment is different from the counts from the second, third, and fourth shipments. The p-value of 0.0292 indicates that the mean bacteria count in the milk from the fourth shipment is different from the count from the fifth. The procedure fails to reject the hypotheses that the other group means are different from each other.

The figure also illustrates the same result. The blue bar shows the comparison interval for the first group mean, which does not overlap with the comparison intervals for the second, third, and fourth group means, shown in red. The comparison interval for the mean of the fifth group, shown in gray,


overlaps with the comparison interval for the first group mean. Hence, the group means for the first and fifth groups are not significantly different from each other.

Mathematical Details

ANOVA tests for the difference in the group means by partitioning the total variation in the data into two components:

• Variation of group means from the overall mean, i.e., ȳ.j − ȳ.. (variation between groups), where ȳ.j is the sample mean of group j, and ȳ.. is the overall sample mean.
• Variation of observations in each group from their group mean estimates, y_ij − ȳ.j (variation within groups).

In other words, ANOVA partitions the total sum of squares (SST) into the sum of squares due to the between-groups effect (SSR) and the sum of squared errors (SSE):

Σ_i Σ_j (y_ij − ȳ..)² = Σ_j n_j (ȳ.j − ȳ..)² + Σ_i Σ_j (y_ij − ȳ.j)²
        SST                     SSR                    SSE

where n_j is the sample size for the jth group, j = 1, 2, ..., k.

Then ANOVA compares the variation between groups to the variation within groups. If the ratio of between-group variation to within-group variation is significantly high, then you can conclude that the group means are significantly different from each other. You can measure this using a test statistic that has an F-distribution with (k − 1, N − k) degrees of freedom:

F = (SSR/(k − 1)) / (SSE/(N − k)) = MSR/MSE ~ F(k − 1, N − k),
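The partition and F ratio can be checked numerically. The following Python/NumPy sketch (ours, not toolbox code) computes SSR, SSE, and F directly on the hogg bacteria-count matrix from the example above, reproducing the values in the ANOVA table:

```python
import numpy as np

# hogg data from the example: columns are shipments, rows are cartons.
hogg = np.array([[24, 14, 11,  7, 19],
                 [15,  7,  9,  7, 24],
                 [21, 12,  7,  4, 19],
                 [27, 17, 13,  7, 15],
                 [33, 14, 12, 12, 10],
                 [23, 16, 18, 18, 20]], dtype=float)
N, k = hogg.size, hogg.shape[1]
grand = hogg.mean()
col_means = hogg.mean(axis=0)

ssr = (hogg.shape[0] * (col_means - grand) ** 2).sum()  # between groups
sse = ((hogg - col_means) ** 2).sum()                   # within groups
F = (ssr / (k - 1)) / (sse / (N - k))

print(round(ssr, 1), round(sse, 1), round(F, 4))  # ≈ 803.0 557.2 9.0076
```

These match the Columns and Error rows of the tbl cell array (SS = 803.0000 and 557.1667, F = 9.0076).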

where MSR is the mean squared treatment, MSE is the mean squared error, k is the number of groups, and N is the total number of observations. If the p-value for the F-statistic is smaller than the significance level, then the test rejects the null hypothesis that all group means are equal and concludes that at least one of the group means is different from the others. The most common significance levels are 0.05 and 0.01.

ANOVA Table

The ANOVA table captures the variability in the model by source, the F-statistic for testing the significance of this variability, and the p-value for deciding on the significance of this variability. The p-value returned by anova1 depends on assumptions about the random disturbances ε_ij in the model equation. For the p-value to be correct, these disturbances need to be independent, normally distributed, and have constant variance. The standard ANOVA table has this form:

anova1 returns the standard ANOVA table as a cell array with six columns.


Column    Definition
Source    Source of the variability.
SS        Sum of squares due to each source.
df        Degrees of freedom associated with each source. Suppose N is the total number of observations and k is the number of groups. Then, N – k is the within-groups degrees of freedom (Error), k – 1 is the between-groups degrees of freedom (Columns), and N – 1 is the total degrees of freedom: N – 1 = (N – k) + (k – 1).
MS        Mean squares for each source, which is the ratio SS/df.
F         F-statistic, which is the ratio of the mean squares.
Prob>F    p-value, which is the probability that the F-statistic can take a value larger than the computed test-statistic value. anova1 derives this probability from the cdf of the F-distribution.

The rows of the ANOVA table show the variability in the data, divided by the source.

Row (Source)         Definition
Groups or Columns    Variability due to the differences among the group means (variability between groups)
Error                Variability due to the differences between the data in each group and the group mean (variability within groups)
Total                Total variability

References

[1] Wu, C. F. J., and M. Hamada. Experiments: Planning, Analysis, and Parameter Design Optimization. Wiley, 2000.

[2] Neter, J., M. H. Kutner, C. J. Nachtsheim, and W. Wasserman. Applied Linear Statistical Models. 4th ed. Irwin Press, 1996.

See Also
anova1 | kruskalwallis | multcompare

More About
• “Two-Way ANOVA” on page 9-11
• “N-Way ANOVA” on page 9-25
• “Multiple Comparisons” on page 9-18
• “Nonparametric Methods” on page 9-47


Two-Way ANOVA

In this section...
“Introduction to Two-Way ANOVA” on page 9-11
“Prepare Data for Balanced Two-Way ANOVA” on page 9-12
“Perform Two-Way ANOVA” on page 9-13
“Mathematical Details” on page 9-15

Introduction to Two-Way ANOVA

You can use the Statistics and Machine Learning Toolbox function anova2 to perform a balanced two-way analysis of variance (ANOVA). To perform two-way ANOVA for an unbalanced design, use anovan. For an example, see “Two-Way ANOVA for Unbalanced Design” on page 32-74.

As in one-way ANOVA, the data for a two-way ANOVA study can be experimental or observational. The difference between one-way and two-way ANOVA is that in two-way ANOVA, the effects of two factors on a response variable are of interest. These two factors can be independent, and have no interaction effect, or the impact of one factor on the response variable can depend on the group (level) of the other factor. If the two factors have no interactions, the model is called an additive model.

Suppose an automobile company has two factories, and each factory makes the same three car models. The gas mileage in the cars can vary from factory to factory and from model to model. These two factors, factory and model, explain the differences in mileage, that is, the response. One measure of interest is the difference in mileage due to the production methods between factories. Another measure of interest is the difference in the mileage of the models (irrespective of the factory) due to different design specifications. The effects of these measures of interest are additive.

In addition, suppose only one model has different gas mileage between factories, while the mileage of the other two models is the same between factories. This is called an interaction effect. To measure an interaction effect, there must be multiple observations for some combination of factory and car model. These multiple observations are called replications.

Two-way ANOVA is a special case of the linear model on page 11-5. The two-way ANOVA form of the model is

y_ijr = μ + α_i + β_j + (αβ)_ij + ε_ijr

where, • yijr is an observation of the response variable. • i represents group i of row factor A, i = 1, 2, ..., I • j represents group j of column factor B, j = 1, 2, ..., J • r represents the replication number, r = 1, 2, ..., R There are a total of N = I*J*R observations. • μ is the overall mean. • αi are the deviations of groups of row factor A from the overall mean μ due to row factor B. The values of αi sum to 0. 9-11

9

Analysis of Variance

• β_j are the deviations of groups of column factor B from the overall mean μ due to column factor B. The values of β_j sum to 0, that is, ∑_{j=1}^{J} β_j = 0.
• (αβ)_ij are the interactions. The values in each row and in each column of (αβ)_ij sum to 0, that is, ∑_{i=1}^{I} (αβ)_ij = ∑_{j=1}^{J} (αβ)_ij = 0.
• ε_ijr are the random disturbances. They are assumed to be independent, normally distributed, and have constant variance.

In the mileage example:

• y_ijr are the gas mileage observations, and μ is the overall mean gas mileage.
• α_i are the deviations of each car's gas mileage from the mean gas mileage μ due to the car's factory.
• β_j are the deviations of each car's gas mileage from the mean gas mileage μ due to the car's model.

anova2 requires that data be balanced, so each combination of model and factory must have the same number of cars.

Two-way ANOVA tests hypotheses about the effects of factors A and B, and their interaction, on the response variable y. The hypotheses about the equality of the mean response for groups of row factor A are

    H0: α_1 = α_2 = ... = α_I
    H1: at least one α_i is different, i = 1, 2, ..., I.

The hypotheses about the equality of the mean response for groups of column factor B are

    H0: β_1 = β_2 = ... = β_J
    H1: at least one β_j is different, j = 1, 2, ..., J.

The hypotheses about the interaction of the column and row factors are

    H0: (αβ)_ij = 0
    H1: at least one (αβ)_ij ≠ 0.

Prepare Data for Balanced Two-Way ANOVA

To perform balanced two-way ANOVA using anova2, you must arrange data in a specific matrix form. The columns of the matrix must correspond to groups of the column factor, B. The rows must correspond to the groups of the row factor, A, with the same number of replications for each combination of the groups of factors A and B.

Suppose that row factor A has three groups, and column factor B has two groups (levels). Also suppose that each combination of factors A and B has two measurements or observations (reps = 2). Then, each group of factor A has four observations and each group of factor B has six observations.


          B = 1   B = 2
A = 1     y111    y121
          y112    y122
A = 2     y211    y221
          y212    y222
A = 3     y311    y321
          y312    y322

The subscripts indicate row, column, and replication, respectively. For example, y221 corresponds to the measurement for the second group of factor A, the second group of factor B, and the first replication for this combination.
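As a concrete sketch of this layout, the following arranges hypothetical response values (all numbers here are made up for illustration) in the required matrix and passes the replication count to anova2:

```matlab
% Hypothetical data in anova2 form: columns are groups of B, and each
% group of A occupies reps consecutive rows.
y = [ 5.1  6.0      % A = 1, replication 1
      5.3  6.2      % A = 1, replication 2
      4.8  5.7      % A = 2, replication 1
      4.9  5.9      % A = 2, replication 2
      5.5  6.4      % A = 3, replication 1
      5.4  6.3 ];   % A = 3, replication 2
reps = 2;                  % observations per combination of A and B
p = anova2(y,reps,'off')   % p(1): columns (B), p(2): rows (A), p(3): interaction
```

anova2 requires that the number of rows be a multiple of reps, so make sure each group of A really occupies reps consecutive rows.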

Perform Two-Way ANOVA

This example shows how to perform two-way ANOVA to determine the effect of car model and factory on the mileage rating of cars.

Load and display the sample data.

load mileage
mileage

mileage = 6×3

   33.3000   34.5000   37.4000
   33.4000   34.8000   36.8000
   32.9000   33.8000   37.6000
   32.6000   33.4000   36.6000
   32.5000   33.7000   37.0000
   33.0000   33.9000   36.7000

There are three car models (columns) and two factories (rows). The data has six mileage rows because each factory provided three cars of each model for the study (that is, the replication number is three). The data from the first factory is in the first three rows, and the data from the second factory is in the last three rows.

Perform two-way ANOVA. Return the structure of statistics, stats, to use in multiple comparisons.

nmbcars = 3; % Number of cars from each model, i.e., number of replications
[~,~,stats] = anova2(mileage,nmbcars);


You can use the F-statistics to do hypothesis tests to find out if the mileage is the same across models, factories, and model-factory pairs. Before performing these tests, you must adjust for the additive effects. anova2 returns the p-values from these tests.

The p-value for the model effect (Columns) is zero to four decimal places. This result is a strong indication that the mileage varies from one model to another. The p-value for the factory effect (Rows) is 0.0039, which is also highly significant. This value indicates that one factory is outperforming the other in the gas mileage of the cars it produces. The observed p-value indicates that an F-statistic as extreme as the observed F occurs by chance about four out of 1000 times, if the gas mileage were truly equal from factory to factory.

The factories and models appear to have no interaction. The p-value, 0.8411, means that the observed result is likely (84 out of 100 times), given that there is no interaction.

Perform "Multiple Comparisons" on page 9-18 to find out which pair of the three car models is significantly different.

c = multcompare(stats)

Note: Your model includes an interaction term. A test of main effects can be difficult to interpret when the model includes interactions.

c = 3×6

    1.0000    2.0000   -1.5865   -1.0667   -0.5469    0.0004
    1.0000    3.0000   -4.5865   -4.0667   -3.5469    0.0000
    2.0000    3.0000   -3.5198   -3.0000   -2.4802    0.0000

In the matrix c, the first two columns show the pairs of car models that are compared. The last column shows the p-values for the test. All p-values are small (0.0004, 0, and 0), which indicates that the mean mileages of all car models are significantly different from each other.

In the figure, the blue bar is the comparison interval for the mean mileage of the first car model. The red bars are the comparison intervals for the mean mileage of the second and third car models. Neither of these comparison intervals overlaps with the first comparison interval, indicating that the mean mileage of the first car model is different from the mean mileage of the second and the third car models. If you click on one of the other bars, you can test for the other car models. None of the comparison intervals overlap, indicating that the mean mileage of each car model is significantly different from the other two.


Mathematical Details

The two-factor ANOVA partitions the total variation into the following components:

• Variation of row factor group means from the overall mean, ȳ_i.. − ȳ...
• Variation of column factor group means from the overall mean, ȳ_.j. − ȳ...
• Variation of cell means from the sum of the row and column factor group means minus the overall mean, ȳ_ij. − ȳ_i.. − ȳ_.j. + ȳ...
• Variation of observations from the cell means, y_ijr − ȳ_ij.

ANOVA partitions the total sum of squares (SST) into the sum of squares due to row factor A (SSA), the sum of squares due to column factor B (SSB), the sum of squares due to the interaction between A and B (SSAB), and the sum of squares error (SSE). With m groups of the row factor, k groups of the column factor, and R replications,

    SST = SSA + SSB + SSAB + SSE,

where

    SST  = ∑_{i=1}^{m} ∑_{j=1}^{k} ∑_{r=1}^{R} (y_ijr − ȳ...)²
    SSA  = kR ∑_{i=1}^{m} (ȳ_i.. − ȳ...)²
    SSB  = mR ∑_{j=1}^{k} (ȳ_.j. − ȳ...)²
    SSAB = R ∑_{i=1}^{m} ∑_{j=1}^{k} (ȳ_ij. − ȳ_i.. − ȳ_.j. + ȳ...)²
    SSE  = ∑_{i=1}^{m} ∑_{j=1}^{k} ∑_{r=1}^{R} (y_ijr − ȳ_ij.)².

ANOVA takes the variation due to the factor or interaction and compares it to the variation due to error. If the ratio of the two variations is high, then the effect of the factor or the interaction effect is statistically significant. You can measure the statistical significance using a test statistic that has an F-distribution.

For the null hypothesis that the mean response for groups of the row factor A are equal, the test statistic is

    F = [SSA / (m − 1)] / [SSE / (mk(R − 1))] ~ F(m − 1, mk(R − 1)).

For the null hypothesis that the mean response for groups of the column factor B are equal, the test statistic is

    F = [SSB / (k − 1)] / [SSE / (mk(R − 1))] ~ F(k − 1, mk(R − 1)).

For the null hypothesis that the interaction of the column and row factors is equal to zero, the test statistic is

    F = [SSAB / ((m − 1)(k − 1))] / [SSE / (mk(R − 1))] ~ F((m − 1)(k − 1), mk(R − 1)).

If the p-value for the F-statistic is smaller than the significance level, then ANOVA rejects the null hypothesis. The most common significance levels are 0.01 and 0.05.

ANOVA Table

The ANOVA table captures the variability in the model by source, the F-statistic for testing the significance of this variability, and the p-value for deciding on the significance of this variability. The p-value returned by anova2 depends on assumptions about the random disturbances, ε_ijr, in the model equation. For the p-value to be correct, these disturbances need to be independent, normally distributed, and have constant variance. The standard ANOVA table has this form:

anova2 returns the standard ANOVA table as a cell array with six columns.

Column    Definition

Source    The source of the variability.

SS        The sum of squares due to each source.

df        The degrees of freedom associated with each source. Suppose J is the number of groups in the column factor, I is the number of groups in the row factor, and R is the number of replications. Then, the total number of observations is IJR and the total degrees of freedom is IJR − 1. I − 1 is the degrees of freedom for the row factor, J − 1 is the degrees of freedom for the column factor, (I − 1)(J − 1) is the interaction degrees of freedom, and IJ(R − 1) is the error degrees of freedom.

MS        The mean squares for each source, which is the ratio SS/df.

F         F-statistic, which is the ratio of the mean squares.

Prob>F    The p-value, which is the probability that the F-statistic can take a value larger than the computed test-statistic value. anova2 derives this probability from the cdf of the F-distribution.

The rows of the ANOVA table show the variability in the data, divided by source.

Row (Source)    Definition

Columns         Variability due to the column factor

Rows            Variability due to the row factor

Interaction     Variability due to the interaction of the row and column factors

Error           Variability due to the differences between the data in each group and the group mean (variability within groups)

Total           Total variability
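You can verify the sum-of-squares partition numerically. The following sketch recomputes SSA, SSB, SSAB, and SSE for the mileage data used earlier in this section; it assumes the 6-by-3 layout with three replications per factory-model cell and relies on implicit expansion (MATLAB R2016b and later):

```matlab
load mileage                        % 6-by-3: rows = 2 factories x 3 replications, columns = 3 models
R = 3; m = 2; k = 3;                % replications, row groups (factories), column groups (models)
Y = reshape(mileage,[R m k]);       % Y(r,i,j): replication r, factory i, model j
cellmean = squeeze(mean(Y,1));      % m-by-k matrix of cell means
rowmean  = mean(cellmean,2);        % row factor (factory) means
colmean  = mean(cellmean,1);        % column factor (model) means
grand    = mean(mileage(:));        % overall mean
SSA  = k*R*sum((rowmean - grand).^2);
SSB  = m*R*sum((colmean - grand).^2);
SSAB = R*sum(sum((cellmean - rowmean - colmean + grand).^2));
resid = Y - reshape(cellmean,[1 m k]);
SSE  = sum(resid(:).^2);
SST  = sum((mileage(:) - grand).^2);
SST - (SSA + SSB + SSAB + SSE)      % zero up to rounding error
```

The sums of squares computed this way match the SS column that anova2 reports for the same data.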

References

[1] Wu, C. F. J., and M. Hamada. Experiments: Planning, Analysis, and Parameter Design Optimization, 2000.

[2] Neter, J., M. H. Kutner, C. J. Nachtsheim, and W. Wasserman. Applied Linear Statistical Models. 4th ed. Irwin Press, 1996.

See Also
anova1 | anova2 | anovan | multcompare

Related Examples
• "Two-Way ANOVA for Unbalanced Design" on page 32-74

More About
• "One-Way ANOVA" on page 9-3
• "N-Way ANOVA" on page 9-25
• "Multiple Comparisons" on page 9-18
• "Nonparametric Methods" on page 9-47


Multiple Comparisons

In this section...
"Introduction" on page 9-18
"Multiple Comparisons Using One-Way ANOVA" on page 9-18
"Multiple Comparisons for Three-Way ANOVA" on page 9-20
"Multiple Comparison Procedures" on page 9-22

Introduction

Analysis of variance (ANOVA) techniques test whether a set of group means (treatment effects) are equal or not. Rejection of the null hypothesis leads to the conclusion that not all group means are the same. This result, however, does not provide further information on which group means are different.

Performing a series of t-tests to determine which pairs of means are significantly different is not recommended. When you perform multiple t-tests, the probability that at least one comparison appears significant by chance alone increases with the number of tests. Moreover, these t-tests use data from the same sample, hence they are not independent. This fact makes it more difficult to quantify the level of significance for multiple tests.

Suppose that in a single t-test, the probability that the null hypothesis (H0) is rejected when it is actually true is a small value, say 0.05. Suppose also that you conduct six independent t-tests. If the significance level for each test is 0.05, then the probability that the tests correctly fail to reject H0, when H0 is true for each case, is (0.95)^6 = 0.735. And the probability that at least one of the tests incorrectly rejects the null hypothesis is 1 − 0.735 = 0.265, which is much higher than 0.05.

To compensate for multiple tests, you can use multiple comparison procedures. The Statistics and Machine Learning Toolbox function multcompare performs multiple pairwise comparisons of the group means, or treatment effects. The options are Tukey's honestly significant difference criterion (the default), the Bonferroni method, Scheffe's procedure, Fisher's least significant difference (LSD) method, and Dunn & Sidak's approach to the t-test.

To perform multiple comparisons of group means, provide the structure stats as an input for multcompare. You can obtain stats from one of the following functions:

• anova1 — One-way ANOVA on page 9-3
• anova2 — Two-way ANOVA on page 9-11
• anovan — N-way ANOVA on page 9-25
• aoctool — Interactive ANCOVA on page 9-39
• kruskalwallis — Nonparametric method on page 9-47 for a one-way layout
• friedman — Nonparametric method on page 9-47 for a two-way layout

For multiple comparison procedure options for repeated measures, see multcompare (RepeatedMeasuresModel).
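The arithmetic in the six-test example is easy to reproduce, and it also shows how a Bonferroni adjustment restores the intended error rate (a short sketch, not part of the original example):

```matlab
alpha  = 0.05;
ntests = 6;
fwer = 1 - (1 - alpha)^ntests              % probability of at least one false rejection, about 0.265
alpha_bonf = alpha/ntests;                 % Bonferroni-adjusted per-test level
fwer_bonf  = 1 - (1 - alpha_bonf)^ntests   % back below alpha, about 0.049
```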

Multiple Comparisons Using One-Way ANOVA

Load the sample data.

load carsmall

MPG represents the miles per gallon for each car, and Cylinders represents the number of cylinders in each car: 4, 6, or 8 cylinders.

Test if the mean miles per gallon (mpg) is different across cars that have different numbers of cylinders. Also compute the statistics needed for multiple comparison tests.

[p,~,stats] = anova1(MPG,Cylinders,'off');
p

p = 4.4902e-24

The small p-value of about 0 is a strong indication that mean miles per gallon is significantly different across cars with different numbers of cylinders. Perform a multiple comparison test, using the Bonferroni method, to determine which numbers of cylinders make a difference in the performance of the cars. [results,means] = multcompare(stats,'CType','bonferroni')

results = 3×6 1.0000 1.0000 2.0000

2.0000 3.0000 3.0000

4.8605 12.6127 3.8940

7.9418 15.2337 7.2919

11.0230 17.8548 10.6899

0.0000 0.0000 0.0000


means = 3×2

   29.5300    0.6363
   21.5882    1.0913
   14.2963    0.8660

In the results matrix, 1, 2, and 3 correspond to cars with 4, 6, and 8 cylinders, respectively. The first two columns show which groups are compared. For example, the first row compares the cars with 4 and 6 cylinders. The fourth column shows the mean mpg difference for the compared groups. The third and fifth columns show the lower and upper limits for a 95% confidence interval for the difference in the group means. The last column shows the p-values for the tests. All p-values are zero, which indicates that the mean mpg differs across all groups.

In the figure, the blue bar represents the group of cars with 4 cylinders. The red bars represent the other groups. None of the comparison intervals for the mean mpg of cars overlap, which means that the mean mpg is significantly different for cars having 4, 6, or 8 cylinders.

The first column of the means matrix has the mean mpg estimates for each group of cars. The second column contains the standard errors of the estimates.

Multiple Comparisons for Three-Way ANOVA

Load the sample data.

y = [52.7 57.5 45.9 44.5 53.0 57.0 45.9 44.0]';
g1 = [1 2 1 2 1 2 1 2];
g2 = {'hi';'hi';'lo';'lo';'hi';'hi';'lo';'lo'};
g3 = {'may';'may';'may';'may';'june';'june';'june';'june'};

y is the response vector and g1, g2, and g3 are the grouping variables (factors). Each factor has two levels, and every observation in y is identified by a combination of factor levels. For example, observation y(1) is associated with level 1 of factor g1, level 'hi' of factor g2, and level 'may' of factor g3. Similarly, observation y(6) is associated with level 2 of factor g1, level 'hi' of factor g2, and level 'june' of factor g3.

Test if the response is the same for all factor levels. Also compute the statistics required for multiple comparison tests.

[~,~,stats] = anovan(y,{g1 g2 g3},'model','interaction',...
    'varnames',{'g1','g2','g3'});


The p-value of 0.2578 indicates that the mean responses for levels 'may' and 'june' of factor g3 are not significantly different. The p-value of 0.0347 indicates that the mean responses for levels 1 and 2 of factor g1 are significantly different. Similarly, the p-value of 0.0048 indicates that the mean responses for levels 'hi' and 'lo' of factor g2 are significantly different.

Perform multiple comparison tests to find out which groups of the factors g1 and g2 are significantly different.

results = multcompare(stats,'Dimension',[1 2])


results = 6×6

    1.0000    2.0000   -6.8604   -4.4000   -1.9396    0.0280
    1.0000    3.0000    4.4896    6.9500    9.4104    0.0177
    1.0000    4.0000    6.1396    8.6000   11.0604    0.0143
    2.0000    3.0000    8.8896   11.3500   13.8104    0.0108
    2.0000    4.0000   10.5396   13.0000   15.4604    0.0095
    3.0000    4.0000   -0.8104    1.6500    4.1104    0.0745

multcompare compares the combinations of groups (levels) of the two grouping variables, g1 and g2. In the results matrix, the number 1 corresponds to the combination of level 1 of g1 and level hi of g2, and the number 2 corresponds to the combination of level 2 of g1 and level hi of g2. Similarly, the number 3 corresponds to the combination of level 1 of g1 and level lo of g2, and the number 4 corresponds to the combination of level 2 of g1 and level lo of g2. The last column of the matrix contains the p-values.

For example, the first row of the matrix tests whether the combination of level 1 of g1 and level hi of g2 has the same mean response as the combination of level 2 of g1 and level hi of g2. The p-value corresponding to this test is 0.0280, which indicates that the mean responses are significantly different. You can also see this result in the figure. The blue bar shows the comparison interval for the mean response for the combination of level 1 of g1 and level hi of g2. The red bars are the comparison intervals for the mean response for the other group combinations. None of the red bars overlap with the blue bar, which means the mean response for the combination of level 1 of g1 and level hi of g2 is significantly different from the mean response for the other group combinations.

You can test the other groups by clicking on the corresponding comparison interval for the group. The bar you click on turns blue. The bars for the groups that are significantly different are red. The bars for the groups that are not significantly different are gray. For example, if you click on the comparison interval for the combination of level 1 of g1 and level lo of g2, the comparison interval for the combination of level 2 of g1 and level lo of g2 overlaps, and is therefore gray. Conversely, the other comparison intervals are red, indicating significant difference.

Multiple Comparison Procedures

To specify the multiple comparison procedure you want multcompare to conduct, use the 'CType' name-value pair argument. multcompare provides the following procedures:

• "Tukey's Honestly Significant Difference Procedure" on page 9-22
• "Bonferroni Method" on page 9-23
• "Dunn & Sidak's Approach" on page 9-23
• "Least Significant Difference" on page 9-23
• "Scheffe's Procedure" on page 9-24

Tukey's Honestly Significant Difference Procedure

You can specify Tukey's honestly significant difference procedure using the 'CType','tukey-kramer' or 'CType','hsd' name-value pair argument. The test is based on the studentized range distribution. Reject H0: α_i = α_j if

    |t| = |ȳ_i − ȳ_j| / sqrt(MSE (1/n_i + 1/n_j)) > (1/√2) q_{α,k,N−k},

where q_{α,k,N−k} is the upper 100*(1 − α)th percentile of the studentized range distribution with parameter k and N − k degrees of freedom. k is the number of groups (treatments or marginal means) and N is the total number of observations.

Tukey's honestly significant difference procedure is optimal for balanced one-way ANOVA and similar procedures with equal sample sizes. It has been proven to be conservative for one-way ANOVA with different sample sizes. According to the unproven Tukey-Kramer conjecture, it is also accurate for problems where the quantities being compared are correlated, as in analysis of covariance with unbalanced covariate values.

Bonferroni Method

You can specify the Bonferroni method using the 'CType','bonferroni' name-value pair. This method uses critical values from Student's t-distribution after an adjustment to compensate for multiple comparisons. With k groups there are C = k(k − 1)/2 pairwise comparisons, and the test rejects H0: α_i = α_j at the α/(2C) significance level if

    |t| = |ȳ_i − ȳ_j| / sqrt(MSE (1/n_i + 1/n_j)) > t_{α/(2C), N−k},

where N is the total number of observations and k is the number of groups (marginal means). This procedure is conservative, but usually less so than the Scheffé procedure.

Dunn & Sidak's Approach

You can specify Dunn & Sidak's approach using the 'CType','dunn-sidak' name-value pair argument. It uses critical values from the t-distribution, after an adjustment for multiple comparisons that was proposed by Dunn and proved accurate by Sidák. This test rejects H0: α_i = α_j if

    |t| = |ȳ_i − ȳ_j| / sqrt(MSE (1/n_i + 1/n_j)) > t_{1−η/2, v},

where

    η = 1 − (1 − α)^(1/C),    C = k(k − 1)/2,

k is the number of groups, and v = N − k is the error degrees of freedom. This procedure is similar to, but less conservative than, the Bonferroni procedure.

Least Significant Difference

You can specify the least significant difference procedure using the 'CType','lsd' name-value pair argument. This test uses the test statistic

    t = (ȳ_i − ȳ_j) / sqrt(MSE (1/n_i + 1/n_j)).

It rejects H0: α_i = α_j if

    |ȳ_i − ȳ_j| > t_{α/2, N−k} sqrt(MSE (1/n_i + 1/n_j)),

where the right-hand side is the least significant difference (LSD). Fisher suggests a protection against multiple comparisons by performing LSD only when the null hypothesis H0: α_1 = α_2 = ... = α_k is rejected by the ANOVA F-test. Even in this case, LSD might not reject any of the individual hypotheses. It is also possible that ANOVA does not reject H0, even when there are differences between some group means. This behavior occurs because the equality of the remaining group means can cause the F-test statistic to be nonsignificant. Without this condition, LSD does not provide any protection against the multiple comparison problem.

Scheffe's Procedure

You can specify Scheffe's procedure using the 'CType','scheffe' name-value pair argument. The critical values are derived from the F distribution. The test rejects H0: α_i = α_j if

    |ȳ_i − ȳ_j| / sqrt(MSE (1/n_i + 1/n_j)) > sqrt((k − 1) F_{k−1, N−k, α}).

This procedure provides a simultaneous confidence level for comparisons of all linear combinations of the means. It is conservative for comparisons of simple differences of pairs.
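All five procedures compare the same statistic against different critical values, so the only thing that changes across 'CType' options is how conservative the resulting p-values are. This sketch runs the same pairwise comparisons under each procedure on simulated one-way data (the data and group labels are hypothetical):

```matlab
rng default                                     % for reproducibility
y = [randn(10,1); randn(10,1); randn(10,1)+1];  % group C shifted up by 1
g = [repmat({'A'},10,1); repmat({'B'},10,1); repmat({'C'},10,1)];
[~,~,stats] = anova1(y,g,'off');
ctypes = {'tukey-kramer','bonferroni','dunn-sidak','lsd','scheffe'};
for t = 1:numel(ctypes)
    c = multcompare(stats,'CType',ctypes{t},'Display','off');
    fprintf('%-12s  p(A vs C) = %.4f\n', ctypes{t}, c(2,6));  % row 2 compares groups A and C
end
```

You should see the smallest p-value from 'lsd' (no protection) and the largest from 'scheffe' (most conservative), with the others in between.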

References

[1] Milliken, G. A., and D. E. Johnson. Analysis of Messy Data. Volume I: Designed Experiments. Boca Raton, FL: Chapman & Hall/CRC Press, 1992.

[2] Neter, J., M. H. Kutner, C. J. Nachtsheim, and W. Wasserman. Applied Linear Statistical Models. 4th ed. Irwin Press, 1996.

[3] Hochberg, Y., and A. C. Tamhane. Multiple Comparison Procedures. Hoboken, NJ: John Wiley & Sons, 1987.

See Also
anova1 | anova2 | anovan | aoctool | friedman | kruskalwallis | multcompare

Related Examples
• "Perform One-Way ANOVA" on page 9-5
• "Perform Two-Way ANOVA" on page 9-13


N-Way ANOVA

In this section...
"Introduction to N-Way ANOVA" on page 9-25
"Prepare Data for N-Way ANOVA" on page 9-27
"Perform N-Way ANOVA" on page 9-27

Introduction to N-Way ANOVA

You can use the Statistics and Machine Learning Toolbox function anovan to perform N-way ANOVA. Use N-way ANOVA to determine if the means in a set of data differ with respect to groups (levels) of multiple factors. By default, anovan treats all grouping variables as fixed effects. For an example of ANOVA with random effects, see "ANOVA with Random Effects" on page 9-33. For repeated measures, see fitrm and ranova.

N-way ANOVA is a generalization of two-way ANOVA. For three factors, for example, the model can be written as

    y_ijkr = μ + α_i + β_j + γ_k + (αβ)_ij + (αγ)_ik + (βγ)_jk + (αβγ)_ijk + ε_ijkr,

where

• y_ijkr is an observation of the response variable. i represents group i of factor A, i = 1, 2, ..., I; j represents group j of factor B, j = 1, 2, ..., J; k represents group k of factor C, k = 1, 2, ..., K; and r represents the replication number, r = 1, 2, ..., R. For constant R, there are a total of N = I*J*K*R observations, but the number of observations does not have to be the same for each combination of groups of factors.
• μ is the overall mean.
• α_i are the deviations of groups of factor A from the overall mean μ due to factor A. The values of α_i sum to 0, that is, ∑_{i=1}^{I} α_i = 0.
• β_j are the deviations of groups in factor B from the overall mean μ due to factor B. The values of β_j sum to 0, that is, ∑_{j=1}^{J} β_j = 0.
• γ_k are the deviations of groups in factor C from the overall mean μ due to factor C. The values of γ_k sum to 0, that is, ∑_{k=1}^{K} γ_k = 0.
• (αβ)_ij is the interaction term between factors A and B. The values of (αβ)_ij sum to 0 over either index, that is, ∑_{i=1}^{I} (αβ)_ij = ∑_{j=1}^{J} (αβ)_ij = 0.
• (αγ)_ik is the interaction term between factors A and C. The values of (αγ)_ik sum to 0 over either index, that is, ∑_{i=1}^{I} (αγ)_ik = ∑_{k=1}^{K} (αγ)_ik = 0.
• (βγ)_jk is the interaction term between factors B and C. The values of (βγ)_jk sum to 0 over either index, that is, ∑_{j=1}^{J} (βγ)_jk = ∑_{k=1}^{K} (βγ)_jk = 0.
• (αβγ)_ijk is the three-way interaction term between factors A, B, and C. The values of (αβγ)_ijk sum to 0 over any index, that is, ∑_{i=1}^{I} (αβγ)_ijk = ∑_{j=1}^{J} (αβγ)_ijk = ∑_{k=1}^{K} (αβγ)_ijk = 0.
• ε_ijkr are the random disturbances. They are assumed to be independent, normally distributed, and have constant variance.

Three-way ANOVA tests hypotheses about the effects of factors A, B, C, and their interactions on the response variable y. The hypotheses about the equality of the mean responses for groups of factor A are

    H0: α_1 = α_2 = ... = α_I
    H1: at least one α_i is different, i = 1, 2, ..., I.

The hypotheses about the equality of the mean response for groups of factor B are

    H0: β_1 = β_2 = ... = β_J
    H1: at least one β_j is different, j = 1, 2, ..., J.

The hypotheses about the equality of the mean response for groups of factor C are

    H0: γ_1 = γ_2 = ... = γ_K
    H1: at least one γ_k is different, k = 1, 2, ..., K.

The hypotheses about the interactions of the factors are

    H0: (αβ)_ij = 0       H1: at least one (αβ)_ij ≠ 0
    H0: (αγ)_ik = 0       H1: at least one (αγ)_ik ≠ 0
    H0: (βγ)_jk = 0       H1: at least one (βγ)_jk ≠ 0
    H0: (αβγ)_ijk = 0     H1: at least one (αβγ)_ijk ≠ 0

In this notation, parameters with two subscripts, such as (αβ)_ij, represent the interaction effect of two factors. The parameter (αβγ)_ijk represents the three-way interaction. An ANOVA model can have the full set of parameters or any subset, but conventionally it does not include complex interaction terms unless it also includes all simpler terms for those factors. For example, one would generally not include the three-way interaction without also including all two-way interactions.


Prepare Data for N-Way ANOVA

Unlike anova1 and anova2, anovan does not expect data in a tabular form. Instead, it expects a vector of response measurements and a separate vector (or text array) containing the values corresponding to each factor. This input data format is more convenient than matrices when there are more than two factors or when the number of measurements per factor combination is not constant. For example:

    y  = [ y1    y2     y3     y4     y5    ⋯  yN    ]'
    g1 = {'A';  'A';   'C';   'B';   'B';   ⋯ 'D'   }
    g2 = [ 1     3      1      2      1     ⋯  2    ]
    g3 = {'hi'; 'mid'; 'low'; 'mid'; 'hi';  ⋯ 'low' }
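A minimal runnable instance of this format looks like the following; the response values and grouping labels here are made up for illustration:

```matlab
% Response vector plus one grouping vector per factor
y  = [8.2 9.1 7.4 6.8 8.9 9.4 7.0 6.5]';
g1 = [1 2 1 2 1 2 1 2]';                         % numeric grouping variable
g2 = {'hi';'hi';'lo';'lo';'hi';'hi';'lo';'lo'};  % cell array of labels
p = anovan(y,{g1 g2},'display','off');           % one p-value per factor
```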

Perform N-Way ANOVA

This example shows how to perform N-way ANOVA on car data with mileage and other information on 406 cars made between 1970 and 1982.

Load the sample data.

load carbig

The example focuses on four variables. MPG is the number of miles per gallon for each of 406 cars (though some have missing values coded as NaN). The other three variables are factors: cyl4 (four-cylinder car or not), org (car originated in Europe, Japan, or the USA), and when (car was built early in the period, in the middle of the period, or late in the period).

Fit the full model, requesting up to three-way interactions and Type 3 sums-of-squares.

varnames = {'Origin';'4Cyl';'MfgDate'};
anovan(MPG,{org cyl4 when},3,3,varnames)

ans = 7×1

    0.0000
       NaN
    0.0000
    0.7032
    0.0001
    0.2072
    0.6990

Note that many terms are marked by a # symbol as not having full rank, and one of them has zero degrees of freedom and is missing a p-value. This can happen when there are missing factor combinations and the model has higher-order terms. In this case, the cross-tabulation below shows that there are no cars made in Europe during the early part of the period with other than four cylinders, as indicated by the 0 in tbl(2,1,1).

[tbl,chi2,p,factorvals] = crosstab(org,when,cyl4)

tbl =
tbl(:,:,1) =

    82    75    25
     0     4     3
     3     3     4

tbl(:,:,2) =

    12    22    38
    23    26    17
    12    25    32

chi2 = 207.7689
p = 8.0973e-38
factorvals = 3×3 cell array
    {'USA'   }    {'Early'}    {'Other'    }
    {'Europe'}    {'Mid'  }    {'Four'     }
    {'Japan' }    {'Late' }    {0x0 double }

Consequently, it is impossible to estimate the three-way interaction effects, and including the three-way interaction term in the model makes the fit singular. Using even the limited information available in the ANOVA table, you can see that the three-way interaction has a p-value of 0.699, so it is not significant.

Examine only two-way interactions.

[p,tbl2,stats,terms] = anovan(MPG,{org cyl4 when},2,3,varnames);

terms

terms = 6×3

     1     0     0
     0     1     0
     0     0     1
     1     1     0
     1     0     1
     0     1     1

Now all terms are estimable. The p-values for interaction term 4 (Origin*4Cyl) and interaction term 6 (4Cyl*MfgDate) are much larger than a typical cutoff value of 0.05, indicating these terms are not significant. You could choose to omit these terms and pool their effects into the error term. The output terms variable returns a matrix of codes, each of which is a bit pattern representing a term.

Omit terms from the model by deleting their entries from terms.

terms([4 6],:) = []

terms = 4×3

     1     0     0
     0     1     0
     0     0     1
     1     0     1

Run anovan again, this time supplying the resulting matrix as the model argument. Also return the statistics required for multiple comparisons of factors.

[~,~,stats] = anovan(MPG,{org cyl4 when},terms,3,varnames)


stats = struct with fields: source: 'anovan' resid: [1x406 double] coeffs: [18x1 double] Rtr: [10x10 double] rowbasis: [10x18 double] dfe: 388 mse: 14.1056 nullproject: [18x10 double] terms: [4x3 double] nlevels: [3x1 double] continuous: [0 0 0] vmeans: [3x1 double] termcols: [5x1 double] coeffnames: {18x1 cell} vars: [18x3 double] varnames: {3x1 cell} grpnames: {3x1 cell} vnested: [] ems: [] denom: [] dfdenom: [] msdenom: [] varest: [] varci: [] txtdenom: [] txtems: [] rtnames: []

Now you have a more parsimonious model indicating that the mileage of these cars seems to be related to all three factors, and that the effect of the manufacturing date depends on where the car was made.

Perform multiple comparisons for Origin and Cylinder.

results = multcompare(stats,'Dimension',[1,2])


results = 15×6

    1.0000    2.0000   -5.4891   -3.8412   -2.1932    0.0000
    1.0000    3.0000   -4.4146   -2.7251   -1.0356    0.0001
    1.0000    4.0000   -9.9992   -8.5828   -7.1664    0.0000
    1.0000    5.0000  -14.0237  -12.4240  -10.8242    0.0000
    1.0000    6.0000  -12.8980  -11.3080   -9.7180    0.0000
    2.0000    3.0000   -0.7171    1.1160    2.9492    0.5085
    2.0000    4.0000   -7.3655   -4.7417   -2.1179    0.0000
    2.0000    5.0000   -9.9992   -8.5828   -7.1664    0.0000
    2.0000    6.0000   -9.7464   -7.4668   -5.1872    0.0000
    3.0000    4.0000   -8.5396   -5.8577   -3.1757    0.0000
      ⋮

See Also
anova1 | anovan | kruskalwallis | multcompare

Related Examples
• "ANOVA with Random Effects" on page 9-33

More About
• "One-Way ANOVA" on page 9-3
• "Two-Way ANOVA" on page 9-11
• "Multiple Comparisons" on page 9-18
• "Nonparametric Methods" on page 9-47

ANOVA with Random Effects

This example shows how to use anovan to fit models where a factor's levels represent a random selection from a larger (infinite) set of possible levels. In an ordinary ANOVA model, each grouping variable represents a fixed factor. The levels of that factor are a fixed set of values. The goal is to determine whether different factor levels lead to different response values.

Set Up the Model

Load the sample data.

load mileage

The anova2 function works only with balanced data, and it infers the values of the grouping variables from the row and column numbers of the input matrix. The anovan function, on the other hand, requires you to explicitly create vectors of grouping variable values. Create these vectors in the following way.

Create an array indicating the factory for each value in mileage. This array is 1 for the first column, 2 for the second, and 3 for the third.

factory = repmat(1:3,6,1);

Create an array indicating the car model for each mileage value. This array is 1 for the first three rows of mileage, and 2 for the remaining three rows.

carmod = [ones(3,3); 2*ones(3,3)];

Turn these matrices into vectors and display them.

mileage = mileage(:);
factory = factory(:);
carmod = carmod(:);
[mileage factory carmod]

ans = 18×3

   33.3000    1.0000    1.0000
   33.4000    1.0000    1.0000
   32.9000    1.0000    1.0000
   32.6000    1.0000    2.0000
   32.5000    1.0000    2.0000
   33.0000    1.0000    2.0000
   34.5000    2.0000    1.0000
   34.8000    2.0000    1.0000
   33.8000    2.0000    1.0000
   33.4000    2.0000    2.0000
      ⋮

Fit a Random Effects Model

Suppose you are studying a few factories but you want information about what would happen if you build these same car models in a different factory, either one that you already have or another that you might construct. To get this information, fit the analysis of variance model, specifying a model that includes an interaction term and that the factory factor is random.

[pvals,tbl,stats] = anovan(mileage, {factory carmod}, ...
    'model',2, 'random',1,'varnames',{'Factory' 'Car Model'});

In the fixed effects version of this fit, which you get by omitting the inputs 'random',1 in the preceding code, the effect of car model is significant, with a p-value of 0.0039. But in this example, which takes into account the random variation of the effect of the variable 'Car Model' from one factory to another, the effect is still significant, but with a higher p-value of 0.0136.

F-Statistics for Models with Random Effects

The F-statistic in a model having random effects is defined differently than in a model having all fixed effects. In the fixed effects model, you compute the F-statistic for any term by taking the ratio of the mean square for that term with the mean square for error. In a random effects model, however, some F-statistics use a different mean square in the denominator.

In the example described in Set Up the Model, the effect of the variable 'Factory' could vary across car models. In this case, the interaction mean square takes the place of the error mean square in the F-statistic.

Find the F-statistic.

F = 26.6756 / 0.02

F = 1.3338e+03

The degrees of freedom for the statistic are the degrees of freedom for the numerator (2) and denominator (2) mean squares.

Find the p-value.

pval = 1 - fcdf(F,2,2)

pval = 7.4919e-04

With random effects, the expected value of each mean square depends not only on the variance of the error term, but also on the variances contributed by the random effects. You can see these dependencies by writing the expected values as linear combinations of contributions from the various model terms.

Find the coefficients of these linear combinations.

stats.ems

ans = 4×4

    6.0000    0.0000    3.0000    1.0000
    0.0000    9.0000    3.0000    1.0000
    0.0000    0.0000    3.0000    1.0000
         0         0         0    1.0000

This returns the ems field of the stats structure. Display text representations of the linear combinations.

stats.txtems

ans = 4x1 cell
    {'6*V(Factory)+3*V(Factory*Car Model)+V(Error)'  }
    {'9*Q(Car Model)+3*V(Factory*Car Model)+V(Error)'}
    {'3*V(Factory*Car Model)+V(Error)'               }
    {'V(Error)'                                      }

The expected value for the mean square due to car model (second term) includes contributions from a quadratic function of the car model effects, plus three times the variance of the interaction term's effect, plus the variance of the error term. Notice that if the car model effects were all zero, the expression would reduce to the expected mean square for the third term (the interaction term). That is why the F-statistic for the car model effect uses the interaction mean square in the denominator.

In some cases there is no single term whose expected value matches the one required for the denominator of the F-statistic. In that case, the denominator is a linear combination of mean squares. The stats structure contains fields giving the definitions of the denominators for each F-statistic. The txtdenom field, stats.txtdenom, contains a text representation, and the denom field contains a matrix that defines a linear combination of the variances of terms in the model. For balanced models like this one, the denom matrix, stats.denom, contains zeros and ones, because the denominator is just a single term's mean square.

Display the txtdenom field.

stats.txtdenom

ans = 3x1 cell
    {'MS(Factory*Car Model)'}
    {'MS(Factory*Car Model)'}
    {'MS(Error)'            }

Display the denom field.

stats.denom

ans = 3×3

    0.0000    1.0000         0
    0.0000    1.0000   -0.0000
   -0.0000    0.0000    1.0000

Variance Components

For the model described in Set Up the Model, consider the mileage for a particular car of a particular model made at a random factory. The variance of that car is the sum of components, or contributions, one from each of the random terms.

Display the names of the random terms.

stats.rtnames

ans = 3x1 cell
    {'Factory'          }
    {'Factory*Car Model'}
    {'Error'            }

You do not know the variances, but you can estimate them from the data. Recall that the ems field of the stats structure expresses the expected value of each term's mean square as a linear combination of unknown variances for random terms, and unknown quadratic forms for fixed terms. If you take the expected mean square expressions for the random terms, and equate those expected values to the computed mean squares, you get a system of equations that you can solve for the unknown variances. These solutions are the variance component estimates.

Display the variance component estimate for each term.

stats.varest

ans = 3×1

    4.4426
   -0.0313
    0.1139
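The system of equations just described can be sketched directly from the stats output. The following is an illustration of the idea, not anovan's exact internal computation. It assumes stats comes from the anovan call above and uses the mean squares quoted in this example: MS(Factory) = 26.6756 and MS(Factory*Car Model) = 0.02 from the F-statistic computation, and MS(Error) = 0.1139, which equals V(Error) because the expected error mean square is V(Error) alone.

```matlab
% Observed mean squares for the three random terms
% (Factory, Factory*Car Model, Error):
ms = [26.6756; 0.02; 0.1139];

% Rows and columns 1, 3, and 4 of stats.ems correspond to the
% random terms Factory, Factory*Car Model, and Error.
A = stats.ems([1 3 4],[1 3 4]);

% Equate expected mean squares to observed ones and solve.
v = A \ ms   % approximately [4.4426; -0.0313; 0.1139], as in stats.varest
```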

Under some conditions, the variability attributed to a term is unusually low, and that term's variance component estimate is negative. In those cases it is common to set the estimate to zero, which you might do, for example, to create a bar graph of the components.

Create a bar graph of the components.

bar(max(0,stats.varest))

ax = gca;
ax.XTick = 1:3;
ax.XTickLabel = stats.rtnames;

You can also compute confidence bounds for the variance estimate. The anovan function does this by computing confidence bounds for the variance expected mean squares, and finding lower and upper limits on each variance component containing all of these bounds. This procedure leads to a set of bounds that is conservative for balanced data. (That is, 95% confidence bounds will have a probability of at least 95% of containing the true variances if the number of observations for each combination of grouping variables is the same.) For unbalanced data, these are approximations that are not guaranteed to be conservative.

Display the variance estimates and the confidence limits for the variance estimates of each component.

[{'Term' 'Estimate' 'Lower' 'Upper'};
 stats.rtnames, num2cell([stats.varest stats.varci])]

ans = 4×4 cell array
    {'Term'             }    {'Estimate'}    {'Lower' }    {'Upper'   }
    {'Factory'          }    {[  4.4426]}    {[1.0736]}    {[175.6038]}
    {'Factory*Car Model'}    {[ -0.0313]}    {[   NaN]}    {[     NaN]}
    {'Error'            }    {[  0.1139]}    {[0.0586]}    {[  0.3103]}

Other ANOVA Models

The anovan function also has arguments that enable you to specify two other types of model terms:

• The 'nested' argument specifies a matrix that indicates which factors are nested within other factors. A nested factor is one that takes different values within each level of the factor it is nested in. Suppose an automobile company has three factories, and each factory makes two car models. The gas mileage of the cars can vary from factory to factory and from model to model. These two factors, factory and model, explain the differences in mileage, that is, the response. One measure of interest is the difference in mileage due to the production methods between factories. Another measure of interest is the difference in the mileage of the models (irrespective of the factory) due to different design specifications. Suppose also that each factory produces distinct car models, for a total of six car models. Then, the car model is nested in factory.

Car Model

1

1

1

2

2

3

2

4

3

5

3

6

It is also common with nested models to number the nested factor the same way within each level of the nesting factor (here, models 1 and 2 within each factory).

• The 'continuous' argument specifies that some factors are to be treated as continuous variables. The remaining factors are categorical variables. Although the anovan function can fit models with multiple continuous and categorical predictors, the simplest model that combines one predictor of each type is known as an analysis of covariance model. “Analysis of Covariance” on page 9-39 describes a specialized tool for fitting this model.
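As a sketch, a 'nested' call for the six-model scenario in the table above might look as follows. The data and variable names here are made up for illustration; the nesting matrix has a 1 in row i, column j when factor i is nested in factor j.

```matlab
% Hypothetical data: two mileage readings for each of six car models,
% where models 1-2 belong to factory 1, 3-4 to factory 2, 5-6 to factory 3.
factory6 = repelem([1 2 3],4)';    % 12 observations
model6   = repelem(1:6,2)';
mpg6     = 30 + randn(12,1);       % made-up response values

% Car model (factor 2) is nested in factory (factor 1).
nesting = [0 0; 1 0];

p = anovan(mpg6,{factory6 model6},'nested',nesting, ...
    'varnames',{'Factory' 'Car Model'});
```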

See Also
anova1 | anova2 | anovan | kruskalwallis | multcompare

Related Examples
• “ANOVA with Random Effects” on page 9-33

More About
• “One-Way ANOVA” on page 9-3
• “Two-Way ANOVA” on page 9-11
• “Multiple Comparisons” on page 9-18
• “N-Way ANOVA” on page 9-25
• “Nonparametric Methods” on page 9-47

Analysis of Covariance

In this section...
“Introduction to Analysis of Covariance” on page 9-39
“Analysis of Covariance Tool” on page 9-39
“Confidence Bounds” on page 9-43
“Multiple Comparisons” on page 9-45

Introduction to Analysis of Covariance

Analysis of covariance is a technique for analyzing grouped data having a response (y, the variable to be predicted) and a predictor (x, the variable used to do the prediction). Using analysis of covariance, you can model y as a linear function of x, with the coefficients of the line possibly varying from group to group.
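For orientation, a comparable separate-lines fit can also be sketched with fitlm. This is an alternative shown only to make the model concrete; it is not part of the interactive aoctool workflow described next, and the table variable names are illustrative.

```matlab
% Separate-lines ANOCOVA model: intercept and slope both vary by group.
load carsmall
tbl = table(MPG,Weight,categorical(Model_Year), ...
    'VariableNames',{'MPG','Weight','Year'});
mdl = fitlm(tbl,'MPG ~ Weight*Year');
```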

Analysis of Covariance Tool

The aoctool function opens an interactive graphical environment for fitting and prediction with analysis of covariance (ANOCOVA) models. It fits the following models for the ith group:

Same mean         y = α + ε
Separate means    y = (α + αi) + ε
Same line         y = α + βx + ε
Parallel lines    y = (α + αi) + βx + ε
Separate lines    y = (α + αi) + (β + βi)x + ε

For example, in the parallel lines model the intercept varies from one group to the next, but the slope is the same for each group. In the same mean model, there is a common intercept and no slope. In order to make the group coefficients well determined, the tool imposes the constraints

∑αj = ∑βj = 0

The following steps describe the use of aoctool.

1. Load the data. The Statistics and Machine Learning Toolbox data set carsmall.mat contains information on cars from the years 1970, 1976, and 1982. This example studies the relationship between the weight of a car and its mileage, and whether this relationship has changed over the years. To start the demonstration, load the data set.

load carsmall

The Workspace Browser shows the variables in the data set.

You can also use aoctool with your own data.

2. Start the tool. The following command calls aoctool to fit a separate line to the column vectors Weight and MPG for each of the three groups defined in Model_Year. The initial fit models the y variable, MPG, as a linear function of the x variable, Weight.

[h,atab,ctab,stats] = aoctool(Weight,MPG,Model_Year);

See the aoctool function reference page for detailed information about calling aoctool.

3. Examine the output. The graphical output consists of a main window with a plot, a table of coefficient estimates, and an analysis of variance table. In the plot, each Model_Year group has a separate line. The data points for each group are coded with the same color and symbol, and the fit for each group has the same color as the data points.


The coefficients of the three lines appear in the figure titled ANOCOVA Coefficients. You can see that the slopes are roughly –0.0078, with a small deviation for each group:

• Model year 1970: y = (45.9798 – 8.5805) + (–0.0078 + 0.002)x + ε
• Model year 1976: y = (45.9798 – 3.8902) + (–0.0078 + 0.0011)x + ε
• Model year 1982: y = (45.9798 + 12.4707) + (–0.0078 – 0.0031)x + ε


Because the three fitted lines have slopes that are roughly similar, you may wonder if they really are the same. The Model_Year*Weight interaction expresses the difference in slopes, and the ANOVA table shows a test for the significance of this term. With an F statistic of 5.23 and a p value of 0.0072, the slopes are significantly different.

4. Constrain the slopes to be the same. To examine the fits when the slopes are constrained to be the same, return to the ANOCOVA Prediction Plot window and use the Model pop-up menu to select a Parallel Lines model. The window updates to show the following graph.


Though this fit looks reasonable, it is significantly worse than the Separate Lines model. Use the Model pop-up menu again to return to the original model.

Confidence Bounds

The example in “Analysis of Covariance Tool” on page 9-39 provides estimates of the relationship between MPG and Weight for each Model_Year, but how accurate are these estimates? To find out, you can superimpose confidence bounds on the fits by examining them one group at a time.

1. In the Model_Year menu at the lower right of the figure, change the setting from All Groups to 82. The data and fits for the other groups are dimmed, and confidence bounds appear around the 82 fit.


The dashed lines form an envelope around the fitted line for model year 82. Under the assumption that the true relationship is linear, these bounds provide a 95% confidence region for the true line. Note that the fits for the other model years are well outside these confidence bounds for Weight values between 2000 and 3000.

2. Sometimes it is more valuable to be able to predict the response value for a new observation, not just estimate the average response value. Use the aoctool function Bounds menu to change the definition of the confidence bounds from Line to Observation. The resulting wider intervals reflect the uncertainty in the parameter estimates as well as the randomness of a new observation.


Like the polytool function, the aoctool function has cross hairs that you can use to manipulate the Weight and watch the estimate and confidence bounds along the y-axis update. These values appear only when a single group is selected, not when All Groups is selected.

Multiple Comparisons

You can perform a multiple comparison test by using the stats output structure from aoctool as input to the multcompare function. The multcompare function can test either slopes, intercepts, or population marginal means (the predicted MPG of the mean weight for each group). The example in “Analysis of Covariance Tool” on page 9-39 shows that the slopes are not all the same, but could it be that two are the same and only the other one is different? You can test that hypothesis.

multcompare(stats,0.05,'on','','s')

ans =
    1.0000    2.0000   -0.0012    0.0008    0.0029
    1.0000    3.0000    0.0013    0.0051    0.0088
    2.0000    3.0000    0.0005    0.0042    0.0079

This matrix shows that the estimated difference between the slopes of groups 1 and 2 (1970 and 1976) is 0.0008, and a confidence interval for the difference is [–0.0012, 0.0029]. There is no significant difference between the two. There are significant differences, however, between the slope for 1982 and each of the other two. The graph shows the same information.

Note that the stats structure was created in the initial call to the aoctool function, so it is based on the initial model fit (typically a separate-lines model). If you change the model interactively and want to base your multiple comparisons on the new model, you need to run aoctool again to get another stats structure, this time specifying your new model as the initial model.


Nonparametric Methods

In this section...
“Introduction to Nonparametric Methods” on page 9-47
“Kruskal-Wallis Test” on page 9-47
“Friedman's Test” on page 9-47

Introduction to Nonparametric Methods

Statistics and Machine Learning Toolbox functions include nonparametric versions of one-way and two-way analysis of variance. Unlike classical tests, nonparametric tests make only mild assumptions about the data, and are appropriate when the distribution of the data is non-normal. On the other hand, they are less powerful than classical methods for normally distributed data. Both of the nonparametric functions described here will return a stats structure that can be used as an input to the multcompare function for multiple comparisons.

Kruskal-Wallis Test

The example “Perform One-Way ANOVA” on page 9-5 uses one-way analysis of variance to determine if the bacteria counts of milk varied from shipment to shipment. The one-way analysis rests on the assumption that the measurements are independent, and that each has a normal distribution with a common variance and with a mean that was constant in each column. You can conclude that the column means were not all the same. The following example repeats that analysis using a nonparametric procedure.

The Kruskal-Wallis test is a nonparametric version of one-way analysis of variance. The assumption behind this test is that the measurements come from a continuous distribution, but not necessarily a normal distribution. The test is based on an analysis of variance using the ranks of the data values, not the data values themselves. Output includes a table similar to an ANOVA table, and a box plot.

You can run this test as follows:

load hogg
p = kruskalwallis(hogg)

p = 0.0020

The low p value means the Kruskal-Wallis test results agree with the one-way analysis of variance results.

Friedman's Test

“Perform Two-Way ANOVA” on page 9-13 uses two-way analysis of variance to study the effect of car model and factory on car mileage. The example tests whether either of these factors has a significant effect on mileage, and whether there is an interaction between these factors. The conclusion of the example is that there is no interaction, but that each individual factor has a significant effect. The next example examines whether a nonparametric analysis leads to the same conclusion.

Friedman's test is a nonparametric test for data having a two-way layout (data grouped by two categorical factors). Unlike two-way analysis of variance, Friedman's test does not treat the two factors symmetrically and it does not test for an interaction between them. Instead, it is a test for whether the columns are different after adjusting for possible row differences. The test is based on an analysis of variance using the ranks of the data across categories of the row factor. Output includes a table similar to an ANOVA table.

You can run Friedman's test as follows.

load mileage
p = friedman(mileage,3)

p = 7.4659e-004

Recall the classical analysis of variance gave a p value to test column effects, row effects, and interaction effects. This p value is for column effects. Using either this p value or the p value from ANOVA (p < 0.0001), you conclude that there are significant column effects.

In order to test for row effects, you need to rearrange the data to swap the roles of the rows and columns. For a data matrix x with no replications, you could simply transpose the data and type

p = friedman(x')

With replicated data it is slightly more complicated. A simple way is to transform the matrix into a three-dimensional array with the first dimension representing the replicates, swapping the other two dimensions, and restoring the two-dimensional shape.

x = reshape(mileage, [3 2 3]);
x = permute(x, [1 3 2]);
x = reshape(x, [9 2])

x =
   33.3000   32.6000
   33.4000   32.5000
   32.9000   33.0000
   34.5000   33.4000
   34.8000   33.7000
   33.8000   33.9000
   37.4000   36.6000
   36.8000   37.0000
   37.6000   36.7000

friedman(x,3)

ans = 0.0082

Again, the conclusion is similar to that of the classical analysis of variance. Both this p value and the one from ANOVA (p = 0.0039) lead you to conclude that there are significant row effects. You cannot use Friedman's test to test for interactions between the row and column factors.


MANOVA

In this section...
“Introduction to MANOVA” on page 9-49
“ANOVA with Multiple Responses” on page 9-49

Introduction to MANOVA

The analysis of variance technique in “Perform One-Way ANOVA” on page 9-5 takes a set of grouped data and determines whether the mean of a variable differs significantly among groups. Often there are multiple response variables, and you are interested in determining whether the entire set of means is different from one group to the next. There is a multivariate version of analysis of variance that can address this problem.

ANOVA with Multiple Responses

The carsmall data set has measurements on a variety of car models from the years 1970, 1976, and 1982. Suppose you are interested in whether the characteristics of the cars have changed over time.

Load the sample data.

load carsmall
whos

  Name              Size      Bytes  Class     Attributes

  Acceleration      100x1       800  double
  Cylinders         100x1       800  double
  Displacement      100x1       800  double
  Horsepower        100x1       800  double
  MPG               100x1       800  double
  Mfg               100x13     2600  char
  Model             100x33     6600  char
  Model_Year        100x1       800  double
  Origin            100x7      1400  char
  Weight            100x1       800  double

Four of these variables (Acceleration, Displacement, Horsepower, and MPG) are continuous measurements on individual car models. The variable Model_Year indicates the year in which the car was made.

Create a grouped plot matrix of these variables using the gplotmatrix function.

x = [MPG Horsepower Displacement Weight];
gplotmatrix(x,[],Model_Year,[],'+xo')


(When the second argument of gplotmatrix is empty, the function graphs the columns of the x argument against each other, and places histograms along the diagonals. The empty fourth argument produces a graph with the default colors. The fifth argument controls the symbols used to distinguish between groups.)

It appears the cars do differ from year to year. The upper right plot, for example, is a graph of MPG versus Weight. The 1982 cars appear to have higher mileage than the older cars, and they appear to weigh less on average. But as a group, are the three years significantly different from one another? The manova1 function can answer that question.

[d,p,stats] = manova1(x,Model_Year)

d = 2

p = 2×1
10^-6 ×

    0.0000
    0.1141

stats = struct with fields:
           W: [4x4 double]
           B: [4x4 double]
           T: [4x4 double]
         dfW: 90
         dfB: 2
         dfT: 92
      lambda: [2x1 double]
       chisq: [2x1 double]
     chisqdf: [2x1 double]
    eigenval: [4x1 double]
    eigenvec: [4x4 double]
       canon: [100x4 double]
       mdist: [1x100 double]
      gmdist: [3x3 double]
      gnames: {3x1 cell}

The manova1 function produces three outputs:

• The first output, d, is an estimate of the dimension of the group means. If the means were all the same, the dimension would be 0, indicating that the means are at the same point. If the means differed but fell along a line, the dimension would be 1. In the example the dimension is 2, indicating that the group means fall in a plane but not along a line. This is the largest possible dimension for the means of three groups.
• The second output, p, is a vector of p-values for a sequence of tests. The first p-value tests whether the dimension is 0, the next whether the dimension is 1, and so on. In this case both p-values are small. That's why the estimated dimension is 2.
• The third output, stats, is a structure containing several fields, described in the following section.

The Fields of the stats Structure

The W, B, and T fields are matrix analogs to the within, between, and total sums of squares in ordinary one-way analysis of variance. The next three fields are the degrees of freedom for these matrices. Fields lambda, chisq, and chisqdf are the ingredients of the test for the dimensionality of the group means. (The p-values for these tests are the first output argument of manova1.)

The next three fields are used to do a canonical analysis. Recall that in principal components analysis (“Principal Component Analysis (PCA)” on page 15-68) you look for the combination of the original variables that has the largest possible variation. In multivariate analysis of variance, you instead look for the linear combination of the original variables that has the largest separation between groups. It is the single variable that would give the most significant result in a univariate one-way analysis of variance. Having found that combination, you next look for the combination with the second highest separation, and so on.
The eigenvec field is a matrix that defines the coefficients of the linear combinations of the original variables. The eigenval field is a vector measuring the ratio of the between-group variance to the within-group variance for the corresponding linear combination. The canon field is a matrix of the canonical variable values. Each column is a linear combination of the mean-centered original variables, using coefficients from the eigenvec matrix.

Plot the grouped scatter plot of the first two canonical variables.

c1 = stats.canon(:,1);
c2 = stats.canon(:,2);
figure()
gscatter(c2,c1,Model_Year,[],'oxs')

9-51

9

Analysis of Variance

A grouped scatter plot of the first two canonical variables shows more separation between groups than a grouped scatter plot of any pair of original variables. In this example, it shows three clouds of points, overlapping but with distinct centers. One point in the bottom right sits apart from the others. You can mark this point on the plot using the gname function.

Roughly speaking, the first canonical variable, c1, separates the 1982 cars (which have high values of c1) from the older cars. The second canonical variable, c2, reveals some separation between the 1970 and 1976 cars.

The final two fields of the stats structure are Mahalanobis distances. The mdist field measures the distance from each point to its group mean. Points with large values may be outliers. In this data set, the largest outlier is the one in the scatter plot, the Buick Estate station wagon. (Note that you could have supplied the model name to the gname function above if you wanted to label the point with its model name rather than its row number.)

Find the largest distance from the group mean.

max(stats.mdist)

ans = 31.5273

Find the point that has the largest distance from the group mean.

find(stats.mdist == ans)

ans = 20


Find the car model that corresponds to the largest distance from the group mean.

Model(20,:)

ans =
    'buick estate wagon (sw)'

The gmdist field measures the distances between each pair of group means. Examine the group means using grpstats.

Find the distances between each pair of group means.

stats.gmdist

ans = 3×3

         0    3.8277   11.1106
    3.8277         0    6.1374
   11.1106    6.1374         0

As might be expected, the multivariate distance between the extreme years 1970 and 1982 (11.1) is larger than the difference between more closely spaced years (3.8 and 6.1). This is consistent with the scatter plots, where the points seem to follow a progression as the year changes from 1970 through 1976 to 1982. If you had more groups, you might find it instructive to use the manovacluster function to draw a diagram that presents clusters of the groups, formed using the distances between their means.
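With only three groups here the diagram would not reveal much, but the call itself is simple; this sketch assumes stats is the output of the manova1 call above.

```matlab
% Draw a dendrogram of the groups, clustered by the distances
% between their means (stats comes from manova1 above).
manovacluster(stats)
```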


Model Specification for Repeated Measures Models

Model specification for a repeated measures model is a character vector or string scalar representing a formula in the form

'y1-yk ~ terms',

where the responses and terms are in Wilkinson notation.

For example, if you have five repeated measures y1, y2, y3, y4, and y5, and you include the terms X1, X2, X3, X4, and X3:X4 in your linear model, then you can specify modelspec as follows:

'y1-y5 ~ X1 + X2 + X3*X4'

Wilkinson Notation

Wilkinson notation describes the factors present in models. It does not describe the multipliers (coefficients) of those factors.

Use these rules to specify the responses in modelspec.

Wilkinson Notation    Description
Y1,Y2,Y3              Specific list of variables
Y1-Y5                 All table variables from Y1 through Y5

The following rules are for specifying terms in modelspec.

Wilkinson Notation                    Factors in Standard Notation
1                                     Constant (intercept) term
X^k, where k is a positive integer    X, X^2, ..., X^k
X1 + X2                               X1, X2
X1*X2                                 X1, X2, X1*X2
X1:X2                                 X1*X2 only
-X2                                   Do not include X2
X1*X2 + X3                            X1, X2, X3, X1*X2
X1 + X2 + X3 + X1:X2                  X1, X2, X3, X1*X2
X1*X2*X3 - X1:X2:X3                   X1, X2, X3, X1*X2, X1*X3, X2*X3
X1*(X2 + X3)                          X1, X2, X3, X1*X2, X1*X3

Statistics and Machine Learning Toolbox notation always includes a constant term unless you explicitly remove the term using -1.
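For instance, a call fitting the example model above might look like the following sketch. The table t and its variables are hypothetical stand-ins, not data from this guide.

```matlab
% Hypothetical table with four predictors and five repeated measures.
n = 20;
t = array2table(randn(n,9), ...
    'VariableNames',{'X1','X2','X3','X4','y1','y2','y3','y4','y5'});

% X1 + X2 + X3*X4 expands to X1, X2, X3, X4, and X3:X4, plus an intercept.
rm = fitrm(t,'y1-y5 ~ X1 + X2 + X3*X4');
```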

See Also
fitrm


Compound Symmetry Assumption and Epsilon Corrections

The regular p-value calculations in the repeated measures anova (ranova) are accurate if the theoretical distribution of the response variables has compound symmetry. This means that all response variables have the same variance, and each pair of response variables share a common correlation. That is,

         ⎡ 1  ρ  ⋯  ρ ⎤
Σ = σ²   ⎢ ρ  1  ⋯  ρ ⎥
         ⎢ ⋮  ⋮  ⋱  ⋮ ⎥
         ⎣ ρ  ρ  ⋯  1 ⎦

Under the compound symmetry assumption, the F-statistics in the repeated measures anova table have an F-distribution with degrees of freedom (v1, v2). Here, v1 is the rank of the contrast being tested, and v2 is the degrees of freedom for error. If the compound symmetry assumption is not true, the F-statistic has an approximate F-distribution with degrees of freedom (εv1, εv2), where ε is the correction factor. Then, the p-value must be computed using the adjusted values.

The three different correction factor computations are as follows:

• Greenhouse-Geisser approximation

  εGG = (∑ λi)² / (d ∑ λi²),

  where the sums run over i = 1, 2, ..., p, the λi are the eigenvalues of the covariance matrix, p is the number of variables, and d is equal to p – 1.

• Huynh-Feldt approximation

  εHF = min(1, (n d εGG – 2) / (d(n – r) – d² εGG)),

  where n is the number of rows in the design matrix and r is the rank of the design matrix.

• Lower bound on the true p-value

  εLB = 1/d.
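As a numerical sketch (not ranova's internal code), these corrections can be computed from a sample covariance matrix. The data below are made up, and r = 1 assumes a constant-only design.

```matlab
n = 12; p = 4; d = p - 1; r = 1;   % r = rank of a constant-only design
Y = randn(n,p);                    % made-up repeated measures, one row per subject

lambda = eig(cov(Y));              % eigenvalues of the covariance matrix

eGG = sum(lambda)^2 / (d*sum(lambda.^2));             % Greenhouse-Geisser
eHF = min(1,(n*d*eGG - 2) / (d*(n - r) - d^2*eGG));   % Huynh-Feldt
eLB = 1/d;                                            % lower bound
```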

References [1] Huynh, H., and L. S. Feldt. “Estimation of the Box Correction for Degrees of Freedom from Sample Data in Randomized Block and Split-Plot Designs.” Journal of Educational Statistics. Vol. 1, 1976, pp. 69–82. [2] Greenhouse, S. W., and S. Geisser. “An Extension of Box’s Result on the Use of F-Distribution in Multivariate Analysis.” Annals of Mathematical Statistics. Vol. 29, 1958, pp. 885–891.

See Also
epsilon | mauchly | ranova

More About
• “Mauchly’s Test of Sphericity” on page 9-57


Mauchly’s Test of Sphericity

The regular p-value calculations in the repeated measures anova (ranova) are accurate if the theoretical distribution of the response variables has compound symmetry. This means that all response variables have the same variance, and each pair of response variables share a common correlation. That is,

         ⎡ 1  ρ  ⋯  ρ ⎤
Σ = σ²   ⎢ ρ  1  ⋯  ρ ⎥
         ⎢ ⋮  ⋮  ⋱  ⋮ ⎥
         ⎣ ρ  ρ  ⋯  1 ⎦

If the compound symmetry assumption is false, then the degrees of freedom for the repeated measures anova test must be adjusted by a factor ε, and the p-value must be computed using the adjusted values. Compound symmetry implies sphericity. For a repeated measures model with responses y1, y2, ..., sphericity means that all pair-wise differences y1 – y2, y1 – y3, ... have the same theoretical variance. Mauchly’s test is the most accepted test for sphericity. Mauchly’s W statistic is W=

T trace T /p

d

,

where T = M′ ∑ M . M is a p-by-d orthogonal contrast matrix, Σ is the covariance matrix, p is the number of variables, and d = p – 1. A chi-square test statistic assesses the significance of W. If n is the number of rows in the design matrix, and r is the rank of the design matrix, then the chi-square statistic is C = − n − r log W D, where D=1−

2

2d + d + 2 . 6d n − r

The C test statistic has a chi-square distribution with (p(p – 1)/2) – 1 degrees of freedom. A small pvalue for the Mauchly’s test indicates that the sphericity assumption does not hold. The rmanova method computes the p-values for the repeated measures anova based on the results of the Mauchly’s test and each epsilon value.
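The W and C computations can be sketched numerically as follows (a Python illustration; the Helmert-style orthonormal contrast M built by QR is an assumption of the demo, and the final chi-square p-value step is omitted). Under sphericity the contrast covariance T is a multiple of the identity, so W = 1 and C = 0; any departure drives W below 1.

```python
import numpy as np

def mauchly_stat(Sigma, n, r):
    """Mauchly's W and the chi-square statistic C for the formulas above."""
    p = Sigma.shape[0]
    d = p - 1
    Q, _ = np.linalg.qr(np.ones((p, 1)), mode="complete")
    M = Q[:, 1:]                                  # p-by-d orthogonal contrast matrix
    T = M.T @ Sigma @ M
    W = np.linalg.det(T) / (np.trace(T) / d) ** d
    D = 1 - (2 * d ** 2 + d + 2) / (6 * d * (n - r))
    C = -(n - r) * np.log(W) * D
    return W, C

# Spherical case (compound symmetry): W = 1, so C = 0
p, rho = 4, 0.3
Sigma_cs = (1 - rho) * np.eye(p) + rho * np.ones((p, p))
W1, C1 = mauchly_stat(Sigma_cs, n=20, r=1)

# Non-spherical case: W < 1 and C > 0
W2, C2 = mauchly_stat(np.diag([4.0, 2.0, 1.0, 0.5]), n=20, r=1)
```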


References [1] Mauchly, J. W. “Significance Test for Sphericity of a Normal n-Variate Distribution.” The Annals of Mathematical Statistics. Vol. 11, 1940, pp. 204–209.

See Also epsilon | mauchly | ranova

More About •


“Compound Symmetry Assumption and Epsilon Corrections” on page 9-55


Multivariate Analysis of Variance for Repeated Measures

Multivariate analysis of variance for repeated measures is a test of the form A*B*C = D, where B is the p-by-r matrix of coefficients. p is the number of terms, such as the constant, linear predictors, dummy variables for categorical predictors, and products and powers, r is the number of repeated measures, and n is the number of subjects. A is an a-by-p matrix, with rank a ≤ p, defining hypotheses based on the between-subjects model. C is an r-by-c matrix, with rank c ≤ r ≤ n − p, defining hypotheses based on the within-subjects model, and D is an a-by-c matrix, containing the hypothesized value.

manova tests if the model terms are significant in their effect on the response by measuring how they contribute to the overall covariance. It includes all terms in the between-subjects model. manova always takes D as zero. The multivariate response for each observation (subject) is the vector of repeated measures.

manova uses four different methods to measure these contributions: Wilks’ lambda, Pillai’s trace, Hotelling-Lawley trace, and Roy’s maximum root statistic. Define

    T = ABC − D,   Z = A(X′X)⁻¹A′.

Then, the hypotheses sum of squares and products matrix is Qh = T′Z⁻¹T, and the residuals sum of squares and products matrix is Qe = C′(R′R)C, where R = Y − XB. The matrix Qh is analogous to the numerator of a univariate F-test, and Qe is analogous to the error sum of squares. Hence, the four statistics manova uses are:

• Wilks’ lambda

    Λ = |Qe| / |Qh + Qe| = ∏ᵢ 1/(1 + λᵢ),

where the λᵢ are the solutions of the characteristic equation |Qh − λQe| = 0.

• Pillai’s trace

    V = trace(Qh(Qh + Qe)⁻¹) = ∑ᵢ θᵢ,

where the θᵢ values are the solutions of the characteristic equation |Qh − θ(Qh + Qe)| = 0.

• Hotelling-Lawley trace

    U = trace(QhQe⁻¹) = ∑ᵢ λᵢ.

• Roy’s maximum root statistic

    Θ = max eig(QhQe⁻¹).
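Given Qh and Qe, all four statistics reduce to computations on the eigenvalues λᵢ of Qe⁻¹Qh. A Python sketch with arbitrary symmetric positive definite stand-ins for the two matrices (the random data is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A1 = rng.standard_normal((5, 3))
A2 = rng.standard_normal((6, 3))
Qh = A1.T @ A1          # stands in for the hypothesis SSP matrix
Qe = A2.T @ A2          # stands in for the error SSP matrix

# The lam_i solve |Qh - lam*Qe| = 0, i.e. they are eigenvalues of Qe^-1 Qh
lam = np.linalg.eigvals(np.linalg.solve(Qe, Qh)).real

wilks = np.prod(1.0 / (1.0 + lam))         # also |Qe| / |Qh + Qe|
pillai = np.sum(lam / (1.0 + lam))         # trace(Qh (Qh + Qe)^-1)
hotelling = np.sum(lam)                    # trace(Qh Qe^-1)
roy = np.max(lam)                          # largest eigenvalue

# Equivalent determinant form of Wilks' lambda
wilks_det = np.linalg.det(Qe) / np.linalg.det(Qh + Qe)
```

The identity Λ = Π 1/(1 + λᵢ) = |Qe|/|Qh + Qe| follows because |Qh + Qe| = |Qe| · |I + Qe⁻¹Qh|.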


References [1] Davis, C. S. Statistical Methods for the Analysis of Repeated Measurements. Springer Texts in Statistics. New York: Springer-Verlag, 2002.

See Also coeftest | manova


10 Bayesian Optimization

• “Bayesian Optimization Algorithm” on page 10-2
• “Parallel Bayesian Optimization” on page 10-7
• “Bayesian Optimization Plot Functions” on page 10-11
• “Bayesian Optimization Output Functions” on page 10-19
• “Bayesian Optimization Workflow” on page 10-25
• “Variables for a Bayesian Optimization” on page 10-33
• “Bayesian Optimization Objective Functions” on page 10-36
• “Constraints in Bayesian Optimization” on page 10-38
• “Optimize a Cross-Validated SVM Classifier Using bayesopt” on page 10-45
• “Optimize an SVM Classifier Fit Using Bayesian Optimization” on page 10-56
• “Optimize a Boosted Regression Ensemble” on page 10-64


Bayesian Optimization Algorithm

In this section...
“Algorithm Outline” on page 10-2
“Gaussian Process Regression for Fitting the Model” on page 10-3
“Acquisition Function Types” on page 10-3
“Acquisition Function Maximization” on page 10-5

Algorithm Outline

The Bayesian optimization algorithm attempts to minimize a scalar objective function f(x) for x in a bounded domain. The function can be deterministic or stochastic, meaning it can return different results when evaluated at the same point x. The components of x can be continuous reals, integers, or categorical, meaning a discrete set of names.

Note Throughout this discussion, D represents the number of components of x.

The key elements in the minimization are:

• A Gaussian process model of f(x).
• A Bayesian update procedure for modifying the Gaussian process model at each new evaluation of f(x).
• An acquisition function a(x) (based on the Gaussian process model of f) that you maximize to determine the next point x for evaluation. For details, see “Acquisition Function Types” on page 10-3 and “Acquisition Function Maximization” on page 10-5.

Algorithm outline:

• Evaluate yi = f(xi) for NumSeedPoints points xi, taken at random within the variable bounds. NumSeedPoints is a bayesopt setting. If there are evaluation errors, take more random points until there are NumSeedPoints successful evaluations. The probability distribution of each component is either uniform or log-scaled, depending on the Transform value in optimizableVariable.

Then repeat the following steps:

1. Update the Gaussian process model of f(x) to obtain a posterior distribution over functions Q(f | xi, yi for i = 1,...,t). (Internally, bayesopt uses fitrgp to fit a Gaussian process model to the data.)
2. Find the new point x that maximizes the acquisition function a(x).

The algorithm stops after reaching any of the following:

• A fixed number of iterations (default 30).
• A fixed time (default is no time limit).
• A stopping criterion that you supply in “Bayesian Optimization Output Functions” on page 10-19 or “Bayesian Optimization Plot Functions” on page 10-11.

For the algorithmic differences in parallel, see “Parallel Bayesian Algorithm” on page 10-7.
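The outline can be sketched end to end in toy form. The following Python sketch is an illustration, not bayesopt itself: it assumes a one-dimensional domain, a fixed RBF kernel with hand-picked lengthscale, four seed points, expected improvement as the acquisition function, and a simple grid search for the acquisition maximization (bayesopt instead fits kernel parameters with fitrgp and handles constraints and mixed variable types).

```python
import math
import numpy as np

def rbf(a, b, ell=0.2):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def gp_posterior(xs, ys, xq, noise=1e-8):
    """Posterior mean and standard deviation of a zero-mean GP at points xq."""
    K = rbf(xs, xs) + noise * np.eye(len(xs))
    Ks = rbf(xq, xs)
    mu = Ks @ np.linalg.solve(K, ys)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sd, best):
    z = (best - mu) / sd
    cdf = np.array([0.5 * (1 + math.erf(v / math.sqrt(2))) for v in z])
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)
    return (best - mu) * cdf + sd * pdf

f = lambda x: (x - 0.3) ** 2          # toy objective to minimize on [0, 1]
rng = np.random.default_rng(1)
xs = rng.uniform(0, 1, 4)             # random seed points (NumSeedPoints = 4)
ys = f(xs)
seed_best = ys.min()
grid = np.linspace(0, 1, 201)
for _ in range(10):                   # repeat: refit the model, maximize acquisition
    mu, sd = gp_posterior(xs, ys, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sd, ys.min()))]
    xs, ys = np.append(xs, x_next), np.append(ys, f(x_next))
best = ys.min()
```

After ten acquisition steps the incumbent minimum is no worse than the best seed point and, on this smooth toy objective, lands close to the true minimizer at x = 0.3.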


Gaussian Process Regression for Fitting the Model

The underlying probabilistic model for the objective function f is a Gaussian process prior with added Gaussian noise in the observations. So the prior distribution on f(x) is a Gaussian process with mean μ(x;θ) and covariance kernel function k(x,x′;θ). Here, θ is a vector of kernel parameters. For the particular kernel function bayesopt uses, see “Kernel Function” on page 10-3.

In a bit more detail, denote a set of points X = xi with associated objective function values F = fi. The prior’s joint distribution of the function values F is multivariate normal, with mean μ(X) and covariance matrix K(X,X), where Kij = k(xi,xj). Without loss of generality, the prior mean is given as 0. Also, the observations are assumed to have added Gaussian noise with variance σ². So the prior distribution has covariance K(X,X;θ) + σ²I.

Fitting a Gaussian process regression model to observations consists of finding values for the noise variance σ² and kernel parameters θ. This fitting is a computationally intensive process performed by fitrgp. For details on fitting a Gaussian process to observations, see “Gaussian Process Regression”.

Kernel Function

The kernel function k(x,x′;θ) can significantly affect the quality of a Gaussian process regression. bayesopt uses the ARD Matérn 5/2 kernel defined in “Kernel (Covariance) Function Options” on page 6-5. See Snoek, Larochelle, and Adams [3].
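The ARD Matérn 5/2 kernel has the closed form k(x,x′) = σf²(1 + √5·r + 5r²/3)·exp(−√5·r), where r is the distance between x and x′ weighted by a separate lengthscale per dimension (the "ARD" part). A small Python sketch of that formula (the hyperparameter values here are arbitrary):

```python
import numpy as np

def ard_matern52(x1, x2, sigma_f, lengthscales):
    """ARD Matern 5/2 kernel value k(x1, x2), one lengthscale per dimension."""
    r = np.sqrt(np.sum(((x1 - x2) / lengthscales) ** 2))
    s = np.sqrt(5.0) * r
    return sigma_f ** 2 * (1.0 + s + s ** 2 / 3.0) * np.exp(-s)

ls = np.array([0.5, 2.0])
k_same = ard_matern52(np.array([0.2, 1.0]), np.array([0.2, 1.0]), 1.5, ls)
k_far = ard_matern52(np.array([0.2, 1.0]), np.array([0.9, 1.4]), 1.5, ls)
# At zero distance the kernel equals sigma_f^2; it decays as points separate
```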

Acquisition Function Types

Six choices of acquisition functions are available for bayesopt. There are three basic types, with expected-improvement also modified by per-second or plus:

• 'expected-improvement-per-second-plus' (default)
• 'expected-improvement'
• 'expected-improvement-plus'
• 'expected-improvement-per-second'
• 'lower-confidence-bound'
• 'probability-of-improvement'

The acquisition functions evaluate the “goodness” of a point x based on the posterior distribution function Q. When there are coupled constraints, including the Error constraint (see “Objective Function Errors” on page 10-37), all acquisition functions modify their estimate of “goodness” following a suggestion of Gelbart, Snoek, and Adams [2]. Multiply the “goodness” by an estimate of the probability that the constraints are satisfied, to arrive at the acquisition function.

• “Expected Improvement” on page 10-4
• “Probability of Improvement” on page 10-4
• “Lower Confidence Bound” on page 10-4


• “Per Second” on page 10-4
• “Plus” on page 10-5

Expected Improvement

The 'expected-improvement' family of acquisition functions evaluates the expected amount of improvement in the objective function, ignoring values that cause an increase in the objective. In other words, define:

• xbest as the location of the lowest posterior mean.
• μQ(xbest) as the lowest value of the posterior mean.

Then the expected improvement is

    EI(x, Q) = EQ[max(0, μQ(xbest) − f(x))].

Probability of Improvement

The 'probability-of-improvement' acquisition function makes a calculation similar to, but simpler than, 'expected-improvement'. In both cases, bayesopt first calculates xbest and μQ(xbest). Then for 'probability-of-improvement', bayesopt calculates the probability PI that a new point x leads to a better objective function value, modified by a “margin” parameter m:

    PI(x, Q) = PQ(f(x) < μQ(xbest) − m).

bayesopt takes m as the estimated noise standard deviation. bayesopt evaluates this probability as

    PI = Φ(νQ(x)),

where

    νQ(x) = (μQ(xbest) − m − μQ(x)) / σQ(x).
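The probability itself is just a normal CDF evaluation. A minimal Python sketch of the formula (the numeric inputs are arbitrary): a point whose posterior mean beats the incumbent by exactly the margin m has ν = 0, hence a 50% probability of improvement.

```python
import math

def prob_improvement(mu_best, mu_x, sigma_x, m):
    """PI = Phi(nu) with nu = (mu_best - m - mu_x) / sigma_x."""
    nu = (mu_best - m - mu_x) / sigma_x
    return 0.5 * (1 + math.erf(nu / math.sqrt(2)))

pi_margin = prob_improvement(mu_best=1.0, mu_x=0.8, sigma_x=0.5, m=0.2)  # nu = 0
pi_better = prob_improvement(mu_best=1.0, mu_x=0.5, sigma_x=0.5, m=0.2)  # nu > 0
```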

Here Φ(·) is the unit normal CDF, and σQ is the posterior standard deviation of the Gaussian process at x.

Lower Confidence Bound

The 'lower-confidence-bound' acquisition function looks at the curve G two standard deviations below the posterior mean at each point:

    G(x) = μQ(x) − 2σQ(x).

G(x) is the 2σQ lower confidence envelope of the objective function model. bayesopt then maximizes the negative of G:

    LCB = 2σQ(x) − μQ(x).

Per Second

Sometimes, the time to evaluate the objective function can depend on the region. For example, many Support Vector Machine calculations vary in timing a good deal over certain ranges of points. If so, bayesopt can obtain better improvement per second by using time-weighting in its acquisition function. The cost-weighted acquisition functions have the phrase per-second in their names.


These acquisition functions work as follows. During the objective function evaluations, bayesopt maintains another Bayesian model of objective function evaluation time as a function of position x. The expected improvement per second that the acquisition function uses is

    EIpS(x) = EIQ(x) / μS(x),

where μS(x) is the posterior mean of the timing Gaussian process model.

Plus

To escape a local objective function minimum, the acquisition functions with plus in their names modify their behavior when they estimate that they are overexploiting an area. To understand overexploiting, let σF(x) be the standard deviation of the posterior objective function at x. Let σ be the posterior standard deviation of the additive noise, so that

    σQ²(x) = σF²(x) + σ².

Define tσ to be the value of the ExplorationRatio option, a positive number. The bayesopt plus acquisition functions, after each iteration, evaluate whether the next point x satisfies σF(x) < tσσ. If so, the algorithm declares that x is overexploiting.


1.5) ans = 98 107

Refit the model by removing these observations.

lme = fitlme(flu2,'FluRate ~ 1 + WtdILI + (1|Date)','Exclude',[98,107]);

Improve the model. Determine if including an independent random term for the nationwide estimate grouped by Date improves the model.

altlme = fitlme(flu2,'FluRate ~ 1 + WtdILI + (1|Date) + (WtdILI-1|Date)',...
    'Exclude',[98,107])

altlme =


Linear Mixed-Effects Model Workflow

Linear mixed-effects model fit by ML

Model information:
    Number of observations             466
    Fixed effects coefficients           2
    Random effects coefficients        104
    Covariance parameters                3

Formula:
    FluRate ~ 1 + WtdILI + (1 | Date) + (WtdILI | Date)

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    179.39    200.11    -84.694          169.39

Fixed effects coefficients (95% CIs):
    Name                   Estimate    SE          tStat     DF     pValue       Lower      Upper
    {'(Intercept)'}        0.17837     0.054585    3.2676    464    0.001165     0.0711     0.28563
    {'WtdILI'     }        0.70836     0.030594    23.153    464    2.123e-79    0.64824    0.76849

Random effects covariance parameters (95% CIs):
Group: Date (52 Levels)
    Name1                  Name2                  Type       Estimate    Lower      Upper
    {'(Intercept)'}        {'(Intercept)'}        {'std'}    0.16631     0.12977    0.21313

Group: Date (52 Levels)
    Name1            Name2            Type       Estimate      Lower    Upper
    {'WtdILI'}       {'WtdILI'}       {'std'}    4.6824e-08    NaN      NaN

Group: Error
    Name             Estimate    Lower      Upper
    {'Res Std'}      0.26691     0.24934    0.28572

The estimated standard deviation of the WtdILI term is nearly 0, and its confidence interval cannot be computed. This is an indication that the model is overparameterized and the (WtdILI-1|Date) term is not significant. You can formally test this using the compare method as follows: compare(lme,altlme,'CheckNesting',true).

Add a random-effects term for the intercept grouped by Region to the initial model lme.

lme2 = fitlme(flu2,'FluRate ~ 1 + WtdILI + (1|Date) + (1|Region)',...
    'Exclude',[98,107]);

11-147

11

Parametric Regression Analysis

Compare the models lme and lme2.

compare(lme,lme2,'CheckNesting',true)

ans =
    THEORETICAL LIKELIHOOD RATIO TEST

    Model    DF    AIC       BIC       LogLik     LRStat    deltaDF    pValue
    lme      4     177.39    193.97    -84.694
    lme2     5     62.265    82.986    -26.133    117.12    1          0
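The statistics in a compare table follow from simple arithmetic on the log-likelihoods: LRStat = 2(LogLik₂ − LogLik₁), deltaDF = DF₂ − DF₁, AIC = −2·LogLik + 2·DF, and BIC = −2·LogLik + DF·log(n). A quick Python check against the values in the table above:

```python
import math

# Values from the compare(lme, lme2) table
loglik = {"lme": -84.694, "lme2": -26.133}
df = {"lme": 4, "lme2": 5}
n = 466                                            # number of observations

lr_stat = 2 * (loglik["lme2"] - loglik["lme"])     # likelihood ratio statistic
delta_df = df["lme2"] - df["lme"]
aic_lme = -2 * loglik["lme"] + 2 * df["lme"]       # Akaike information criterion
bic_lme = -2 * loglik["lme"] + df["lme"] * math.log(n)
```

Rounding to the table's precision reproduces LRStat = 117.12, AIC = 177.39, and BIC = 193.97 for lme.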

The p-value of 0 indicates that lme2 is a better fit than lme. Now, check if adding a potentially correlated random-effects term for the intercept and national average improves the model lme2.

lme3 = fitlme(flu2,'FluRate ~ 1 + WtdILI + (1|Date) + (1 + WtdILI|Region)',...
    'Exclude',[98,107])

lme3 =
Linear mixed-effects model fit by ML

Model information:
    Number of observations             466
    Fixed effects coefficients           2
    Random effects coefficients         70
    Covariance parameters                5

Formula:
    FluRate ~ 1 + WtdILI + (1 | Date) + (1 + WtdILI | Region)

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    13.338    42.348    0.33076          -0.66153

Fixed effects coefficients (95% CIs):
    Name                   Estimate    SE          tStat     DF     pValue        Lower       Upper
    {'(Intercept)'}        0.1795      0.054953    3.2665    464    0.0011697     0.071514    0.28749
    {'WtdILI'     }        0.70719     0.04252     16.632    464    4.6451e-49    0.62363     0.79074

Random effects covariance parameters (95% CIs):
Group: Date (52 Levels)
    Name1                  Name2                  Type       Estimate    Lower      Upper
    {'(Intercept)'}        {'(Intercept)'}        {'std'}    0.17634     0.14093    0.22064

Group: Region (9 Levels)
    Name1                  Name2                  Type        Estimate     Lower         Upper
    {'(Intercept)'}        {'(Intercept)'}        {'std' }    0.0077038    3.0818e-16    1.9257e+11
    {'WtdILI'     }        {'(Intercept)'}        {'corr'}    -0.059604    -0.99996      0.99995
    {'WtdILI'     }        {'WtdILI'     }        {'std' }    0.088069     0.051693      0.15004

Group: Error
    Name             Estimate    Lower      Upper
    {'Res Std'}      0.20976     0.19568    0.22486

The estimate for the standard deviation of the random-effects term for the intercept grouped by Region is 0.0077038, and its confidence interval is very large and includes zero. This indicates that the random effects for the intercept grouped by Region are insignificant. The correlation between the random effects for the intercept and WtdILI is -0.059604. Its confidence interval is also very large and includes zero. This is an indication that the correlation is not significant. Refit the model by eliminating the intercept from the (1 + WtdILI | Region) random-effects term.

lme3 = fitlme(flu2,'FluRate ~ 1 + WtdILI + (1|Date) + (WtdILI - 1|Region)',...
    'Exclude',[98,107])

lme3 =
Linear mixed-effects model fit by ML

Model information:
    Number of observations             466
    Fixed effects coefficients           2
    Random effects coefficients         61
    Covariance parameters                3

Formula:
    FluRate ~ 1 + WtdILI + (1 | Date) + (WtdILI | Region)

Model fit statistics:
    AIC       BIC      LogLikelihood    Deviance
    9.3395    30.06    0.33023          -0.66046

Fixed effects coefficients (95% CIs):
    Name                   Estimate    SE          tStat     DF     pValue        Lower       Upper
    {'(Intercept)'}        0.1795      0.054892    3.2702    464    0.0011549     0.071637    0.28737
    {'WtdILI'     }        0.70718     0.042486    16.645    464    4.0496e-49    0.62369     0.79067

Random effects covariance parameters (95% CIs):
Group: Date (52 Levels)
    Name1                  Name2                  Type       Estimate    Lower      Upper
    {'(Intercept)'}        {'(Intercept)'}        {'std'}    0.17633     0.14092    0.22062

Group: Region (9 Levels)
    Name1            Name2            Type       Estimate    Lower       Upper
    {'WtdILI'}       {'WtdILI'}       {'std'}    0.087925    0.054474    0.14192

Group: Error
    Name             Estimate    Lower      Upper
    {'Res Std'}      0.20979     0.19585    0.22473

All terms in the new model lme3 are significant. Compare lme2 and lme3.

compare(lme2,lme3,'CheckNesting',true,'NSim',100)

ans =
    SIMULATED LIKELIHOOD RATIO TEST: NSIM = 100, ALPHA = 0.05

    Model    DF    AIC       BIC       LogLik     LRStat    Lower         Upper       pValue
    lme2     5     62.265    82.986    -26.133
    lme3     5     9.3395    30.06     0.33023    52.926    0.00025064    0.053932    0.009901

The p-value of 0.009901 indicates that lme3 is a better fit than lme2. Add a quadratic fixed-effects term to the model lme3.

lme4 = fitlme(flu2,'FluRate ~ 1 + WtdILI^2 + (1|Date) + (WtdILI - 1|Region)',...
    'Exclude',[98,107])

lme4 =
Linear mixed-effects model fit by ML

Model information:
    Number of observations             466
    Fixed effects coefficients           3
    Random effects coefficients         61
    Covariance parameters                3

Formula:
    FluRate ~ 1 + WtdILI + WtdILI^2 + (1 | Date) + (WtdILI | Region)

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    6.7234    31.588    2.6383           -5.2766

Fixed effects coefficients (95% CIs):
    Name                   Estimate     SE         tStat       DF     pValue        Lower       Upper
    {'(Intercept)'}        -0.063406    0.12236    -0.51821    463    0.60456       -0.30385    0.17704
    {'WtdILI'     }        1.0594       0.16554    6.3996      463    3.8232e-10    0.73406     1.3847
    {'WtdILI^2'   }        -0.096919    0.0441     -2.1977     463    0.028463      -0.18358    -0.010259

Random effects covariance parameters (95% CIs):
Group: Date (52 Levels)
    Name1                  Name2                  Type       Estimate    Lower      Upper
    {'(Intercept)'}        {'(Intercept)'}        {'std'}    0.16732     0.13326    0.21009

Group: Region (9 Levels)
    Name1            Name2            Type       Estimate    Lower       Upper
    {'WtdILI'}       {'WtdILI'}       {'std'}    0.087865    0.054443    0.1418

Group: Error
    Name             Estimate    Lower      Upper
    {'Res Std'}      0.20979     0.19585    0.22473

The p-value of 0.028463 indicates that the coefficient of the quadratic term WtdILI^2 is significant. Plot the fitted response versus the observed response and residuals.

F = fitted(lme4);
R = response(lme4);
figure();
plot(R,F,'rx')
xlabel('Response')
ylabel('Fitted')


The fitted versus observed response values form almost a 45-degree line, indicating a good fit. Plot the residuals versus the fitted values.

figure();
plotResiduals(lme4,'fitted')


Although it has improved, you can still see some heteroscedasticity in the model. This might be due to another predictor that does not exist in the data set, and hence is not in the model. Find the fitted flu rate value for region ENCentral, date 11/6/2005.

F(flu2.Region == 'ENCentral' & flu2.Date == '11/6/2005')

ans = 1.4860

Randomly generate response values. Randomly generate response values for a national estimate of 1.625, region MidAtl, and date 4/23/2006. First, define the new table. Because Date and Region are nominal in the original table, you must define them similarly in the new table.

tblnew.Date = nominal('4/23/2006');
tblnew.WtdILI = 1.625;
tblnew.Region = nominal('MidAtl');
tblnew = struct2table(tblnew);

Now, generate the response value.

random(lme4,tblnew)

ans = 1.2679



Fit Mixed-Effects Spline Regression

This example shows how to fit a mixed-effects linear spline model. Load the sample data.

load('mespline.mat');

This is simulated data. Plot y versus sorted x.

[x_sorted,I] = sort(x,'ascend');
plot(x_sorted,y(I),'o')

Fit the following mixed-effects linear spline regression model:

    yᵢ = β₀ + β₁xᵢ + ∑ⱼ₌₁ᴷ bⱼ max(xᵢ − kⱼ, 0) + εᵢ,

where kⱼ is the jth knot, and K is the total number of knots. Assume that bⱼ ~ N(0, σ_b²) and εᵢ ~ N(0, σ²). Define the knots.

k = linspace(0.05,0.95,100);

Define the design matrices.

X = [ones(1000,1),x];
Z = zeros(length(x),length(k));
for j = 1:length(k)
    Z(:,j) = max(X(:,2) - k(j),0);
end
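The Z construction above is a truncated-line (hinge) basis: column j holds max(x − kⱼ, 0), which is zero before the knot and grows linearly after it. An equivalent vectorized sketch in Python, with a tiny assumed knot set:

```python
import numpy as np

x = np.array([0.1, 0.4, 0.8])            # assumed sample points
k = np.array([0.25, 0.5, 0.75])          # three knots, for illustration

# Z[i, j] = max(x[i] - k[j], 0): zero before knot j, linear after it
Z = np.maximum(x[:, None] - k[None, :], 0.0)
```

A point left of every knot contributes a zero row, while a point right of all knots activates every basis column.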

Fit the model with an isotropic covariance structure for the random effects. lme = fitlmematrix(X,y,Z,[],'CovariancePattern','Isotropic');

Fit a fixed-effects only model. X = [X Z]; lme_fixed = fitlmematrix(X,y,[],[]);

Compare lme_fixed and lme via a simulated likelihood ratio test.

compare(lme,lme_fixed,'NSim',500,'CheckNesting',true)

ans =
    SIMULATED LIKELIHOOD RATIO TEST: NSIM = 500, ALPHA = 0.05

    Model        DF     AIC       BIC       LogLik     LRStat    Lower      Upper      pValue
    lme          4      170.62    190.25    -81.309
    lme_fixed    103    113.38    618.88    46.309     255.24    0.63784    0.72129    0.68064

The p-value of 0.68064 indicates that the fixed-effects only model is not a better fit than the mixed-effects spline regression model. Plot the fitted values from both models on top of the original response data.

R = response(lme);
figure();
plot(x_sorted,R(I),'o', 'MarkerFaceColor',[0.8,0.8,0.8],...
    'MarkerEdgeColor',[0.8,0.8,0.8],'MarkerSize',4);
hold on
F = fitted(lme);
F_fixed = fitted(lme_fixed);
plot(x_sorted,F(I),'b');
plot(x_sorted,F_fixed(I),'r');
legend('data','mixed effects','fixed effects','Location','NorthWest')
xlabel('sorted x values');
ylabel('y');
hold off


You can also see from the figure that the mixed-effects model provides a better fit to the data than the fixed-effects only model.


Train Linear Regression Model

Statistics and Machine Learning Toolbox™ provides several features for training a linear regression model.

• For greater accuracy on low-dimensional through medium-dimensional data sets, use fitlm. After fitting the model, you can use the object functions to improve, evaluate, and visualize the fitted model. To regularize a regression, use lasso or ridge.
• For reduced computation time on high-dimensional data sets, use fitrlinear. This function offers useful options for cross-validation, regularization, and hyperparameter optimization.

This example shows the typical workflow for linear regression analysis using fitlm. The workflow includes preparing a data set, fitting a linear regression model, evaluating and improving the fitted model, and predicting response values for new predictor data. The example also describes how to fit and evaluate a linear regression model for tall arrays.

Prepare Data

Load the sample data set NYCHousing2015.

load NYCHousing2015

The data set includes 10 variables with information on the sales of properties in New York City in 2015. This example uses some of these variables to analyze the sale prices. Instead of loading the sample data set NYCHousing2015, you can download the data from the NYC Open Data website and import the data as follows.

folder = 'Annualized_Rolling_Sales_Update';
ds = spreadsheetDatastore(folder,"TextType","string","NumHeaderLines",4);
ds.Files = ds.Files(contains(ds.Files,"2015"));
ds.SelectedVariableNames = ["BOROUGH","NEIGHBORHOOD","BUILDINGCLASSCATEGORY","RESIDENTIALUNITS", ...
    "COMMERCIALUNITS","LANDSQUAREFEET","GROSSSQUAREFEET","YEARBUILT","SALEPRICE","SALEDATE"];
NYCHousing2015 = readall(ds);

Preprocess the data set to choose the predictor variables of interest. First, change the variable names to lowercase for readability. NYCHousing2015.Properties.VariableNames = lower(NYCHousing2015.Properties.VariableNames);

Next, convert the saledate variable, specified as a datetime array, into two numeric columns MM (month) and DD (day), and remove the saledate variable. Ignore the year values because all samples are for the year 2015.

[~,NYCHousing2015.MM,NYCHousing2015.DD] = ymd(NYCHousing2015.saledate);
NYCHousing2015.saledate = [];
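The month/day split mirrors what MATLAB's ymd does. In Python terms (the sample dates below are hypothetical):

```python
from datetime import datetime

# Hypothetical sale dates; the year is dropped because all sales are from 2015
saledates = [datetime(2015, 3, 14), datetime(2015, 11, 2)]
mm = [d.month for d in saledates]   # MM column
dd = [d.day for d in saledates]     # DD column
```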

The numeric values in the borough variable indicate the names of the boroughs. Change the variable to a categorical variable using the names. NYCHousing2015.borough = categorical(NYCHousing2015.borough,1:5, ... ["Manhattan","Bronx","Brooklyn","Queens","Staten Island"]);

The neighborhood variable has 254 categories. Remove this variable for simplicity. NYCHousing2015.neighborhood = [];


Convert the buildingclasscategory variable to a categorical variable, and explore the variable by using the wordcloud function. NYCHousing2015.buildingclasscategory = categorical(NYCHousing2015.buildingclasscategory); wordcloud(NYCHousing2015.buildingclasscategory);

Assume that you are interested only in one-, two-, and three-family dwellings. Find the sample indices for these dwellings and delete the other samples. Then, change the data type of the buildingclasscategory variable to double. idx = ismember(string(NYCHousing2015.buildingclasscategory), ... ["01 ONE FAMILY DWELLINGS","02 TWO FAMILY DWELLINGS","03 THREE FAMILY DWELLINGS"]); NYCHousing2015 = NYCHousing2015(idx,:); NYCHousing2015.buildingclasscategory = renamecats(NYCHousing2015.buildingclasscategory, ... ["01 ONE FAMILY DWELLINGS","02 TWO FAMILY DWELLINGS","03 THREE FAMILY DWELLINGS"], ... ["1","2","3"]); NYCHousing2015.buildingclasscategory = double(NYCHousing2015.buildingclasscategory);

The buildingclasscategory variable now indicates the number of families in one dwelling. Explore the response variable saleprice using the summary function.

s = summary(NYCHousing2015);
s.saleprice

ans = struct with fields:
           Size: [37881 1]
           Type: 'double'
    Description: ''
          Units: ''
     Continuity: []
            Min: 0
         Median: 352000
            Max: 37000000
     NumMissing: 0

Assume that a saleprice less than or equal to $1000 indicates ownership transfer without a cash consideration. Remove the samples that have this saleprice.

idx0 = NYCHousing2015.saleprice

1) or shrinks (b < 1) the points.
• T is the orthogonal rotation and reflection matrix.
• c is a matrix with constant values in each column, used to shift the points.

The procrustes function chooses b, T, and c to minimize the distance between the target shape X and the transformed shape Z as measured by the least squares criterion:

    ∑ᵢ₌₁ⁿ ∑ⱼ₌₁ᵖ (Xᵢⱼ − Zᵢⱼ)²
Procrustes Analysis
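A compact way to see how b, T, and c can be obtained is the SVD solution of the orthogonal Procrustes problem. The numpy sketch below minimizes the least squares criterion above; it is an illustration only (MATLAB's procrustes normalizes the inputs slightly differently, so its b and d values are scaled differently).

```python
import numpy as np

def procrustes_fit(X, Y):
    """Find b, T, c minimizing the sum of squares ||X - (b*Y@T + c)||."""
    X0 = X - X.mean(axis=0)
    Y0 = Y - Y.mean(axis=0)
    U, s, Vt = np.linalg.svd(Y0.T @ X0)
    T = U @ Vt                                    # orthogonal rotation/reflection
    b = s.sum() / (Y0 ** 2).sum()                 # optimal scaling
    c = X.mean(axis=0) - b * Y.mean(axis=0) @ T   # shift
    Z = b * Y @ T + c
    d = ((X - Z) ** 2).sum() / (X0 ** 2).sum()    # standardized dissimilarity
    return d, Z, (b, T, c)

# Y is X rotated, scaled, and shifted, so a perfect fit (d = 0) exists
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 2))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
Y = 2.0 * X @ R + np.array([5.0, -3.0])
d, Z, (b, T, c) = procrustes_fit(X, Y)
```

Because Y here is an exact similarity transform of X, the fit recovers Z = X with dissimilarity d = 0 and scaling b = 0.5 (undoing the factor of 2).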

Preprocess Data for Accurate Results

Procrustes analysis is appropriate when all p measurement dimensions have similar scales. The analysis would be inaccurate, for example, if the columns of Z had different scales:

• The first column is measured in milliliters ranging from 2,000 to 6,000.
• The second column is measured in degrees Celsius ranging from 10 to 25.
• The third column is measured in kilograms ranging from 50 to 230.

In such cases, standardize your variables by:

1. Subtracting the sample mean from each variable.
2. Dividing each resultant variable by its sample standard deviation.

Use the zscore function to perform this standardization.
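The two steps are the familiar z-score. A Python equivalent (the columns are chosen to mimic the mismatched scales listed above, and the sample standard deviation uses N − 1 normalization, as zscore does by default):

```python
import numpy as np

def zscore(Z):
    """Center each column and divide by its sample standard deviation."""
    return (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)

# Columns with wildly different scales: milliliters, degrees C, kilograms
Z = np.array([[2000.0, 10.0, 50.0],
              [4000.0, 18.0, 140.0],
              [6000.0, 25.0, 230.0]])
Zs = zscore(Z)
# Each standardized column now has mean 0 and sample standard deviation 1
```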

See Also procrustes

Related Examples •

“Compare Handwritten Shapes Using Procrustes Analysis” on page 15-44


15

Multivariate Methods

Compare Handwritten Shapes Using Procrustes Analysis

This example shows how to use Procrustes analysis to compare two handwritten number threes. Visually and analytically explore the effects of forcing size and reflection changes.

Load and Display the Original Data

Input landmark data for two handwritten number threes.

A = [11 39;17 42;25 42;25 40;23 36;19 35;30 34;35 29;...
    30 20;18 19];
B = [15 31;20 37;30 40;29 35;25 29;29 31;31 31;35 20;...
    29 10;25 18];

Create X and Y from A and B, moving B to the side to make each shape more visible.

X = A;
Y = B + repmat([25 0], 10,1);

Plot the shapes, using letters to designate the landmark points. Lines in the figure join the points to indicate the drawing path of each shape. plot(X(:,1), X(:,2),'r-', Y(:,1), Y(:,2),'b-'); text(X(:,1), X(:,2),('abcdefghij')') text(Y(:,1), Y(:,2),('abcdefghij')') legend('X = Target','Y = Comparison','location','SE') xlim([0 65]); ylim([0 55]);


Calculate the Best Transformation Use Procrustes analysis to find the transformation that minimizes distances between landmark data points. [d,Z,tr] = procrustes(X,Y);

The outputs of the function are d (a standardized dissimilarity measure), Z (a matrix of the transformed landmarks), and tr (a structure array of the computed transformation with fields T , b , and c which correspond to the transformation equation). Visualize the transformed shape, Z , using a dashed blue line. plot(X(:,1), X(:,2),'r-', Y(:,1), Y(:,2),'b-',... Z(:,1),Z(:,2),'b:'); text(X(:,1), X(:,2),('abcdefghij')') text(Y(:,1), Y(:,2),('abcdefghij')') text(Z(:,1), Z(:,2),('abcdefghij')') legend('X = Target','Y = Comparison',... 'Z = Transformed','location','SW') xlim([0 65]); ylim([0 55]);

Examine the Similarity of the Two Shapes Use two different numerical values, the dissimilarity measure d and the scaling measure b , to assess the similarity of the target shape and the transformed shape. 15-45

15

Multivariate Methods

The dissimilarity measure d gives a number between 0 and 1 describing the difference between the target shape and the transformed shape. Values near 0 imply more similar shapes, while values near 1 imply dissimilarity. d d = 0.1502

The small value of d in this case shows that the two shapes are similar. procrustes calculates d by comparing the sum of squared deviations between the set of points with the sum of squared deviations of the original points from their column means. numerator = sum(sum((X-Z).^2)) numerator = 166.5321 denominator = sum(sum(bsxfun(@minus,X,mean(X)).^2)) denominator = 1.1085e+03 ratio = numerator/denominator ratio = 0.1502

The resulting measure d is independent of the scale of the size of the shapes and takes into account only the similarity of landmark data. Examine the size similarity of the shapes. tr.b ans = 0.9291

The sizes of the target and comparison shapes in the previous figure appear similar. This visual impression is reinforced by the value of b ≈ 0.93, which implies that the best transformation results in shrinking the comparison shape by a factor of 0.93 (only 7%).

Restrict the Form of the Transformations

Explore the effects of manually adjusting the scaling and reflection coefficients. Force b to equal 1 (set 'Scaling' to false) to examine the amount of dissimilarity in size of the target and transformed figures.

ds = procrustes(X,Y,'Scaling',false)

ds = 0.1552

In this case, setting 'Scaling' to false increases the calculated value of d by only 0.0049, which further supports the similarity in the size of the two number threes. A larger increase in d would have indicated a greater size discrepancy. This example requires only a rotation, not a reflection, to align the shapes. You can show this by observing that the determinant of the matrix T is 1 in this analysis.

det(tr.T)

ans = 1.0000


If you need a reflection in the transformation, the determinant of T is -1. You can force a reflection into the transformation as follows.

[dr,Zr,trr] = procrustes(X,Y,'Reflection',true);
dr

dr = 0.8130

The d value increases dramatically, indicating that a forced reflection leads to a poor transformation of the landmark points. A plot of the transformed shape shows a similar result. plot(X(:,1), X(:,2),'r-', Y(:,1), Y(:,2),'b-',... Zr(:,1),Zr(:,2),'b:'); text(X(:,1), X(:,2),('abcdefghij')') text(Y(:,1), Y(:,2),('abcdefghij')') text(Zr(:,1), Zr(:,2),('abcdefghij')') legend('X = Target','Y = Comparison',... 'Z = Transformed','Location','SW') xlim([0 65]); ylim([0 55]);

The landmark data points are now further away from their target counterparts. The transformed three is now an undesirable mirror image of the target three. It appears that the shapes might be better matched if you flipped the transformed shape upside down. Flipping the shapes would make the transformation even worse, however, because the landmark data points would be further away from their target counterparts. From this example, it is clear that manually adjusting the scaling and reflection parameters is generally not optimal.

See Also
procrustes

More About
• “Procrustes Analysis” on page 15-42

Introduction to Feature Selection

This topic provides an introduction to feature selection algorithms and describes the feature selection functions available in Statistics and Machine Learning Toolbox.

Feature Selection Algorithms

Feature selection reduces the dimensionality of data by selecting only a subset of measured features (predictor variables) to create a model. Feature selection algorithms search for a subset of predictors that optimally models measured responses, subject to constraints such as required or excluded features and the size of the subset. The main benefits of feature selection are to improve prediction performance, provide faster and more cost-effective predictors, and provide a better understanding of the data generation process [1]. Using too many features can degrade prediction performance even when all features are relevant and contain information about the response variable.

You can categorize feature selection algorithms into three types:

• “Filter Type Feature Selection” on page 15-50 — The filter type feature selection algorithm measures feature importance based on the characteristics of the features, such as feature variance and feature relevance to the response. You select important features as part of a data preprocessing step and then train a model using the selected features. Therefore, filter type feature selection is independent of the training algorithm.

• “Wrapper Type Feature Selection” on page 15-53 — The wrapper type feature selection algorithm starts training using a subset of features and then adds or removes a feature using a selection criterion. The selection criterion directly measures the change in model performance that results from adding or removing a feature. The algorithm repeats training and improving a model until its stopping criteria are satisfied.

• “Embedded Type Feature Selection” on page 15-54 — The embedded type feature selection algorithm learns feature importance as part of the model learning process. Once you train a model, you obtain the importance of the features in the trained model. This type of algorithm selects features that work well with a particular learning process.
In addition, you can categorize feature selection algorithms according to whether or not an algorithm ranks features sequentially. The minimum redundancy maximum relevance (MRMR) algorithm and stepwise regression are two examples of sequential feature selection algorithms. For details, see “Sequential Feature Selection” on page 15-61.

For regression problems, you can compare the importance of predictor variables visually by creating partial dependence plots (PDP) and individual conditional expectation (ICE) plots. For details, see plotPartialDependence.

For classification problems, after selecting features, you can train two models (for example, a full model and a model trained with a subset of predictors) and compare the accuracies of the models by using the compareHoldout, testcholdout, or testckfold functions.

Feature selection is preferable to feature transformation when the original features and their units are important and the modeling goal is to identify an influential subset. When categorical features are present, and numerical transformations are inappropriate, feature selection becomes the primary means of dimension reduction.
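The filter idea above, ranking features before any model is trained, can be made concrete with a small sketch. The following hypothetical NumPy code (illustrative only, analogous in spirit to the ranking functions described below, not their implementation) scores each feature with a one-way ANOVA F-statistic against a class label and ranks features as a preprocessing step:

```python
import numpy as np

def anova_f_scores(X, y):
    """Score each column of X by a one-way ANOVA F-statistic across the
    classes in y (larger F = stronger relevance to the response)."""
    classes = np.unique(y)
    n, k = len(y), len(classes)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        col = X[:, j]
        grand = col.mean()
        groups = [col[y == c] for c in classes]
        ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
        ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
        scores[j] = (ss_between / (k - 1)) / (ss_within / (n - k))
    return scores

# Filter-type selection: rank features first, then train any model on the best
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 3))
X[:, 0] += 2 * y          # feature 0 is informative; features 1-2 are noise
ranking = np.argsort(anova_f_scores(X, y))[::-1]
```

Because the scores depend only on the features and the response, the same ranking can feed any downstream training algorithm, which is exactly what makes the filter type independent of the learner.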


Feature Selection Functions

Statistics and Machine Learning Toolbox offers several functions for feature selection. Choose the appropriate feature selection function based on your problem and the data types of the features.

Filter Type Feature Selection

fscchi2
Supported problem: Classification
Supported data type: Categorical and continuous features
Description: Examine whether each predictor variable is independent of a response variable by using individual chi-square tests, and then rank features using the p-values of the chi-square test statistics. For examples, see the function reference page fscchi2.

fscmrmr
Supported problem: Classification
Supported data type: Categorical and continuous features
Description: Rank features sequentially using the “Minimum Redundancy Maximum Relevance (MRMR) Algorithm” on page 32-2249. For examples, see the function reference page fscmrmr.

fscnca*
Supported problem: Classification
Supported data type: Continuous features
Description: Determine the feature weights by using a diagonal adaptation of neighborhood component analysis (NCA). This algorithm works best for estimating feature importance for distance-based supervised models that use pairwise distances between observations to predict the response. For details, see the function reference page fscnca and these topics:
• “Neighborhood Component Analysis (NCA) Feature Selection” on page 15-99
• “Tune Regularization Parameter to Detect Features Using NCA for Classification”


fsrftest
Supported problem: Regression
Supported data type: Categorical and continuous features
Description: Examine the importance of each predictor individually using an F-test, and then rank features using the p-values of the F-test statistics. Each F-test tests the hypothesis that the response values grouped by predictor variable values are drawn from populations with the same mean, against the alternative hypothesis that the population means are not all the same. For examples, see the function reference page fsrftest.

fsrnca*
Supported problem: Regression
Supported data type: Continuous features
Description: Determine the feature weights by using a diagonal adaptation of neighborhood component analysis (NCA). This algorithm works best for estimating feature importance for distance-based supervised models that use pairwise distances between observations to predict the response. For details, see the function reference page fsrnca and these topics:
• “Neighborhood Component Analysis (NCA) Feature Selection” on page 15-99
• “Robust Feature Selection Using NCA for Regression” on page 15-85

fsulaplacian
Supported problem: Unsupervised learning
Supported data type: Continuous features
Description: Rank features using the “Laplacian Score” on page 32-2304. For examples, see the function reference page fsulaplacian.


relieff
Supported problem: Classification and regression
Supported data type: Either all categorical or all continuous features
Description: Rank features using the “ReliefF” on page 32-4714 algorithm for classification and the “RReliefF” on page 32-4715 algorithm for regression. This algorithm works best for estimating feature importance for distance-based supervised models that use pairwise distances between observations to predict the response. For examples, see the function reference page relieff.

sequentialfs
Supported problem: Classification and regression
Supported data type: Either all categorical or all continuous features
Description: Select features sequentially using a custom criterion. Define a function that measures the characteristics of the data to select features, and pass the function handle to the sequentialfs function. You can specify sequential forward selection or sequential backward selection by using the 'Direction' name-value pair argument. sequentialfs evaluates the criterion using cross-validation.

*You can also consider fscnca and fsrnca as embedded type feature selection functions because they return a trained model object and you can use the object functions predict and loss. However, you typically use these object functions to tune the regularization parameter of the algorithm. After selecting features using the fscnca or fsrnca function as part of a data preprocessing step, you can apply another classification or regression algorithm for your problem.


Wrapper Type Feature Selection

sequentialfs
Supported problem: Classification and regression
Supported data type: Either all categorical or all continuous features
Description: Select features sequentially using a custom criterion. Define a function that implements a supervised learning algorithm or a function that measures the performance of a learning algorithm, and pass the function handle to the sequentialfs function. You can specify sequential forward selection or sequential backward selection by using the 'Direction' name-value pair argument. sequentialfs evaluates the criterion using cross-validation. For examples, see the function reference page sequentialfs and these topics:
• “Select Subset of Features with Comparative Predictive Power” on page 15-61
• “Selecting Features for Classifying High-dimensional Data”


Embedded Type Feature Selection

DeltaPredictor property of a ClassificationDiscriminant model object
Supported problem: Linear discriminant analysis classification
Supported data type: Continuous features
Description: Create a linear discriminant analysis classifier by using fitcdiscr. The trained classifier, returned as a ClassificationDiscriminant object, stores the coefficient magnitudes in the DeltaPredictor property. You can use the values in DeltaPredictor as measures of predictor importance. This classifier uses the two regularization parameters “Gamma and Delta” on page 32-996 to identify and remove redundant predictors. You can obtain appropriate values for these parameters by using the cvshrink function or the 'OptimizeHyperparameters' name-value pair argument. For examples, see these topics:
• “Regularize Discriminant Analysis Classifier” on page 20-21
• “Optimize Discriminant Analysis Model” on page 32-1460

fitcecoc with templateLinear
Supported problem: Linear classification for multiclass learning with high-dimensional data
Supported data type: Continuous features
Description: Train a linear classification model by using fitcecoc and linear binary learners defined by templateLinear. Specify 'Regularization' of templateLinear as 'lasso' to use lasso regularization. For an example, see “Find Good Lasso Penalty Using Cross-Validation” on page 32-462. This example determines a good lasso-penalty strength by evaluating models with different strength values using kfoldLoss. You can also evaluate models using kfoldEdge, kfoldMargin, edge, loss, or margin.


fitclinear
Supported problem: Linear classification for binary learning with high-dimensional data
Supported data type: Continuous features
Description: Train a linear classification model by using fitclinear. Specify 'Regularization' of fitclinear as 'lasso' to use lasso regularization. For an example, see “Find Good Lasso Penalty Using Cross-Validated AUC” on page 32-2851. This example determines a good lasso-penalty strength by evaluating models with different strength values using the AUC values. Compute the cross-validated posterior class probabilities by using kfoldPredict, and compute the AUC values by using perfcurve. You can also evaluate models using kfoldEdge, kfoldLoss, kfoldMargin, edge, loss, margin, or predict.

fitrgp
Supported problem: Regression
Supported data type: Categorical and continuous features
Description: Train a Gaussian process regression (GPR) model by using fitrgp. Set the 'KernelFunction' name-value pair argument to use automatic relevance determination (ARD). Available options are 'ardsquaredexponential', 'ardexponential', 'ardmatern32', 'ardmatern52', and 'ardrationalquadratic'. Find the predictor weights by taking the exponential of the negative learned length scales, stored in the KernelInformation property. For examples, see these topics:
• “Specify Initial Step Size for LBFGS Optimization” on page 32-1919
• “Compare NCA and ARD Feature Selection” on page 32-2273


fitrlinear
Supported problem: Linear regression with high-dimensional data
Supported data type: Continuous features
Description: Train a linear regression model by using fitrlinear. Specify 'Regularization' of fitrlinear as 'lasso' to use lasso regularization. For examples, see these topics:
• “Find Good Lasso Penalty Using Regression Loss” on page 32-5065
• “Find Good Lasso Penalty Using Cross-Validation” on page 32-1946

lasso
Supported problem: Linear regression
Supported data type: Continuous features
Description: Train a linear regression model with “Lasso” on page 32-2995 regularization by using lasso. You can specify the weight of lasso versus ridge optimization by using the 'Alpha' name-value pair argument. For examples, see the function reference page lasso and these topics:
• “Lasso Regularization” on page 11-117
• “Lasso and Elastic Net with Cross Validation” on page 11-120
• “Wide Data via Lasso and Parallel Computing” on page 11-112

lassoglm
Supported problem: Generalized linear regression
Supported data type: Continuous features
Description: Train a generalized linear regression model with “Lasso” on page 32-3010 regularization by using lassoglm. You can specify the weight of lasso versus ridge optimization by using the 'Alpha' name-value pair argument. For details, see the function reference page lassoglm and these topics:
• “Lasso Regularization of Generalized Linear Models” on page 12-32
• “Regularize Poisson Regression” on page 12-34
• “Regularize Logistic Regression” on page 12-36
• “Regularize Wide Data in Parallel” on page 12-43


oobPermutedPredictorImportance** of ClassificationBaggedEnsemble
Supported problem: Classification with an ensemble of bagged decision trees (for example, random forest)
Supported data type: Categorical and continuous features
Description: Train a bagged classification ensemble with tree learners by using fitcensemble and specifying 'Method' as 'bag'. Then, use oobPermutedPredictorImportance to compute “Out-of-Bag, Predictor Importance Estimates by Permutation” on page 32-3798. The function measures how influential the predictor variables in the model are at predicting the response. For examples, see the function reference page oobPermutedPredictorImportance.

oobPermutedPredictorImportance** of RegressionBaggedEnsemble
Supported problem: Regression with an ensemble of bagged decision trees (for example, random forest)
Supported data type: Categorical and continuous features
Description: Train a bagged regression ensemble with tree learners by using fitrensemble and specifying 'Method' as 'bag'. Then, use oobPermutedPredictorImportance to compute “Out-of-Bag, Predictor Importance Estimates by Permutation” on page 32-3805. The function measures how influential the predictor variables in the model are at predicting the response. For examples, see the function reference page oobPermutedPredictorImportance and “Select Predictors for Random Forests” on page 18-59.

predictorImportance** of ClassificationEnsemble
Supported problem: Classification with an ensemble of decision trees
Supported data type: Categorical and continuous features
Description: Train a classification ensemble with tree learners by using fitcensemble. Then, use predictorImportance to compute estimates of “Predictor Importance” on page 32-4341 for the ensemble by summing changes in the risk due to splits on every predictor and dividing the sum by the number of branch nodes. For examples, see the function reference page predictorImportance.


predictorImportance** of ClassificationTree
Supported problem: Classification with a decision tree
Supported data type: Categorical and continuous features
Description: Train a classification tree by using fitctree. Then, use predictorImportance to compute estimates of “Predictor Importance” on page 32-4347 for the tree by summing changes in the risk due to splits on every predictor and dividing the sum by the number of branch nodes. For examples, see the function reference page predictorImportance.

predictorImportance** of RegressionEnsemble
Supported problem: Regression with an ensemble of decision trees
Supported data type: Categorical and continuous features
Description: Train a regression ensemble with tree learners by using fitrensemble. Then, use predictorImportance to compute estimates of “Predictor Importance” on page 32-4352 for the ensemble by summing changes in the risk due to splits on every predictor and dividing the sum by the number of branch nodes. For examples, see the function reference page predictorImportance.

predictorImportance** of RegressionTree
Supported problem: Regression with a decision tree
Supported data type: Categorical and continuous features
Description: Train a regression tree by using fitrtree. Then, use predictorImportance to compute estimates of “Predictor Importance” on page 32-4357 for the tree by summing changes in the mean squared error (MSE) due to splits on every predictor and dividing the sum by the number of branch nodes. For examples, see the function reference page predictorImportance.


stepwiseglm
Supported problem: Generalized linear regression
Supported data type: Categorical and continuous features
Description: Fit a generalized linear regression model using stepwise regression by using stepwiseglm. Alternatively, you can fit a model by using fitglm and then adjust the model by using step. Stepwise regression is a systematic method for adding and removing terms from the model based on their statistical significance in explaining the response variable. For details, see the function reference page stepwiseglm and these topics:
• “Generalized Linear Model Using Stepwise Algorithm” on page 32-5250
• “Generalized Linear Models” on page 12-9
• “Generalized Linear Model Workflow” on page 12-28

stepwiselm
Supported problem: Linear regression
Supported data type: Categorical and continuous features
Description: Fit a linear regression model using stepwise regression by using stepwiselm. Alternatively, you can fit a model by using fitlm and then adjust the model by using step. Stepwise regression is a systematic method for adding and removing terms from the model based on their statistical significance in explaining the response variable. For details, see the function reference page stepwiselm and these topics:
• “Stepwise Regression” on page 11-98
• “Linear Regression with Interaction Effects” on page 11-43
• “Assess Significance of Regression Coefficients Using t-statistic” on page 11-74

**For a tree-based algorithm, specify 'PredictorSelection' as 'interaction-curvature' to use the interaction test for selecting the best split predictor. The interaction test is useful in identifying important variables in the presence of many irrelevant variables. Also, if the training data includes many predictors, then specify 'NumVariablesToSample' as 'all' for training.


Otherwise, the software might not select some predictors, underestimating their importance. For details, see fitctree, fitrtree, and templateTree.
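The permutation idea behind oobPermutedPredictorImportance is model-agnostic and easy to sketch. The helper below is an illustrative NumPy version (not the toolbox function, which works on out-of-bag observations of a bagged ensemble): it measures how much the error grows when the link between one predictor and the response is broken by shuffling that column.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in squared error when each column of X is shuffled.
    `predict` is any fitted model's prediction function."""
    rng = np.random.default_rng(seed)
    base = np.mean((predict(X) - y) ** 2)
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])   # break column j's link to y
            imp[j] += np.mean((predict(Xp) - y) ** 2) - base
        imp[j] /= n_repeats
    return imp

# A toy "model" that truly depends only on the first predictor
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 3 * X[:, 0]
imp = permutation_importance(lambda A: 3 * A[:, 0], X, y)
```

Shuffling an influential predictor degrades the error substantially, while shuffling an ignored predictor leaves it unchanged, which is exactly the signal the out-of-bag estimate uses to rank predictors.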

References

[1] Guyon, Isabelle, and A. Elisseeff. "An introduction to variable and feature selection." Journal of Machine Learning Research. Vol. 3, 2003, pp. 1157–1182.

See Also
rankfeatures

More About
• “Sequential Feature Selection” on page 15-61

Sequential Feature Selection

This topic introduces sequential feature selection and provides an example that selects features sequentially using a custom criterion and the sequentialfs function.

Introduction to Sequential Feature Selection

A common method of “Feature Selection” on page 15-49 is sequential feature selection. This method has two components:

• An objective function, called the criterion, which the method seeks to minimize over all feasible feature subsets. Common criteria are mean squared error (for regression models) and misclassification rate (for classification models).

• A sequential search algorithm, which adds or removes features from a candidate subset while evaluating the criterion. Since an exhaustive comparison of the criterion value at all 2^n subsets of an n-feature data set is typically infeasible (depending on the size of n and the cost of objective calls), sequential searches move in only one direction, always growing or always shrinking the candidate set.

The method has two variants:

• Sequential forward selection (SFS), in which features are sequentially added to an empty candidate set until the addition of further features does not decrease the criterion.

• Sequential backward selection (SBS), in which features are sequentially removed from a full candidate set until the removal of further features increases the criterion.

Statistics and Machine Learning Toolbox offers several sequential feature selection functions:

• Stepwise regression is a sequential feature selection technique designed specifically for least-squares fitting. The functions stepwiselm and stepwiseglm use optimizations that are possible only with least-squares criteria. Unlike other sequential feature selection algorithms, stepwise regression can remove features that have been added or add features that have been removed, based on the criterion specified by the 'Criterion' name-value pair argument.

• sequentialfs performs sequential feature selection using a custom criterion. Input arguments include predictor data, response data, and a function handle to a file implementing the criterion function.
You can define a criterion function that measures the characteristics of data or the performance of a learning algorithm. Optional inputs allow you to specify SFS or SBS, required or excluded features, and the size of the feature subset. The function calls cvpartition and crossval to evaluate the criterion at different candidate sets.

• fscmrmr ranks features using the minimum redundancy maximum relevance (MRMR) algorithm for classification problems.
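The greedy loop at the heart of sequential forward selection can be sketched in a few lines. The following NumPy code is an illustrative analogue of sequentialfs, not its implementation: it omits cross-validation for brevity and uses the residual sum of squares of a least-squares fit as the custom criterion.

```python
import numpy as np

def rss_criterion(X, y):
    """Custom criterion to minimize: residual sum of squares of a
    least-squares fit with an intercept term."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def forward_select(X, y, criterion, tol=1e-6):
    """Greedy sequential forward selection (SFS): repeatedly add the
    feature that most decreases the criterion; stop when the best
    addition improves it by no more than tol."""
    selected = []
    remaining = list(range(X.shape[1]))
    best = float(((y - y.mean()) ** 2).sum())   # intercept-only (null) model
    while remaining:
        trial = {j: criterion(X[:, selected + [j]], y) for j in remaining}
        j_star = min(trial, key=trial.get)
        if best - trial[j_star] <= tol:
            break                               # no worthwhile improvement
        selected.append(j_star)
        remaining.remove(j_star)
        best = trial[j_star]
    return selected

# Only features 1 and 4 affect the response; noise features are ignored
rng = np.random.default_rng(2)
X = rng.normal(size=(120, 5))
y = 2 * X[:, 1] - 3 * X[:, 4] + 0.01 * rng.normal(size=120)
chosen = forward_select(X, y, rss_criterion, tol=1.0)
```

The stopping rule mirrors the role of the tolerance in the MATLAB example below: once no remaining feature reduces the criterion by more than the tolerance, the search ends.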

Select Subset of Features with Comparative Predictive Power

This example selects a subset of features using a custom criterion that measures predictive power for a generalized linear regression problem.

Consider a data set with 100 observations of 10 predictors. Generate the random data from a logistic model, with a binomial distribution of responses at each set of values for the predictors. Some coefficients are set to zero so that not all of the predictors affect the response.


rng(456) % Set the seed for reproducibility
n = 100;
m = 10;
X = rand(n,m);
b = [1 0 0 2 .5 0 0 0.1 0 1];
Xb = X*b';
p = 1./(1+exp(-Xb));
N = 50;
y = binornd(N,p);

Fit a logistic model to the data using fitglm.

Y = [y N*ones(size(y))];
model0 = fitglm(X,Y,'Distribution','binomial')

model0 =
Generalized linear regression model:
    logit(y) ~ 1 + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10
    Distribution = Binomial

Estimated Coefficients:
                    Estimate        SE          tStat        pValue
    (Intercept)      0.22474     0.30043      0.74806        0.45443
    x1               0.68782     0.17207       3.9973      6.408e-05
    x2                0.2003     0.18087       1.1074        0.26811
    x3             -0.055328     0.18871     -0.29319        0.76937
    x4                2.2576      0.1813       12.452     1.3566e-35
    x5               0.54603     0.16836       3.2432      0.0011821
    x6              0.069701     0.17738      0.39294        0.69437
    x7              -0.22562     0.16957      -1.3306        0.18334
    x8              -0.19712     0.17317      -1.1383        0.25498
    x9              -0.20373     0.16796       -1.213        0.22514
    x10              0.99741     0.17247       5.7832     7.3296e-09

100 observations, 89 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 222, p-value = 4.92e-42

Display the deviance of the fit.

dev0 = model0.Deviance

dev0 = 101.5648

This model is the full model, with all of the features and an initial constant term. Sequential feature selection searches for a subset of the features in the full model with comparative predictive power.

Before performing feature selection, you must specify a criterion for selecting the features. In this case, the criterion is the deviance of the fit (a generalization of the residual sum of squares). The critfun function (shown at the end of this example) calls fitglm and returns the deviance of the fit.

If you use the live script file for this example, the critfun function is already included at the end of the file. Otherwise, you need to create this function at the end of your .m file or add it as a file on the MATLAB path.


Perform feature selection. sequentialfs calls the criterion function via a function handle.

maxdev = chi2inv(.95,1);
opt = statset('display','iter',...
    'TolFun',maxdev,...
    'TolTypeFun','abs');
inmodel = sequentialfs(@critfun,X,Y,...
    'cv','none',...
    'nullmodel',true,...
    'options',opt,...
    'direction','forward');

Start forward sequential feature selection:
Initial columns included: none
Columns that can not be included: none
Step 1, used initial columns, criterion value 323.173
Step 2, added column 4, criterion value 184.794
Step 3, added column 10, criterion value 139.176
Step 4, added column 1, criterion value 119.222
Step 5, added column 5, criterion value 107.281
Final columns included: 1 4 5 10
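The tolerance maxdev = chi2inv(.95,1) has a closed form worth noting: a chi-square variable with one degree of freedom is the square of a standard normal, so its 95% quantile equals the squared 97.5% normal quantile. A quick check using only Python's standard library (illustrative, outside MATLAB):

```python
from statistics import NormalDist

# chi-square(1) is the square of a standard normal, so its 95% quantile
# equals the squared 97.5% standard-normal quantile
maxdev = NormalDist().inv_cdf(0.975) ** 2
print(round(maxdev, 4))  # 3.8415
```

Any single feature whose addition drops the deviance by less than about 3.84 is therefore indistinguishable from random chance at the 5% level, which is the stopping rule the options structure encodes.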

The iterative display shows a decrease in the criterion value as each new feature is added to the model. The final result is a reduced model with only four of the original ten features: columns 1, 4, 5, and 10 of X, as indicated in the logical vector inmodel returned by sequentialfs.

The deviance of the reduced model is higher than the deviance of the full model. However, the addition of any other single feature would not decrease the criterion value by more than the absolute tolerance, maxdev, set in the options structure. Adding a feature with no effect reduces the deviance by an amount that has a chi-square distribution with one degree of freedom. Adding a significant feature results in a larger change in the deviance. By setting maxdev to chi2inv(.95,1), you instruct sequentialfs to continue adding features provided that the change in deviance is more than the change expected by random chance.

Create the reduced model with an initial constant term.

model = fitglm(X(:,inmodel),Y,'Distribution','binomial')

model =
Generalized linear regression model:
    logit(y) ~ 1 + x1 + x2 + x3 + x4
    Distribution = Binomial

Estimated Coefficients:
                     Estimate        SE           tStat         pValue
    (Intercept)    -0.0052025     0.16772     -0.031018        0.97525
    x1                0.73814     0.16316        4.5241     6.0666e-06
    x2                 2.2139     0.17402        12.722     4.4369e-37
    x3                0.54073      0.1568        3.4485     0.00056361
    x4                 1.0694     0.15916        6.7191     1.8288e-11

100 observations, 95 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 216, p-value = 1.44e-45


This code creates the function critfun.

function dev = critfun(X,Y)
model = fitglm(X,Y,'Distribution','binomial');
dev = model.Deviance;
end

See Also
sequentialfs | stepwiseglm | stepwiselm

More About
• “Introduction to Feature Selection” on page 15-49

Nonnegative Matrix Factorization

Nonnegative matrix factorization (NMF) is a dimension-reduction technique based on a low-rank approximation of the feature space. Besides providing a reduction in the number of features, NMF guarantees that the features are nonnegative, producing additive models that respect, for example, the nonnegativity of physical quantities.

Given a nonnegative m-by-n matrix X and a positive integer k < min(m,n), NMF finds nonnegative m-by-k and k-by-n matrices W and H, respectively, that minimize the norm of the difference X – WH. W and H are thus approximate nonnegative factors of X.

The k columns of W represent transformations of the variables in X; the k rows of H represent the coefficients of the linear combinations of the original n variables in X that produce the transformed variables in W. Since k is generally smaller than the rank of X, the product WH provides a compressed approximation of the data in X. A range of possible values for k is often suggested by the modeling context.

The Statistics and Machine Learning Toolbox function nnmf carries out nonnegative matrix factorization. nnmf uses one of two iterative algorithms that begin with random initial values for W and H. Because the norm of the residual X – WH may have local minima, repeated calls to nnmf may yield different factorizations. Sometimes the algorithm converges to a solution of lower rank than k, which may indicate that the result is not optimal.
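One of the iterative schemes commonly used for this minimization, multiplicative updates, is short enough to sketch. The NumPy code below is a simplified single-start illustration of that scheme (Lee-Seung updates), not the nnmf implementation:

```python
import numpy as np

def nmf_mult(X, k, n_iter=200, seed=0):
    """Multiplicative updates for X ≈ W @ H with W, H >= 0.
    Starts from random nonnegative W and H, as nnmf does."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    eps = 1e-10                       # guard against division by zero
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

rng = np.random.default_rng(1)
X = rng.random((30, 8))               # nonnegative data matrix
W, H = nmf_mult(X, 2)                 # rank k = 2 approximation
resid = np.linalg.norm(X - W @ H)
```

Because the updates only multiply by nonnegative ratios, W and H stay nonnegative throughout; and because the residual norm has local minima, different seeds can give different factorizations, which is why nnmf supports multiple replicates.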

See Also
nnmf

Related Examples
• “Perform Nonnegative Matrix Factorization” on page 15-66


Perform Nonnegative Matrix Factorization

This example shows how to perform nonnegative matrix factorization.

Load the sample data.

load moore
X = moore(:,1:5);
rng('default'); % For reproducibility

Compute a rank-two approximation of X using a multiplicative update algorithm that begins from five random initial values for W and H.

opt = statset('MaxIter',10,'Display','final');
[W0,H0] = nnmf(X,2,'replicates',5,'options',opt,'algorithm','mult');

   rep    iteration    rms resid    |delta x|
     1       10         358.296     0.00190554
     2       10         78.3556     0.000351747
     3       10         230.962     0.0172839
     4       10         326.347     0.00739552
     5       10         361.547     0.00705539
Final root mean square residual = 78.3556

The 'mult' algorithm is sensitive to initial values, which makes it a good choice when using 'replicates' to find W and H from multiple random starting values.

Now perform the factorization using the alternating least-squares algorithm, which converges faster and more consistently. Allow up to 100 times more iterations, beginning from the initial W0 and H0 identified above.

opt = statset('Maxiter',1000,'Display','final');
[W,H] = nnmf(X,2,'w0',W0,'h0',H0,'options',opt,'algorithm','als');

   rep    iteration    rms resid    |delta x|
     1        2         77.5315     0.000830334
Final root mean square residual = 77.5315

The two columns of W are the transformed predictors. The two rows of H give the relative contributions of each of the five predictors in X to the predictors in W. Display H.

H

H = 2×5

    0.0835    0.0190    0.1782    0.0072    0.9802
    0.0559    0.0250    0.9969    0.0085    0.0497

The fifth predictor in X (weight 0.9802) strongly influences the first predictor in W. The third predictor in X (weight 0.9969) strongly influences the second predictor in W.

Visualize the relative contributions of the predictors in X with biplot, showing the data and original variables in the column space of W.

biplot(H','scores',W,'varlabels',{'','','v3','','v5'});
axis([0 1.1 0 1.1])
xlabel('Column 1')
ylabel('Column 2')

See Also
nnmf

More About
• “Nonnegative Matrix Factorization” on page 15-65


Principal Component Analysis (PCA)

One of the difficulties inherent in multivariate statistics is the problem of visualizing data that has many variables. The MATLAB function plot displays a graph of the relationship between two variables. The plot3 and surf commands display different three-dimensional views. But when there are more than three variables, it is more difficult to visualize their relationships.

Fortunately, in data sets with many variables, groups of variables often move together. One reason for this is that more than one variable might be measuring the same driving principle governing the behavior of the system. In many systems there are only a few such driving forces. But an abundance of instrumentation enables you to measure dozens of system variables. When this happens, you can take advantage of this redundancy of information. You can simplify the problem by replacing a group of variables with a single new variable.

Principal component analysis is a quantitatively rigorous method for achieving this simplification. The method generates a new set of variables, called principal components. Each principal component is a linear combination of the original variables. All the principal components are orthogonal to each other, so there is no redundant information. The principal components as a whole form an orthogonal basis for the space of the data.

There are an infinite number of ways to construct an orthogonal basis for several columns of data. What is so special about the principal component basis? The first principal component is a single axis in space. When you project each observation on that axis, the resulting values form a new variable. And the variance of this variable is the maximum among all possible choices of the first axis. The second principal component is another axis in space, perpendicular to the first. Projecting the observations on this axis generates another new variable.
The variance of this variable is the maximum among all possible choices of this second axis. The full set of principal components is as large as the original set of variables. But it is commonplace for the sum of the variances of the first few principal components to exceed 80% of the total variance of the original data. By examining plots of these few new variables, researchers often develop a deeper understanding of the driving forces that generated the original data. You can use the function pca to find the principal components. To use pca, you need to have the actual measured data you want to analyze. However, if you lack the actual data, but have the sample covariance or correlation matrix for the data, you can still use the function pcacov to perform a principal components analysis. See the reference page for pcacov for a description of its inputs and outputs.
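The properties described above (orthogonal components, each score variance equal to the corresponding component variance) can be checked numerically. Below is a sketch of the same idea in NumPy, computing the components as eigenvectors of the sample covariance matrix; this is an illustration of the math, not the toolbox's pca implementation:

```python
import numpy as np

def principal_components(X):
    """Principal component directions (columns) and their variances,
    via eigendecomposition of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)                 # center each variable
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]       # largest variance first
    return eigvecs[:, order], eigvals[order]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))  # correlated columns
coeff, latent = principal_components(X)

scores = (X - X.mean(axis=0)) @ coeff       # project onto the new basis

# The components form an orthonormal basis, so there is no redundancy.
assert np.allclose(coeff.T @ coeff, np.eye(4), atol=1e-8)
# Each score column's sample variance equals the corresponding eigenvalue.
assert np.allclose(np.var(scores, axis=0, ddof=1), latent, atol=1e-8)
```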

See Also pca | pcacov | pcares | ppca

Related Examples
• “Analyze Quality of Life in U.S. Cities Using PCA” on page 15-69


Analyze Quality of Life in U.S. Cities Using PCA This example shows how to perform a weighted principal components analysis and interpret the results. Load sample data. Load the sample data. The data includes ratings for 9 different indicators of the quality of life in 329 U.S. cities. These are climate, housing, health, crime, transportation, education, arts, recreation, and economics. For each category, a higher rating is better. For example, a higher rating for crime means a lower crime rate. Display the categories variable. load cities categories categories = climate housing health crime transportation education arts recreation economics

In total, the cities data set contains three variables: • categories, a character matrix containing the names of the indices • names, a character matrix containing the 329 city names • ratings, the data matrix with 329 rows and 9 columns Plot data. Make a boxplot to look at the distribution of the ratings data. figure() boxplot(ratings,'Orientation','horizontal','Labels',categories)


There is more variability in the ratings of the arts and housing than in the ratings of crime and climate. Check pairwise correlation. Check the pairwise correlation between the variables. C = corr(ratings,ratings);

The correlation among some variables is as high as 0.85. Principal components analysis constructs uncorrelated new variables that are linear combinations of the original variables. Compute principal components. When all variables are in the same unit, it is appropriate to compute principal components for raw data. When the variables are in different units or the difference in the variance of different columns is substantial (as in this case), scaling of the data or use of weights is often preferable. Perform the principal component analysis by using the inverse variances of the ratings as weights. w = 1./var(ratings); [wcoeff,score,latent,tsquared,explained] = pca(ratings,... 'VariableWeights',w);

Or equivalently: [wcoeff,score,latent,tsquared,explained] = pca(ratings,... 'VariableWeights','variance');
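Numerically, inverse-variance weighting corresponds to analyzing the correlation matrix rather than the covariance matrix, which is why the two calls above are equivalent. A small NumPy sketch of that equivalence (an illustration, not the toolbox code):

```python
import numpy as np

rng = np.random.default_rng(1)
# Columns with very different scales, like the ratings data.
X = rng.normal(size=(300, 5)) * np.array([1.0, 10.0, 0.1, 5.0, 2.0])

# Spectrum of the covariance of z-scored data ...
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
evals_z = np.linalg.eigvalsh(np.cov(Z, rowvar=False))

# ... matches the spectrum of the correlation matrix of the raw data.
evals_c = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
assert np.allclose(evals_z, evals_c, atol=1e-10)
```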

The following sections explain the five outputs of pca.


Component coefficients. The first output, wcoeff, contains the coefficients of the principal components. The first three principal component coefficient vectors are: c3 = wcoeff(:,1:3) c3 = 1.0e+03 *
    0.0249   -0.0263   -0.0834
    0.8504   -0.5978   -0.4965
    0.4616    0.3004   -0.0073
    0.1005   -0.1269    0.0661
    0.5096    0.2606    0.2124
    0.0883    0.1551    0.0737
    2.1496    0.9043   -0.1229
    0.2649   -0.3106   -0.0411
    0.1469   -0.5111    0.6586

These coefficients are weighted, hence the coefficient matrix is not orthonormal. Transform coefficients. Transform the coefficients so that they are orthonormal. coefforth = inv(diag(std(ratings)))*wcoeff;

Note that if you use a weights vector, w, while conducting the pca, then coefforth = diag(sqrt(w))*wcoeff;

Check coefficients are orthonormal. The transformed coefficients are now orthonormal. I = coefforth'*coefforth; I(1:3,1:3) ans =
    1.0000   -0.0000   -0.0000
   -0.0000    1.0000   -0.0000
   -0.0000   -0.0000    1.0000

Component scores. The second output, score, contains the coordinates of the original data in the new coordinate system defined by the principal components. The score matrix is the same size as the input data matrix. You can also obtain the component scores using the orthonormal coefficients and the standardized ratings as follows. cscores = zscore(ratings)*coefforth;

cscores and score are identical matrices.


Plot component scores. Create a plot of the first two columns of score. figure() plot(score(:,1),score(:,2),'+') xlabel('1st Principal Component') ylabel('2nd Principal Component')

This plot shows the centered and scaled ratings data projected onto the first two principal components. pca computes the scores to have mean zero. Explore plot interactively. Note the outlying points in the right half of the plot. You can graphically identify these points as follows. gname

Move your cursor over the plot and click once near the rightmost seven points. This labels the points by their row numbers as in the following figure.


After labeling points, press Return. Extract observation names. Create an index variable containing the row numbers of all the cities you chose and get the names of the cities. metro = [43 65 179 213 234 270 314]; names(metro,:) ans = Boston, MA Chicago, IL Los Angeles, Long Beach, CA New York, NY Philadelphia, PA-NJ San Francisco, CA Washington, DC-MD-VA

These labeled cities are some of the biggest population centers in the United States and they appear more extreme than the remainder of the data. Component variances. The third output, latent, is a vector containing the variance explained by the corresponding principal component. Each column of score has a sample variance equal to the corresponding row of latent.


latent latent = 3.4083 1.2140 1.1415 0.9209 0.7533 0.6306 0.4930 0.3180 0.1204

Percent variance explained. The fifth output, explained, is a vector containing the percent variance explained by the corresponding principal component. explained explained = 37.8699 13.4886 12.6831 10.2324 8.3698 7.0062 5.4783 3.5338 1.3378
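The explained output is just latent rescaled to percentages of the total variance. A quick NumPy check using the values shown above (an outside-MATLAB illustration of the relationship):

```python
import numpy as np

# Component variances (latent) from the example output.
latent = np.array([3.4083, 1.2140, 1.1415, 0.9209, 0.7533,
                   0.6306, 0.4930, 0.3180, 0.1204])

explained = 100 * latent / latent.sum()     # percent of total variance

# For correlation-based PCA of 9 variables, the variances sum to 9 ...
assert abs(latent.sum() - 9.0) < 1e-9
# ... and the percentages match the documented explained output.
assert abs(explained[0] - 37.8699) < 0.01
assert abs(explained.sum() - 100.0) < 1e-9
```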

Create scree plot. Make a scree plot of the percent variability explained by each principal component. figure() pareto(explained) xlabel('Principal Component') ylabel('Variance Explained (%)')


This scree plot shows only the first seven of the nine components, which together explain 95% of the total variance. The only clear break in the amount of variance accounted for by each component is between the first and second components. However, the first component by itself explains less than 40% of the variance, so more components might be needed. You can see that the first three principal components explain roughly two-thirds of the total variability in the standardized ratings, so that might be a reasonable way to reduce the dimensions. Hotelling's T-squared statistic. The last output from pca is tsquared, which is Hotelling's T², a statistical measure of the multivariate distance of each observation from the center of the data set. This is an analytical way to find the most extreme points in the data. [st2,index] = sort(tsquared,'descend'); % sort in descending order extreme = index(1); names(extreme,:) ans = New York, NY
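tsquared can be reproduced from the scores and component variances: each observation's T² is its squared Mahalanobis distance from the data centroid, a sum of squared scores scaled by the component variances. A NumPy sketch of that identity (illustrative random data, not the cities ratings):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
Xc = X - X.mean(axis=0)

latent, coeff = np.linalg.eigh(np.cov(Xc, rowvar=False))
scores = Xc @ coeff

# Hotelling's T^2 from scaled scores in principal component coordinates ...
t2 = np.sum(scores**2 / latent, axis=1)

# ... equals the squared Mahalanobis distance in the original coordinates.
t2_direct = np.einsum('ij,jk,ik->i', Xc,
                      np.linalg.inv(np.cov(Xc, rowvar=False)), Xc)
assert np.allclose(t2, t2_direct, atol=1e-8)
```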

The ratings for New York are the furthest from the average U.S. city. Visualize the results. Visualize both the orthonormal principal component coefficients for each variable and the principal component scores for each observation in a single plot.


biplot(coefforth(:,1:2),'Scores',score(:,1:2),'Varlabels',categories); axis([-.26 0.6 -.51 .51]);

All nine variables are represented in this bi-plot by a vector, and the direction and length of the vector indicate how each variable contributes to the two principal components in the plot. For example, the first principal component, on the horizontal axis, has positive coefficients for all nine variables. That is why the nine vectors are directed into the right half of the plot. The largest coefficients in the first principal component are the third and seventh elements, corresponding to the variables health and arts. The second principal component, on the vertical axis, has positive coefficients for the variables education, health, arts, and transportation, and negative coefficients for the remaining five variables. This indicates that the second component distinguishes among cities that have high values for the first set of variables and low for the second, and cities that have the opposite. The variable labels in this figure are somewhat crowded. You can either exclude the VarLabels name-value pair argument when making the plot, or select and drag some of the labels to better positions using the Edit Plot tool from the figure window toolbar. This 2-D bi-plot also includes a point for each of the 329 observations, with coordinates indicating the score of each observation for the two principal components in the plot. For example, points near the left edge of this plot have the lowest scores for the first principal component. The points are scaled with respect to the maximum score value and maximum coefficient length, so only their relative locations can be determined from the plot.


You can identify items in the plot by selecting Tools > Data Cursor in the figure window. By clicking a variable (vector), you can read the variable label and coefficients for each principal component. By clicking an observation (point), you can read the observation name and scores for each principal component. You can specify 'Obslabels',names to show the observation names instead of the observation numbers in the data cursor display. Create a three-dimensional bi-plot. You can also make a bi-plot in three dimensions. figure() biplot(coefforth(:,1:3),'Scores',score(:,1:3),'Obslabels',names); axis([-.26 0.8 -.51 .51 -.61 .81]); view([30 40]);

This graph is useful if the first two principal components do not explain enough of the variance in your data. You can also rotate the figure to see it from different angles by selecting Tools > Rotate 3D.

See Also biplot | boxplot | pca | pcacov | pcares | ppca

More About
• “Principal Component Analysis (PCA)” on page 15-68


Factor Analysis Multivariate data often includes a large number of measured variables, and sometimes those variables overlap, in the sense that groups of them might be dependent. For example, in a decathlon, each athlete competes in 10 events, but several of them can be thought of as speed events, while others can be thought of as strength events, etc. Thus, you can think of a competitor's 10 event scores as largely dependent on a smaller set of three or four types of athletic ability. Factor analysis is a way to fit a model to multivariate data to estimate just this sort of interdependence. In a factor analysis model, the measured variables depend on a smaller number of unobserved (latent) factors. Because each factor might affect several variables in common, they are known as common factors. Each variable is assumed to be dependent on a linear combination of the common factors, and the coefficients are known as loadings. Each measured variable also includes a component due to independent random variability, known as specific variance because it is specific to one variable. Specifically, factor analysis assumes that the covariance matrix of your data is of the form

Σx = ΛΛᵀ + Ψ where Λ is the matrix of loadings, and the elements of the diagonal matrix Ψ are the specific variances. The function factoran fits the factor analysis model using maximum likelihood.
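The assumed covariance structure can be made concrete with hypothetical numbers: build Σx from a loadings matrix and specific variances, and check how each variable's variance decomposes. A NumPy sketch (the loadings and variances below are invented for illustration):

```python
import numpy as np

# Hypothetical loadings: six variables on two common factors.
Lambda = np.array([[0.9, 0.1],
                   [0.8, 0.2],
                   [0.7, 0.0],
                   [0.1, 0.9],
                   [0.2, 0.8],
                   [0.0, 0.7]])
Psi = np.diag([0.2, 0.3, 0.5, 0.2, 0.3, 0.5])   # specific variances

Sigma = Lambda @ Lambda.T + Psi                  # Sigma_x = Lambda Lambda' + Psi

# The model covariance is symmetric positive definite ...
assert np.allclose(Sigma, Sigma.T)
assert np.all(np.linalg.eigvalsh(Sigma) > 0)
# ... and each variance splits into communality plus specific variance.
communality = np.sum(Lambda**2, axis=1)
assert np.allclose(np.diag(Sigma), communality + np.diag(Psi))
```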

See Also factoran

Related Examples
• “Analyze Stock Prices Using Factor Analysis” on page 15-79


Analyze Stock Prices Using Factor Analysis This example shows how to analyze whether companies within the same sector experience similar week-to-week changes in stock price. Factor Loadings Load the sample data. load stockreturns

Suppose that over the course of 100 weeks, the percent change in stock prices for ten companies has been recorded. Of the ten companies, the first four can be classified as primarily technology, the next three as financial, and the last three as retail. It seems reasonable that the stock prices for companies that are in the same sector might vary together as economic conditions change. Factor analysis can provide quantitative evidence. First, specify a model fit with three common factors. By default, factoran computes rotated estimates of the loadings to try to make their interpretation simpler. But in this example, specify an unrotated solution. [Loadings,specificVar,T,stats] = factoran(stocks,3,'rotate','none');

The first two factoran output arguments are the estimated loadings and the estimated specific variances. Each row of the loadings matrix represents one of the ten stocks, and each column corresponds to a common factor. With unrotated estimates, interpretation of the factors in this fit is difficult because most of the stocks contain fairly large coefficients for two or more factors. Loadings Loadings = 10×3
    0.8885    0.2367   -0.2354
    0.7126    0.3862    0.0034
    0.3351    0.2784   -0.0211
    0.3088    0.1113   -0.1905
    0.6277   -0.6643    0.1478
    0.4726   -0.6383    0.0133
    0.1133   -0.5416    0.0322
    0.6403    0.1669    0.4960
    0.2363    0.5293    0.5770
    0.1105    0.1680    0.5524

Factor rotation helps to simplify the structure in the Loadings matrix, to make it easier to assign meaningful interpretations to the factors. From the estimated specific variances, you can see that the model indicates that a particular stock price varies quite a lot beyond the variation due to the common factors. Display estimated specific variances. specificVar specificVar = 10×1
    0.0991
    0.3431
    0.8097
    0.8559
    0.1429
    0.3691
    0.6928
    0.3162
    0.3311
    0.6544

A specific variance of 1 would indicate that there is no common factor component in that variable, while a specific variance of 0 would indicate that the variable is entirely determined by common factors. These data seem to fall somewhere in between. Display the p-value. stats.p ans = 0.8144

The p-value returned in the stats structure fails to reject the null hypothesis of three common factors, suggesting that this model provides a satisfactory explanation of the covariation in these data. Fit a model with two common factors to determine whether fewer than three factors can provide an acceptable fit. [Loadings2,specificVar2,T2,stats2] = factoran(stocks, 2,'rotate','none');

Display the p-value. stats2.p ans = 3.5610e-06

The p-value for this second fit is highly significant, and rejects the hypothesis of two factors, indicating that the simpler model is not sufficient to explain the pattern in these data. Factor Rotation As the results illustrate, the estimated loadings from an unrotated factor analysis fit can have a complicated structure. The goal of factor rotation is to find a parameterization in which each variable has only a small number of large loadings. That is, each variable is affected by a small number of factors, preferably only one. This can often make it easier to interpret what the factors represent. If you think of each row of the loadings matrix as coordinates of a point in M-dimensional space, then each factor corresponds to a coordinate axis. Factor rotation is equivalent to rotating those axes and computing new loadings in the rotated coordinate system. There are various ways to do this. Some methods leave the axes orthogonal, while others are oblique methods that change the angles between them. For this example, you can rotate the estimated loadings by using the promax criterion, a common oblique method. [LoadingsPM,specVarPM] = factoran(stocks,3,'rotate','promax'); LoadingsPM LoadingsPM = 10×3

    0.9452    0.1214   -0.0617
    0.7064   -0.0178    0.2058
    0.3885   -0.0994    0.0975
    0.4162   -0.0148   -0.1298
    0.1021    0.9019    0.0768
    0.0873    0.7709   -0.0821
   -0.1616    0.5320   -0.0888
    0.2169    0.2844    0.6635
    0.0016   -0.1881    0.7849
   -0.2289    0.0636    0.6475

Promax rotation creates a simpler structure in the loadings, one in which most of the stocks have a large loading on only one factor. To see this structure more clearly, you can use the biplot function to plot each stock using its factor loadings as coordinates. biplot(LoadingsPM,'varlabels',num2str((1:10)')); axis square view(155,27);

This plot shows that promax has rotated the factor loadings to a simpler structure. Each stock depends primarily on only one factor, and it is possible to describe each factor in terms of the stocks that it affects. Based on which companies are near which axes, you could reasonably conclude that the first factor axis represents the financial sector, the second retail, and the third technology. The original conjecture, that stocks vary primarily within sector, is apparently supported by the data.
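Rotation does not change what the model predicts: for any orthogonal rotation R, the rotated loadings ΛR reproduce the same common-factor covariance ΛΛᵀ. A NumPy sketch of that invariance with illustrative loadings (promax itself is oblique, so this check uses a pure orthogonal rotation):

```python
import numpy as np

rng = np.random.default_rng(3)
Lambda = rng.normal(size=(10, 3))        # illustrative unrotated loadings

theta = 0.7                              # rotate factors 1 and 2 by theta
R = np.eye(3)
R[:2, :2] = [[np.cos(theta), -np.sin(theta)],
             [np.sin(theta),  np.cos(theta)]]

Lrot = Lambda @ R                        # rotated loadings

# R is orthogonal, so the fitted common-factor covariance is unchanged.
assert np.allclose(R @ R.T, np.eye(3), atol=1e-12)
assert np.allclose(Lrot @ Lrot.T, Lambda @ Lambda.T, atol=1e-10)
```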


Factor Scores Sometimes, it is useful to be able to classify an observation based on its factor scores. For example, if you accepted the three-factor model and the interpretation of the rotated factors, you might want to categorize each week in terms of how favorable it was for each of the three stock sectors, based on the data from the 10 observed stocks. Because the data in this example are the raw stock price changes, and not just their correlation matrix, you can have factoran return estimates of the value of each of the three rotated common factors for each week. You can then plot the estimated scores to see how the different stock sectors were affected during each week. [LoadingsPM,specVarPM,TPM,stats,F] = factoran(stocks, 3,'rotate','promax'); plot3(F(:,1),F(:,2),F(:,3),'b.') line([-4 4 NaN 0 0 NaN 0 0], [0 0 NaN -4 4 NaN 0 0],[0 0 NaN 0 0 NaN -4 4], 'Color','black') xlabel('Financial Sector') ylabel('Retail Sector') zlabel('Technology Sector') grid on axis square view(-22.5, 8)

Oblique rotation often creates factors that are correlated. This plot shows some evidence of correlation between the first and third factors, and you can investigate further by computing the estimated factor correlation matrix. inv(TPM'*TPM)


Visualize the Results You can use the biplot function to help visualize both the factor loadings for each variable and the factor scores for each observation in a single plot. For example, the following command plots the results from the factor analysis on the stock data and labels each of the 10 stocks. biplot(LoadingsPM,'scores',F,'varlabels',num2str((1:10)')) xlabel('Financial Sector') ylabel('Retail Sector') zlabel('Technology Sector') axis square view(155,27)

In this case, the factor analysis includes three factors, and so the biplot is three-dimensional. Each of the 10 stocks is represented in this plot by a vector, and the direction and length of the vector indicates how each stock depends on the underlying factors. For example, you have seen that after promax rotation, the first four stocks have positive loadings on the first factor, and unimportant loadings on the other two factors. That first factor, interpreted as a financial sector effect, is represented in this biplot as one of the horizontal axes. The dependence of those four stocks on that factor corresponds to the four vectors directed approximately along that axis. Similarly, the dependence of stocks 5, 6, and 7 primarily on the second factor, interpreted as a retail sector effect, is represented by vectors directed approximately along that axis. Each of the 100 observations is represented in this plot by a point, and their locations indicate the score of each observation for the three factors. For example, points near the top of this plot have the


highest scores for the technology sector factor. The points are scaled to fit within the unit square, so only their relative locations can be determined from the plot. You can use the Data Cursor tool from the Tools menu in the figure window to identify the items in this plot. By clicking a stock (vector), you can read off that stock's loadings for each factor. By clicking an observation (point), you can read off that observation's scores for each factor.


Robust Feature Selection Using NCA for Regression Perform feature selection that is robust to outliers using a custom robust loss function in NCA. Generate data with outliers Generate sample data for regression where the response depends on three of the predictors, namely predictors 4, 7, and 13. rng(123,'twister') % For reproducibility n = 200; X = randn(n,20); y = cos(X(:,7)) + sin(X(:,4).*X(:,13)) + 0.1*randn(n,1);

Add outliers to data. numoutliers = 25; outlieridx = floor(linspace(10,90,numoutliers)); y(outlieridx) = 5*randn(numoutliers,1);

Plot the data. figure plot(y)


Use non-robust loss function The performance of the feature selection algorithm depends strongly on the value of the regularization parameter. A good practice is to tune the regularization parameter for the best value to use in feature selection. Tune the regularization parameter using five-fold cross-validation. Use the mean squared error (MSE):

MSE = (1/n) ∑ᵢ₌₁ⁿ (yᵢ − ŷᵢ)²

First, partition the data into five folds. In each fold, the software uses 4/5th of the data for training and 1/5th of the data for validation (testing). cvp = cvpartition(length(y),'kfold',5); numtestsets = cvp.NumTestSets;

Compute the lambda values to test for and create an array to store the loss values. lambdavals = linspace(0,3,50)*std(y)/length(y); lossvals = zeros(length(lambdavals),numtestsets);

Perform NCA and compute the loss for each λ value and each fold. for i = 1:length(lambdavals) for k = 1:numtestsets Xtrain = X(cvp.training(k),:); ytrain = y(cvp.training(k),:); Xtest = X(cvp.test(k),:); ytest = y(cvp.test(k),:); nca = fsrnca(Xtrain,ytrain,'FitMethod','exact', ... 'Solver','lbfgs','Verbose',0,'Lambda',lambdavals(i), ... 'LossFunction','mse'); lossvals(i,k) = loss(nca,Xtest,ytest,'LossFunction','mse'); end end
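The loop above is a standard cross-validated grid search: for each candidate λ, fit on the training folds and average the held-out loss. The same pattern in a NumPy sketch, with closed-form ridge regression standing in for the fsrnca fit (illustrative, not the toolbox algorithm):

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Yield (train, test) index pairs for k roughly equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    for test in np.array_split(idx, k):
        yield np.setdiff1d(idx, test), test

def ridge_fit(X, y, lam):
    """Closed-form ridge solution, a stand-in for the NCA fit."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))
y = X[:, 0] - 2 * X[:, 2] + 0.1 * rng.normal(size=100)

lambdavals = np.linspace(0.01, 10, 20)
lossvals = np.zeros((len(lambdavals), 5))
for i, lam in enumerate(lambdavals):
    for k, (tr, te) in enumerate(kfold_indices(len(y), 5)):
        w = ridge_fit(X[tr], y[tr], lam)
        lossvals[i, k] = np.mean((y[te] - X[te] @ w) ** 2)  # held-out MSE

meanloss = lossvals.mean(axis=1)
bestlambda = lambdavals[np.argmin(meanloss)]  # lambda with smallest CV loss
```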

Plot the mean loss corresponding to each lambda value. figure meanloss = mean(lossvals,2); plot(lambdavals,meanloss,'ro-') xlabel('Lambda') ylabel('Loss (MSE)') grid on


Find the λ value that produces the minimum average loss. [~,idx] = min(mean(lossvals,2)); bestlambda = lambdavals(idx) bestlambda = 0.0231

Perform feature selection using the best λ value and MSE. nca = fsrnca(X,y,'FitMethod','exact','Solver','lbfgs', ... 'Verbose',1,'Lambda',bestlambda,'LossFunction','mse'); o Solver = LBFGS, HessianHistorySize = 15, LineSearchMethod = weakwolfe

|================================================================================================ | ITER | FUN VALUE | NORM GRAD | NORM STEP | CURV | GAMMA | ALPHA | ACC |================================================================================================ | 0 | 6.414642e+00 | 8.430e-01 | 0.000e+00 | | 7.117e-01 | 0.000e+00 | Y | 1 | 6.066100e+00 | 9.952e-01 | 1.264e+00 | OK | 3.741e-01 | 1.000e+00 | Y | 2 | 5.498221e+00 | 4.267e-01 | 4.250e-01 | OK | 4.016e-01 | 1.000e+00 | Y | 3 | 5.108548e+00 | 3.933e-01 | 8.564e-01 | OK | 3.599e-01 | 1.000e+00 | Y | 4 | 4.808456e+00 | 2.505e-01 | 9.352e-01 | OK | 8.798e-01 | 1.000e+00 | Y | 5 | 4.677382e+00 | 2.085e-01 | 6.014e-01 | OK | 1.052e+00 | 1.000e+00 | Y | 6 | 4.487789e+00 | 4.726e-01 | 7.374e-01 | OK | 5.593e-01 | 1.000e+00 | Y | 7 | 4.310099e+00 | 2.484e-01 | 4.253e-01 | OK | 3.367e-01 | 1.000e+00 | Y | 8 | 4.258539e+00 | 3.629e-01 | 4.521e-01 | OK | 4.705e-01 | 5.000e-01 | Y | 9 | 4.175345e+00 | 1.972e-01 | 2.608e-01 | OK | 4.018e-01 | 1.000e+00 | Y | 10 | 4.122340e+00 | 9.169e-02 | 2.947e-01 | OK | 3.487e-01 | 1.000e+00 | Y

| 11 | 4.095525e+00 | 9.798e-02 | 2.529e-01 | OK | 1.188e+00 | 1.000e+00 | Y
| 12 | 4.059690e+00 | 1.584e-01 | 5.213e-01 | OK | 9.930e-01 | 1.000e+00 | Y
| 13 | 4.029208e+00 | 7.411e-02 | 2.076e-01 | OK | 4.886e-01 | 1.000e+00 | Y
| 14 | 4.016358e+00 | 1.068e-01 | 2.696e-01 | OK | 6.919e-01 | 1.000e+00 | Y
| 15 | 4.004521e+00 | 5.434e-02 | 1.136e-01 | OK | 5.647e-01 | 1.000e+00 | Y
| 16 | 3.986929e+00 | 6.158e-02 | 2.993e-01 | OK | 1.353e+00 | 1.000e+00 | Y
| 17 | 3.976342e+00 | 4.966e-02 | 2.213e-01 | OK | 7.668e-01 | 1.000e+00 | Y
| 18 | 3.966646e+00 | 5.458e-02 | 2.529e-01 | OK | 1.988e+00 | 1.000e+00 | Y
| 19 | 3.959586e+00 | 1.046e-01 | 4.169e-01 | OK | 1.858e+00 | 1.000e+00 | Y

|================================================================================================ | ITER | FUN VALUE | NORM GRAD | NORM STEP | CURV | GAMMA | ALPHA | ACC |================================================================================================ | 20 | 3.953759e+00 | 8.248e-02 | 2.892e-01 | OK | 1.040e+00 | 1.000e+00 | Y | 21 | 3.945475e+00 | 3.119e-02 | 1.698e-01 | OK | 1.095e+00 | 1.000e+00 | Y | 22 | 3.941567e+00 | 2.350e-02 | 1.293e-01 | OK | 1.117e+00 | 1.000e+00 | Y | 23 | 3.939468e+00 | 1.296e-02 | 1.805e-01 | OK | 2.287e+00 | 1.000e+00 | Y | 24 | 3.938662e+00 | 8.591e-03 | 5.955e-02 | OK | 1.553e+00 | 1.000e+00 | Y | 25 | 3.938239e+00 | 6.421e-03 | 5.334e-02 | OK | 1.102e+00 | 1.000e+00 | Y | 26 | 3.938013e+00 | 5.449e-03 | 6.773e-02 | OK | 2.085e+00 | 1.000e+00 | Y | 27 | 3.937896e+00 | 6.226e-03 | 3.368e-02 | OK | 7.541e-01 | 1.000e+00 | Y | 28 | 3.937820e+00 | 2.497e-03 | 2.397e-02 | OK | 7.940e-01 | 1.000e+00 | Y | 29 | 3.937791e+00 | 2.004e-03 | 1.339e-02 | OK | 1.863e+00 | 1.000e+00 | Y | 30 | 3.937784e+00 | 2.448e-03 | 1.265e-02 | OK | 9.667e-01 | 1.000e+00 | Y | 31 | 3.937778e+00 | 6.973e-04 | 2.906e-03 | OK | 4.672e-01 | 1.000e+00 | Y | 32 | 3.937778e+00 | 3.038e-04 | 9.502e-04 | OK | 1.060e+00 | 1.000e+00 | Y | 33 | 3.937777e+00 | 2.327e-04 | 1.069e-03 | OK | 1.597e+00 | 1.000e+00 | Y | 34 | 3.937777e+00 | 1.959e-04 | 1.537e-03 | OK | 4.026e+00 | 1.000e+00 | Y | 35 | 3.937777e+00 | 1.162e-04 | 1.464e-03 | OK | 3.418e+00 | 1.000e+00 | Y | 36 | 3.937777e+00 | 8.353e-05 | 3.660e-04 | OK | 7.304e-01 | 5.000e-01 | Y | 37 | 3.937777e+00 | 1.412e-05 | 1.412e-04 | OK | 7.842e-01 | 1.000e+00 | Y | 38 | 3.937777e+00 | 1.277e-05 | 3.808e-05 | OK | 1.021e+00 | 1.000e+00 | Y | 39 | 3.937777e+00 | 8.614e-06 | 3.698e-05 | OK | 2.561e+00 | 1.000e+00 | Y

|================================================================================================ | ITER | FUN VALUE | NORM GRAD | NORM STEP | CURV | GAMMA | ALPHA | ACC |================================================================================================ | 40 | 3.937777e+00 | 3.159e-06 | 5.299e-05 | OK | 4.331e+00 | 1.000e+00 | Y | 41 | 3.937777e+00 | 2.657e-06 | 1.080e-05 | OK | 7.038e-01 | 5.000e-01 | Y | 42 | 3.937777e+00 | 7.054e-07 | 7.036e-06 | OK | 9.519e-01 | 1.000e+00 | Y Infinity norm of the final gradient = 7.054e-07 Two norm of the final step = 7.036e-06, TolX = 1.000e-06 Relative infinity norm of the final gradient = 7.054e-07, TolFun = 1.000e-06 EXIT: Local minimum found.

Plot selected features. figure plot(nca.FeatureWeights,'ro') grid on xlabel('Feature index') ylabel('Feature weight')


Predict the response values using the nca model and plot the fitted (predicted) response values and the actual response values. figure fitted = predict(nca,X); plot(y,'r.') hold on plot(fitted,'b-') xlabel('index') ylabel('Fitted values')


fsrnca tries to fit every point in the data, including the outliers. As a result, it assigns nonzero weights to many features besides predictors 4, 7, and 13. Use built-in robust loss function Repeat the same process of tuning the regularization parameter, this time using the built-in ϵ-insensitive loss function: l(yᵢ, yⱼ) = max(0, |yᵢ − yⱼ| − ϵ)
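A quick numerical sketch of this loss (an illustration of the formula, not the toolbox internals):

```python
import numpy as np

def eps_insensitive(yi, yj, eps):
    """Epsilon-insensitive loss: zero inside the margin, linear outside."""
    return np.maximum(0.0, np.abs(yi - yj) - eps)

d = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
loss = eps_insensitive(d, 0.0, 0.8)

# Residuals within +/- 0.8 cost nothing; larger ones grow only linearly.
assert np.allclose(loss, [1.2, 0.0, 0.0, 0.0, 1.2])
```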

The ϵ-insensitive loss function is more robust to outliers than mean squared error. lambdavals = linspace(0,3,50)*std(y)/length(y); cvp = cvpartition(length(y),'kfold',5); numtestsets = cvp.NumTestSets; lossvals = zeros(length(lambdavals),numtestsets); for i = 1:length(lambdavals) for k = 1:numtestsets Xtrain = X(cvp.training(k),:); ytrain = y(cvp.training(k),:); Xtest = X(cvp.test(k),:); ytest = y(cvp.test(k),:); nca = fsrnca(Xtrain,ytrain,'FitMethod','exact', ... 'Solver','sgd','Verbose',0,'Lambda',lambdavals(i), ... 'LossFunction','epsiloninsensitive','Epsilon',0.8);


lossvals(i,k) = loss(nca,Xtest,ytest,'LossFunction','mse'); end end

The ϵ value to use depends on the data, and the best value can also be determined using cross-validation. However, choosing the ϵ value is outside the scope of this example; the choice here mainly illustrates the robustness of the method. Plot the mean loss corresponding to each lambda value. figure meanloss = mean(lossvals,2); plot(lambdavals,meanloss,'ro-') xlabel('Lambda') ylabel('Loss (MSE)') grid on

Find the lambda value that produces the minimum average loss. [~,idx] = min(mean(lossvals,2)); bestlambda = lambdavals(idx) bestlambda = 0.0187

Fit the neighborhood component analysis model using the ϵ-insensitive loss function and the best lambda value.


nca = fsrnca(X,y,'FitMethod','exact','Solver','sgd', ... 'Lambda',bestlambda,'LossFunction','epsiloninsensitive','Epsilon',0.8);

Plot selected features. figure plot(nca.FeatureWeights,'ro') grid on xlabel('Feature index') ylabel('Feature weight')

Plot fitted values. figure fitted = predict(nca,X); plot(y,'r.') hold on plot(fitted,'b-') xlabel('index') ylabel('Fitted values')


The ϵ-insensitive loss seems more robust to outliers. It identifies fewer features as relevant than MSE does, but the fit shows that it is still affected by some of the outliers. Use custom robust loss function Define a custom loss function that is robust to outliers to use in feature selection for regression: f(yᵢ, yⱼ) = 1 − exp(−|yᵢ − yⱼ|) customlossFcn = @(yi,yj) 1 - exp(-abs(bsxfun(@minus,yi,yj')));
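To see why this loss is robust, note that it is bounded: it approaches 1 for large residuals, so a single extreme outlier contributes almost nothing extra. A NumPy sketch of the same function:

```python
import numpy as np

def custom_loss(yi, yj):
    """Bounded robust loss: 1 - exp(-|yi - yj|), saturating at 1."""
    return 1.0 - np.exp(-np.abs(yi - yj))

d = np.array([0.0, 1.0, 10.0, 1000.0])
loss = custom_loss(d, 0.0)

assert loss[0] == 0.0                     # perfect fit costs nothing
assert np.all(loss <= 1.0)                # the loss is bounded by 1
assert loss[3] - loss[2] < 1e-4           # an extreme outlier adds almost nothing
```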

Tune the regularization parameter using the custom-defined robust loss function. lambdavals = linspace(0,3,50)*std(y)/length(y); cvp = cvpartition(length(y),'kfold',5); numtestsets = cvp.NumTestSets; lossvals = zeros(length(lambdavals),numtestsets); for i = 1:length(lambdavals) for k = 1:numtestsets Xtrain = X(cvp.training(k),:); ytrain = y(cvp.training(k),:); Xtest = X(cvp.test(k),:); ytest = y(cvp.test(k),:); nca = fsrnca(Xtrain,ytrain,'FitMethod','exact', ...


'Solver','lbfgs','Verbose',0,'Lambda',lambdavals(i), ... 'LossFunction',customlossFcn); lossvals(i,k) = loss(nca,Xtest,ytest,'LossFunction','mse'); end end

Plot the mean loss corresponding to each lambda value. figure meanloss = mean(lossvals,2); plot(lambdavals,meanloss,'ro-') xlabel('Lambda') ylabel('Loss (MSE)') grid on

Find the λ value that produces the minimum average loss. [~,idx] = min(mean(lossvals,2)); bestlambda = lambdavals(idx) bestlambda = 0.0165

Perform feature selection using the custom robust loss function and best λ value. nca = fsrnca(X,y,'FitMethod','exact','Solver','lbfgs', ... 'Verbose',1,'Lambda',bestlambda,'LossFunction',customlossFcn);


o Solver = LBFGS, HessianHistorySize = 15, LineSearchMethod = weakwolfe

|================================================================================================ | ITER | FUN VALUE | NORM GRAD | NORM STEP | CURV | GAMMA | ALPHA | ACC |================================================================================================ | 0 | 8.610073e-01 | 4.921e-02 | 0.000e+00 | | 1.219e+01 | 0.000e+00 | Y | 1 | 6.582278e-01 | 2.328e-02 | 1.820e+00 | OK | 2.177e+01 | 1.000e+00 | Y | 2 | 5.706490e-01 | 2.241e-02 | 2.360e+00 | OK | 2.541e+01 | 1.000e+00 | Y | 3 | 5.677090e-01 | 2.666e-02 | 7.583e-01 | OK | 1.092e+01 | 1.000e+00 | Y | 4 | 5.620806e-01 | 5.524e-03 | 3.335e-01 | OK | 9.973e+00 | 1.000e+00 | Y | 5 | 5.616054e-01 | 1.428e-03 | 1.025e-01 | OK | 1.736e+01 | 1.000e+00 | Y | 6 | 5.614779e-01 | 4.446e-04 | 8.350e-02 | OK | 2.507e+01 | 1.000e+00 | Y | 7 | 5.614653e-01 | 4.118e-04 | 2.466e-02 | OK | 2.105e+01 | 1.000e+00 | Y | 8 | 5.614620e-01 | 1.307e-04 | 1.373e-02 | OK | 2.002e+01 | 1.000e+00 | Y | 9 | 5.614615e-01 | 9.318e-05 | 4.128e-03 | OK | 3.683e+01 | 1.000e+00 | Y | 10 | 5.614611e-01 | 4.579e-05 | 8.785e-03 | OK | 6.170e+01 | 1.000e+00 | Y | 11 | 5.614610e-01 | 1.232e-05 | 1.582e-03 | OK | 2.000e+01 | 5.000e-01 | Y | 12 | 5.614610e-01 | 3.174e-06 | 4.742e-04 | OK | 2.510e+01 | 1.000e+00 | Y | 13 | 5.614610e-01 | 7.896e-07 | 1.683e-04 | OK | 2.959e+01 | 1.000e+00 | Y Infinity norm of the final gradient = 7.896e-07 Two norm of the final step = 1.683e-04, TolX = 1.000e-06 Relative infinity norm of the final gradient = 7.896e-07, TolFun = 1.000e-06 EXIT: Local minimum found.

Plot selected features.

figure
plot(nca.FeatureWeights,'ro')
grid on
xlabel('Feature index')
ylabel('Feature weight')


Plot fitted values.

figure
fitted = predict(nca,X);
plot(y,'r.')
hold on
plot(fitted,'b-')
xlabel('index')
ylabel('Fitted values')


In this case, the loss is not affected by the outliers, and the results are based on most of the observation values. fsrnca detects the predictors 4, 7, and 13 as relevant features and does not select any other features.

Why does the loss function choice affect the results? First, compute the loss functions for a series of values for the difference between two observations.

deltay = linspace(-10,10,1000)';

Compute custom loss function values.

customlossvals = customlossFcn(deltay,0);

Compute epsilon-insensitive loss function values.

epsinsensitive = @(yi,yj,E) max(0,abs(bsxfun(@minus,yi,yj'))-E);
epsinsenvals = epsinsensitive(deltay,0,0.5);

Compute MSE loss function values.

mse = @(yi,yj) (yi-yj').^2;
msevals = mse(deltay,0);

Now, plot the loss functions to see how they differ and why they affect the results the way they do.

figure
plot(deltay,customlossvals,'g-',deltay,epsinsenvals,'b-',deltay,msevals,'r-')


xlabel('(yi - yj)')
ylabel('loss(yi,yj)')
legend('customloss','epsiloninsensitive','mse')
ylim([0 20])

As the difference between two response values increases, mse increases quadratically, which makes it very sensitive to outliers. As fsrnca tries to minimize this loss, it ends up identifying more features as relevant. The epsilon-insensitive loss is more resistant to outliers than mse, but it eventually starts to increase linearly as the difference between two observations increases. The robust loss function approaches 1 as the difference between two observations increases, and stays at that value even as the difference keeps growing. Of the three, it is the most robust to outliers.
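The growth rates of the three losses can be made concrete with a small numerical sketch (written in Python/NumPy rather than MATLAB; the bounded loss below is a hypothetical stand-in for the example's customlossFcn, whose exact form is defined earlier in the example):

```python
import numpy as np

def mse_loss(d):
    # Quadratic loss: grows without bound, dominated by outliers
    return d ** 2

def eps_insensitive_loss(d, eps=0.5):
    # Zero inside the eps tube, then grows linearly
    return np.maximum(0.0, np.abs(d) - eps)

def robust_loss(d):
    # Hypothetical bounded loss that saturates at 1 (illustrative stand-in
    # for customlossFcn, not its actual definition)
    return 1.0 - np.exp(-np.abs(d))

for d in [0.1, 1.0, 10.0]:
    print(f"d={d:5.1f}  mse={mse_loss(d):8.2f}  "
          f"eps-ins={eps_insensitive_loss(d):6.2f}  robust={robust_loss(d):.3f}")
```

At d = 10 the quadratic loss is already two orders of magnitude larger than the linear one, while the bounded loss has essentially flattened at 1, which is why the bounded loss ignores outliers.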

See Also
FeatureSelectionNCARegression | fsrnca | loss | predict | refit

More About
• “Neighborhood Component Analysis (NCA) Feature Selection” on page 15-99
• “Introduction to Feature Selection” on page 15-49


Neighborhood Component Analysis (NCA) Feature Selection

In this section...
“NCA Feature Selection for Classification” on page 15-99
“NCA Feature Selection for Regression” on page 15-101
“Impact of Standardization” on page 15-102
“Choosing the Regularization Parameter Value” on page 15-102

Neighborhood component analysis (NCA) is a non-parametric method for selecting features with the goal of maximizing prediction accuracy of regression and classification algorithms. The Statistics and Machine Learning Toolbox functions fscnca and fsrnca perform NCA feature selection with regularization to learn feature weights for minimization of an objective function that measures the average leave-one-out classification or regression loss over the training data.

NCA Feature Selection for Classification

Consider a multi-class classification problem with a training set containing n observations:

$$S = \{(x_i, y_i),\; i = 1, 2, \ldots, n\},$$

where x_i ∈ ℝ^p are the feature vectors, y_i ∈ {1, 2, …, c} are the class labels, and c is the number of classes. The aim is to learn a classifier f : ℝ^p → {1, 2, …, c} that accepts a feature vector and makes a prediction f(x) for the true label y of x.

Consider a randomized classifier that:
• Randomly picks a point, Ref(x), from S as the ‘reference point’ for x
• Labels x using the label of the reference point Ref(x).

This scheme is similar to that of a 1-NN classifier, where the reference point is chosen to be the nearest neighbor of the new point x. In NCA, the reference point is chosen randomly and all points in S have some probability of being selected as the reference point. The probability P(Ref(x) = x_j | S) that point x_j is picked from S as the reference point for x is higher if x_j is closer to x as measured by the distance function d_w, where

$$d_w(x_i, x_j) = \sum_{r=1}^{p} w_r^2\, \lvert x_{ir} - x_{jr} \rvert,$$

and w_r are the feature weights. Assume that P(Ref(x) = x_j | S) ∝ k(d_w(x, x_j)), where k is some kernel or similarity function that assumes large values when d_w(x, x_j) is small. Suppose it is

$$k(z) = \exp\left(-\frac{z}{\sigma}\right),$$


as suggested in [1]. The reference point for x is chosen from S, so the sum of P(Ref(x) = x_j | S) over all j must equal 1. Therefore, it is possible to write

$$P\left(\mathrm{Ref}(x) = x_j \mid S\right) = \frac{k\left(d_w(x, x_j)\right)}{\sum_{j=1}^{n} k\left(d_w(x, x_j)\right)}.$$

Now consider the leave-one-out application of this randomized classifier, that is, predicting the label of x_i using the data in S^{−i}, the training set S excluding the point (x_i, y_i). The probability that point x_j is picked as the reference point for x_i is

$$p_{ij} = P\left(\mathrm{Ref}(x_i) = x_j \mid S^{-i}\right) = \frac{k\left(d_w(x_i, x_j)\right)}{\sum_{j=1,\, j\neq i}^{n} k\left(d_w(x_i, x_j)\right)}.$$

The average leave-one-out probability of correct classification is the probability p_i that the randomized classifier correctly classifies observation i using S^{−i}:

$$p_i = \sum_{j=1,\, j\neq i}^{n} P\left(\mathrm{Ref}(x_i) = x_j \mid S^{-i}\right) I(y_i = y_j) = \sum_{j=1,\, j\neq i}^{n} p_{ij}\, y_{ij},$$

where

$$y_{ij} = I(y_i = y_j) = \begin{cases} 1 & \text{if } y_i = y_j, \\ 0 & \text{otherwise}. \end{cases}$$

The average leave-one-out probability of correct classification using the randomized classifier can be written as

$$F(w) = \frac{1}{n}\sum_{i=1}^{n} p_i.$$

The right-hand side of F(w) depends on the weight vector w. The goal of neighborhood component analysis is to maximize F(w) with respect to w. fscnca uses the regularized objective function as introduced in [1]:

$$F(w) = \frac{1}{n}\sum_{i=1}^{n} p_i - \lambda \sum_{r=1}^{p} w_r^2 = \frac{1}{n}\sum_{i=1}^{n} \underbrace{\left[\sum_{j=1,\, j\neq i}^{n} p_{ij}\, y_{ij} - \lambda \sum_{r=1}^{p} w_r^2\right]}_{F_i(w)} = \frac{1}{n}\sum_{i=1}^{n} F_i(w),$$

where λ is the regularization parameter. The regularization term drives many of the weights in w to 0.

After choosing the kernel parameter σ in p_ij as 1, finding the weight vector w can be expressed as the following minimization problem for given λ.


$$w^{*} = \operatorname*{argmin}_{w} f(w) = \operatorname*{argmin}_{w} \frac{1}{n}\sum_{i=1}^{n} f_i(w),$$

where f(w) = −F(w) and f_i(w) = −F_i(w). Note that

$$\frac{1}{n}\sum_{i=1}^{n}\,\sum_{j=1,\, j\neq i}^{n} p_{ij} = 1,$$

and the argument of the minimum does not change if you add a constant to an objective function. Therefore, you can rewrite the objective function by adding the constant 1:

$$\begin{aligned} w^{*} &= \operatorname*{argmin}_{w}\; 1 + f(w) \\ &= \operatorname*{argmin}_{w}\; \frac{1}{n}\sum_{i=1}^{n}\,\sum_{j=1,\, j\neq i}^{n} p_{ij} - \frac{1}{n}\sum_{i=1}^{n}\,\sum_{j=1,\, j\neq i}^{n} p_{ij}\, y_{ij} + \lambda \sum_{r=1}^{p} w_r^2 \\ &= \operatorname*{argmin}_{w}\; \frac{1}{n}\sum_{i=1}^{n}\,\sum_{j=1,\, j\neq i}^{n} p_{ij}\,(1 - y_{ij}) + \lambda \sum_{r=1}^{p} w_r^2 \\ &= \operatorname*{argmin}_{w}\; \frac{1}{n}\sum_{i=1}^{n}\,\sum_{j=1,\, j\neq i}^{n} p_{ij}\, l(y_i, y_j) + \lambda \sum_{r=1}^{p} w_r^2, \end{aligned}$$

where the loss function is defined as

$$l(y_i, y_j) = \begin{cases} 1 & \text{if } y_i \neq y_j, \\ 0 & \text{otherwise}. \end{cases}$$

The argument of the minimum is the weight vector that minimizes the classification error. You can specify a custom loss function using the LossFunction name-value pair argument in the call to fscnca.
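The derivation above translates almost line for line into code. The following NumPy sketch (an illustration, not the fscnca implementation; kernel width σ = 1 as in the text) evaluates the regularized objective f(w) for a given weight vector:

```python
import numpy as np

def nca_objective(w, X, y, lam):
    """Regularized NCA classification objective f(w) (to be minimized).

    Follows the derivation above with kernel k(z) = exp(-z), sigma = 1.
    Illustrative sketch only, not the fscnca implementation.
    """
    # Weighted distances d_w(x_i, x_j) = sum_r w_r^2 |x_ir - x_jr|
    D = np.abs(X[:, None, :] - X[None, :, :]) @ (w ** 2)
    K = np.exp(-D)
    np.fill_diagonal(K, 0.0)              # leave-one-out: j != i
    P = K / K.sum(axis=1, keepdims=True)  # reference-point probabilities p_ij
    match = y[:, None] == y[None, :]      # indicator y_ij
    loss = (P * ~match).sum(axis=1)       # sum_j p_ij * l(y_i, y_j)
    return loss.mean() + lam * np.sum(w ** 2)

# Two well-separated Gaussian classes in 2-D
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
y = np.repeat([0, 1], 20)
print(nca_objective(np.ones(2), X, y, lam=0.01))
```

With w = 0 all reference points are equally likely, so the objective reduces to the base misclassification rate; with nonzero weights on separated clusters the objective drops well below that.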

NCA Feature Selection for Regression

The fsrnca function performs NCA feature selection modified for regression. Given n observations

$$S = \{(x_i, y_i),\; i = 1, 2, \ldots, n\},$$

the only difference from the classification problem is that the response values y_i ∈ ℝ are continuous. In this case, the aim is to predict the response y given the training set S.

Consider a randomized regression model that:
• Randomly picks a point Ref(x) from S as the ‘reference point’ for x
• Sets the response value at x equal to the response value of the reference point Ref(x).

Again, the probability P(Ref(x) = x_j | S) that point x_j is picked from S as the reference point for x is

$$P\left(\mathrm{Ref}(x) = x_j \mid S\right) = \frac{k\left(d_w(x, x_j)\right)}{\sum_{j=1}^{n} k\left(d_w(x, x_j)\right)}.$$


Now consider the leave-one-out application of this randomized regression model, that is, predicting the response for x_i using the data in S^{−i}, the training set S excluding the point (x_i, y_i). The probability that point x_j is picked as the reference point for x_i is

$$p_{ij} = P\left(\mathrm{Ref}(x_i) = x_j \mid S^{-i}\right) = \frac{k\left(d_w(x_i, x_j)\right)}{\sum_{j=1,\, j\neq i}^{n} k\left(d_w(x_i, x_j)\right)}.$$

Let $\hat{y}_i$ be the response value the randomized regression model predicts and y_i be the actual response for x_i. And let l : ℝ² → ℝ be a loss function that measures the disagreement between $\hat{y}_i$ and y_i. Then, the average value of $l(y_i, \hat{y}_i)$ is

$$l_i = E\left[\, l(y_i, \hat{y}_i) \mid S^{-i} \right] = \sum_{j=1,\, j\neq i}^{n} p_{ij}\, l(y_i, y_j).$$

After adding the regularization term, the objective function for minimization is:

$$f(w) = \frac{1}{n}\sum_{i=1}^{n} l_i + \lambda \sum_{r=1}^{p} w_r^2.$$

The default loss function l yi, y j for NCA for regression is mean absolute deviation, but you can specify other loss functions, including a custom one, using the LossFunction name-value pair argument in the call to fsrnca.
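The regression objective with the default mean-absolute-deviation loss can be sketched the same way as the classification objective (again an illustration with σ = 1, not the fsrnca implementation):

```python
import numpy as np

def fsrnca_objective(w, X, y, lam):
    """NCA regression objective f(w) with mean-absolute-deviation loss.

    Illustrative sketch of the formulas above (sigma = 1), not the
    fsrnca implementation.
    """
    D = np.abs(X[:, None, :] - X[None, :, :]) @ (w ** 2)   # d_w(x_i, x_j)
    K = np.exp(-D)
    np.fill_diagonal(K, 0.0)                               # j != i
    P = K / K.sum(axis=1, keepdims=True)                   # p_ij
    L = np.abs(y[:, None] - y[None, :])                    # l(y_i, y_j) = |y_i - y_j|
    li = (P * L).sum(axis=1)                               # expected leave-one-out loss
    return li.mean() + lam * np.sum(w ** 2)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = 2 * X[:, 0]                       # only the first feature is relevant
w_good = np.array([1.0, 0.0, 0.0])    # weight only on the relevant feature
w_bad = np.array([0.0, 1.0, 0.0])     # weight only on an irrelevant feature
print(fsrnca_objective(w_good, X, y, 0.0), fsrnca_objective(w_bad, X, y, 0.0))
```

Concentrating weight on the relevant predictor makes the reference-point distribution favor neighbors with similar responses, so the expected leave-one-out loss drops; weighting an irrelevant predictor does not.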

Impact of Standardization

The regularization term drives the weights of irrelevant predictors to zero. In the objective functions for NCA for classification or regression, there is only one regularization parameter λ for all weights. This fact requires the magnitudes of the weights to be comparable to each other. When the feature vectors x_i in S are on different scales, this might result in weights that are on different scales and not meaningful. To avoid this situation, standardize the predictors to have zero mean and unit standard deviation before applying NCA. You can standardize the predictors using the 'Standardize',true name-value pair argument in the call to fscnca or fsrnca.
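The effect of 'Standardize',true can be sketched as a simple column-wise z-score (a NumPy illustration of the transformation, not the toolbox code):

```python
import numpy as np

def standardize(X):
    # Center each column to zero mean and scale to unit standard deviation
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Two predictors on very different scales
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])
Z = standardize(X)
```

After this transformation, a single λ penalizes the weight of each predictor comparably, regardless of the predictor's original units.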

Choosing the Regularization Parameter Value

It is usually necessary to select a value of the regularization parameter by calculating the accuracy of the randomized NCA classifier or regression model on an independent test set. If you use cross-validation instead of a single test set, select the λ value that minimizes the average loss across the cross-validation folds. For examples, see “Tune Regularization Parameter to Detect Features Using NCA for Classification” and “Tune Regularization Parameter in NCA for Regression” on page 32-2269.

References

[1] Yang, W., K. Wang, W. Zuo. "Neighborhood Component Feature Selection for High-Dimensional Data." Journal of Computers. Vol. 7, Number 1, January 2012.


See Also
FeatureSelectionNCAClassification | FeatureSelectionNCARegression | fscnca | fsrnca

More About
• “Robust Feature Selection Using NCA for Regression” on page 15-85
• “Tune Regularization Parameter to Detect Features Using NCA for Classification”
• “Introduction to Feature Selection” on page 15-49


t-SNE

In this section...
“What Is t-SNE?” on page 15-104
“t-SNE Algorithm” on page 15-104
“Barnes-Hut Variation of t-SNE” on page 15-107
“Characteristics of t-SNE” on page 15-107

What Is t-SNE?

t-SNE (tsne) is an algorithm for dimensionality reduction that is well-suited to visualizing high-dimensional data. The name stands for t-distributed Stochastic Neighbor Embedding. The idea is to embed high-dimensional points in low dimensions in a way that respects similarities between points. Nearby points in the high-dimensional space correspond to nearby embedded low-dimensional points, and distant points in high-dimensional space correspond to distant embedded low-dimensional points. (Generally, it is impossible to match distances exactly between high-dimensional and low-dimensional spaces.)

The tsne function creates a set of low-dimensional points from high-dimensional data. Typically, you visualize the low-dimensional points to see natural clusters in the original high-dimensional data.

The algorithm takes the following general steps to embed the data in low dimensions.

1. Calculate the pairwise distances between the high-dimensional points.
2. Create a standard deviation σi for each high-dimensional point i so that the perplexity of each point is at a predetermined level. For the definition of perplexity, see “Compute Distances, Gaussian Variances, and Similarities” on page 15-105.
3. Calculate the similarity matrix. This is the joint probability distribution of X, defined by “Equation 15-2”.
4. Create an initial set of low-dimensional points.
5. Iteratively update the low-dimensional points to minimize the Kullback-Leibler divergence between a Gaussian distribution in the high-dimensional space and a t distribution in the low-dimensional space. This optimization procedure is the most time-consuming part of the algorithm.

See van der Maaten and Hinton [1].

t-SNE Algorithm

The basic t-SNE algorithm performs the following steps.
• “Prepare Data” on page 15-105
• “Compute Distances, Gaussian Variances, and Similarities” on page 15-105
• “Initialize the Embedding and Divergence” on page 15-106
• “Gradient Descent of Kullback-Leibler Divergence” on page 15-106


Prepare Data

tsne first removes each row of the input data X that contains any NaN values. Then, if the Standardize name-value pair is true, tsne centers X by subtracting the mean of each column, and scales X by dividing its columns by their standard deviations.

The original authors van der Maaten and Hinton [1] recommend reducing the original data X to a lower-dimensional version using “Principal Component Analysis (PCA)” on page 15-68. You can set the tsne NumPCAComponents name-value pair to the number of dimensions you like, perhaps 50. To exercise more control over this step, preprocess the data using the pca function.

Compute Distances, Gaussian Variances, and Similarities

After the preprocessing, tsne calculates the distance d(xi,xj) between each pair of points xi and xj in X. You can choose various distance metrics using the Distance name-value pair. By default, tsne uses the standard Euclidean metric. tsne uses the square of the distance metric in its subsequent calculations.

Then for each row i of X, tsne calculates a standard deviation σi so that the perplexity of row i is equal to the Perplexity name-value pair. The perplexity is defined in terms of a model Gaussian distribution as follows. As van der Maaten and Hinton [1] describe, “The similarity of data point xj to data point xi is the conditional probability, p_{j|i}, that xi would pick xj as its neighbor if neighbors were picked in proportion to their probability density under a Gaussian centered at xi. For nearby data points, p_{j|i} is relatively high, whereas for widely separated data points, p_{j|i} will be almost infinitesimal (for reasonable values of the variance of the Gaussian, σi).”

Define the conditional probability of j given i as

$$p_{j \mid i} = \frac{\exp\left(-d(x_i, x_j)^2 / (2\sigma_i^2)\right)}{\sum_{k \neq i} \exp\left(-d(x_i, x_k)^2 / (2\sigma_i^2)\right)}, \qquad p_{i \mid i} = 0.$$

Then define the joint probability p_ij by symmetrizing the conditional probabilities:

$$p_{ij} = \frac{p_{j \mid i} + p_{i \mid j}}{2N}, \tag{15-2}$$

where N is the number of rows of X.

The distributions still do not have their standard deviations σi defined in terms of the Perplexity name-value pair. Let P_i represent the conditional probability distribution over all other data points given data point x_i. The perplexity of the distribution is

$$\mathrm{perplexity}(P_i) = 2^{H(P_i)},$$

where H(P_i) is the Shannon entropy of P_i:

$$H(P_i) = -\sum_j p_{j \mid i} \log_2 p_{j \mid i}.$$

The perplexity measures the effective number of neighbors of point i. tsne performs a binary search over the σi to achieve a fixed perplexity for each point i.


Initialize the Embedding and Divergence

To embed the points in X into a low-dimensional space, tsne performs an optimization. tsne attempts to minimize the Kullback-Leibler divergence between the model Gaussian distribution of the points in X and a Student t distribution of points Y in the low-dimensional space.

The minimization procedure begins with an initial set of points Y. tsne creates the points by default as random Gaussian-distributed points. You can also create these points yourself and include them in the 'InitialY' name-value pair for tsne. tsne then calculates the similarities between each pair of points in Y. The probability model q_ij of the distribution of the distances between points y_i and y_j is

$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k}\sum_{l \neq k} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}, \qquad q_{ii} = 0.$$

Using this definition and the model of distances in X given by “Equation 15-2”, the Kullback-Leibler divergence between the joint distributions P and Q is

$$KL(P \,\|\, Q) = \sum_{j}\sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}.$$

For consequences of this definition, see “Helpful Nonlinear Distortion” on page 15-107.

Gradient Descent of Kullback-Leibler Divergence

To minimize the Kullback-Leibler divergence, the 'exact' algorithm uses a modified gradient descent procedure. The gradient with respect to the points in Y of the divergence is

$$\frac{\partial KL(P \,\|\, Q)}{\partial y_i} = 4 \sum_{j \neq i} Z \left(p_{ij} - q_{ij}\right) q_{ij} \left(y_i - y_j\right),$$

where the normalization term

$$Z = \sum_{k}\sum_{l \neq k} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}.$$

The modified gradient descent algorithm uses a few tuning parameters to attempt to reach a good local minimum.

• 'Exaggeration' — During the first 99 gradient descent steps, tsne multiplies the probabilities p_ij from “Equation 15-2” by the exaggeration value. This step tends to create more space between clusters in the output Y.
• 'LearnRate' — tsne uses adaptive learning to improve the convergence of the gradient descent iterations. The descent algorithm has iterative steps that are a linear combination of the previous step in the descent and the current gradient. 'LearnRate' is a multiplier of the current gradient for the linear combination. For details, see Jacobs [3].
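The q_ij model, the Kullback-Leibler objective, and its gradient can be sketched in a few lines of NumPy (an illustration of the formulas above, not the tsne implementation):

```python
import numpy as np

def tsne_q(Y):
    """Student t (df = 1) joint similarities q_ij and normalization Z."""
    d2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    num = 1.0 / (1.0 + d2)          # (1 + ||y_i - y_j||^2)^(-1)
    np.fill_diagonal(num, 0.0)      # q_ii = 0
    Z = num.sum()
    return num / Z, Z

def kl_divergence(P, Q):
    """KL(P || Q) summed over off-diagonal pairs with p_ij > 0."""
    mask = P > 0
    return np.sum(P[mask] * np.log(P[mask] / Q[mask]))

def kl_gradient(P, Y):
    """dKL/dy_i = 4 * sum_j Z (p_ij - q_ij) q_ij (y_i - y_j)."""
    Q, Z = tsne_q(Y)
    diff = Y[:, None, :] - Y[None, :, :]
    W = (P - Q) * Q * Z             # (p_ij - q_ij) * (1 + ||y_i - y_j||^2)^(-1)
    return 4.0 * np.einsum('ij,ijk->ik', W, diff)

# A small symmetric joint distribution P and a random 2-D embedding Y
rng = np.random.default_rng(0)
n = 6
A = rng.random((n, n))
A = A + A.T
np.fill_diagonal(A, 0.0)
P = A / A.sum()
Y = rng.normal(size=(n, 2))
Q, Z = tsne_q(Y)
G = kl_gradient(P, Y)
```

A finite-difference perturbation of one embedding coordinate reproduces the corresponding gradient entry, which is a quick way to convince yourself the closed-form gradient matches the divergence.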


Barnes-Hut Variation of t-SNE

To speed the t-SNE algorithm and to cut down on its memory usage, tsne offers an approximate optimization scheme. The Barnes-Hut algorithm groups nearby points together to lower the complexity and memory usage of the t-SNE optimization step. The Barnes-Hut algorithm is an approximate optimizer, not an exact optimizer. There is a nonnegative tuning parameter Theta that effects a tradeoff between speed and accuracy. Larger values of 'Theta' give faster but less accurate optimization results. The algorithm is relatively insensitive to 'Theta' values in the range (0.2,0.8).

The Barnes-Hut algorithm groups nearby points in the low-dimensional space, and performs an approximate gradient descent based on these groups. The idea, originally used in astrophysics, is that the gradient is similar for nearby points, so the computations can be simplified. See van der Maaten [2].

Characteristics of t-SNE

• “Cannot Use Embedding to Classify New Data” on page 15-107
• “Performance Depends on Data Sizes and Algorithm” on page 15-107
• “Helpful Nonlinear Distortion” on page 15-107

Cannot Use Embedding to Classify New Data

Because t-SNE often separates data clusters well, it can seem that t-SNE can classify new data points. However, t-SNE cannot classify new points. The t-SNE embedding is a nonlinear map that is data-dependent. To embed a new point in the low-dimensional space, you cannot use the previous embedding as a map. Instead, run the entire algorithm again.

Performance Depends on Data Sizes and Algorithm

t-SNE can take a good deal of time to process data. If you have N data points in D dimensions that you want to map to Y dimensions, then

• Exact t-SNE takes of order D*N^2 operations.
• Barnes-Hut t-SNE takes of order D*N*log(N)*exp(dimension(Y)) operations.

So for large data sets, where N is greater than 1000 or so, and where the embedding dimension Y is 2 or 3, the Barnes-Hut algorithm can be faster than the exact algorithm.

Helpful Nonlinear Distortion

t-SNE maps high-dimensional distances to distorted low-dimensional analogues. Because of the fatter tail of the Student t distribution in the low-dimensional space, tsne often moves close points closer together, and moves far points farther apart than in the high-dimensional space, as illustrated in the following figure. The figure shows both Gaussian and Student t distributions at the points where the densities are at 0.25 and 0.025. The Gaussian density relates to high-dimensional distances, and the t density relates to low-dimensional distances. The t density corresponds to close points being closer, and far points being farther, compared to the Gaussian density.

t = linspace(0,5);
y1 = normpdf(t,0,1);
y2 = tpdf(t,1);


plot(t,y1,'k',t,y2,'r')
hold on
x1 = fzero(@(x)normpdf(x,0,1)-0.25,[0,2]);
x2 = fzero(@(x)tpdf(x,1)-0.25,[0,2]);
z1 = fzero(@(x)normpdf(x,0,1)-0.025,[0,5]);
z2 = fzero(@(x)tpdf(x,1)-0.025,[0,5]);
plot([0,x1],[0.25,0.25],'k-.')
plot([0,z2],[0.025,0.025],'k-.')
plot([x1,x1],[0,0.25],'g-',[x2,x2],[0,0.25],'g-')
plot([z1,z1],[0,0.025],'g-',[z2,z2],[0,0.025],'g-')
text(1.1,.25,'Close points are closer in low-D')
text(2.4,.05,'Far points are farther in low-D')
legend('Gaussian(0,1)','Student t (df = 1)')
xlabel('x')
ylabel('Density')
title('Density of Gaussian(0,1) and Student t (df = 1)')
hold off

This distortion is helpful when it applies. It does not apply in cases such as when the Gaussian variance is high, which lowers the Gaussian peak and flattens the distribution. In such a case, tsne can move close points farther apart than in the original space. To achieve a helpful distortion:

• Set the 'Verbose' name-value pair to 2.
• Adjust the 'Perplexity' name-value pair so the reported range of variances is not too far from 1, and the mean variance is near 1.


If you can achieve this range of variances, then the diagram applies, and the tsne distortion is helpful. For effective ways to tune tsne, see Wattenberg, Viégas and Johnson [4].

References

[1] van der Maaten, Laurens, and Geoffrey Hinton. "Visualizing Data using t-SNE." J. Machine Learning Research 9, 2008, pp. 2579–2605.

[2] van der Maaten, Laurens. "Barnes-Hut-SNE." arXiv:1301.3342 [cs.LG], 2013.

[3] Jacobs, Robert A. "Increased rates of convergence through learning rate adaptation." Neural Networks 1.4, 1988, pp. 295–307.

[4] Wattenberg, Martin, Fernanda Viégas, and Ian Johnson. "How to Use t-SNE Effectively." Distill, 2016.

See Also

Related Examples
• “Visualize High-Dimensional Data Using t-SNE” on page 15-113
• “t-SNE Output Function” on page 15-110
• “tsne Settings” on page 15-117

More About
• “Principal Component Analysis (PCA)” on page 15-68
• “Classical Multidimensional Scaling” on page 15-40
• “Factor Analysis” on page 15-78


t-SNE Output Function

In this section...
“t-SNE Output Function Description” on page 15-110
“tsne optimValues Structure” on page 15-110
“t-SNE Custom Output Function” on page 15-111

t-SNE Output Function Description

A tsne output function is a function that runs after every NumPrint optimization iterations of the t-SNE algorithm. An output function can create plots, or log data to a file or to a workspace variable. The function cannot change the progress of the algorithm, but can halt the iterations.

Set output functions using the Options name-value pair argument to the tsne function. Set Options to a structure created using statset or struct. Set the 'OutputFcn' field of the Options structure to a function handle or cell array of function handles. For example, to set an output function named outfun.m, use the following commands.

opts = statset('OutputFcn',@outfun);
Y = tsne(X,'Options',opts);

Write an output function using the following syntax.

function stop = outfun(optimValues,state)
stop = false; % do not stop by default
switch state
    case 'init'
        % Set up plots or open files
    case 'iter'
        % Draw plots or update variables
    case 'done'
        % Clean up plots or files
end

tsne passes the state and optimValues variables to your function. state takes on the values 'init', 'iter', or 'done' as shown in the code snippet.

tsne optimValues Structure

optimValues Field   Description
'iteration'         Iteration number
'fval'              Kullback-Leibler divergence, modified by exaggeration during the first 99 iterations
'grad'              Gradient of the Kullback-Leibler divergence, modified by exaggeration during the first 99 iterations
'Exaggeration'      Value of the exaggeration parameter in use in the current iteration
'Y'                 Current embedding


t-SNE Custom Output Function

This example shows how to use an output function in tsne.

Custom Output Function

The following code is an output function that performs these tasks:

• Keep a history of the Kullback-Leibler divergence and the norm of its gradient in a workspace variable.
• Plot the solution and the history as the iterations proceed.
• Display a Stop button on the plot to stop the iterations early without losing any information.

The output function has an extra input variable, species, that enables its plots to show the correct classification of the data. For information on including extra parameters such as species in a function, see “Parameterizing Functions” (MATLAB).

function stop = KLLogging(optimValues,state,species)
persistent h kllog iters stopnow
switch state
    case 'init'
        stopnow = false;
        kllog = [];
        iters = [];
        h = figure;
        c = uicontrol('Style','pushbutton','String','Stop','Position', ...
            [10 10 50 20],'Callback',@stopme);
    case 'iter'
        kllog = [kllog; optimValues.fval,log(norm(optimValues.grad))];
        assignin('base','history',kllog)
        iters = [iters; optimValues.iteration];
        if length(iters) > 1
            figure(h)
            subplot(2,1,2)
            plot(iters,kllog);
            xlabel('Iterations')
            ylabel('Loss and Gradient')
            legend('Divergence','log(norm(gradient))')
            title('Divergence and log(norm(gradient))')
            subplot(2,1,1)
            gscatter(optimValues.Y(:,1),optimValues.Y(:,2),species)
            title('Embedding')
            drawnow
        end
    case 'done'
        % Nothing here
end
stop = stopnow;

    function stopme(~,~)
        stopnow = true;
    end
end


Use the Custom Output Function

Plot the Fisher iris data, a 4-D data set, in two dimensions using tsne. There is a drop in the Divergence value at iteration 100 because the divergence is scaled by the exaggeration value for earlier iterations. The embedding remains largely unchanged for the last several hundred iterations, so you can save time by clicking the Stop button during the iterations.

load fisheriris
rng default % for reproducibility
opts = statset('OutputFcn',@(optimValues,state) KLLogging(optimValues,state,species));
Y = tsne(meas,'Options',opts,'Algorithm','exact');

See Also

Related Examples
• “Visualize High-Dimensional Data Using t-SNE” on page 15-113
• “tsne Settings” on page 15-117


Visualize High-Dimensional Data Using t-SNE

This example shows how to visualize the MNIST data [1], which consists of images of handwritten digits, using the tsne function. The images are 28-by-28 pixels in grayscale. Each image has an associated label from 0 through 9, which is the digit that the image represents. tsne reduces the dimension of the data from 784 original dimensions to 50 using PCA, and then to two or three using the t-SNE Barnes-Hut algorithm.

Obtain Data

Begin by obtaining image and label data from http://yann.lecun.com/exdb/mnist/

Unzip the files. For this example, use the t10k-images data.

imageFileName = 't10k-images.idx3-ubyte';
labelFileName = 't10k-labels.idx1-ubyte';

Process the files to load them in the workspace. The code for this processing function appears at the end of this example.

[X,L] = processMNISTdata(imageFileName,labelFileName);

Read MNIST image data...
Number of images in the dataset: 10000 ...
Each image is of 28 by 28 pixels...
The image data is read to a matrix of dimensions: 10000 by 784...
End of reading image data.

Read MNIST label data...
Number of labels in the dataset: 10000 ...
The label data is read to a matrix of dimensions: 10000 by 1...
End of reading label data.

Reduce Dimension of Data to Two

Obtain two-dimensional analogues of the data clusters using t-SNE. Use PCA to reduce the initial dimensionality to 50. Use the Barnes-Hut variant of the t-SNE algorithm to save time on this relatively large data set.

rng default % for reproducibility
Y = tsne(X,'Algorithm','barneshut','NumPCAComponents',50);

Display the result, colored with the correct labels.

figure
gscatter(Y(:,1),Y(:,2),L)


t-SNE creates clusters of points based solely on their relative similarities that correspond closely to the true labels.

Reduce Dimension of Data to Three

t-SNE can also reduce the data to three dimensions. Set the tsne 'NumDimensions' name-value pair to 3.

rng default % for fair comparison
Y3 = tsne(X,'Algorithm','barneshut','NumPCAComponents',50,'NumDimensions',3);
figure
scatter3(Y3(:,1),Y3(:,2),Y3(:,3),15,L,'filled');
view(-93,14)


Here is the code of the function that reads the data into the workspace.

function [X,L] = processMNISTdata(imageFileName,labelFileName)
[fileID,errmsg] = fopen(imageFileName,'r','b');
if fileID < 0
    error(errmsg);
end
%%
% First read the magic number. This number is 2051 for image data, and
% 2049 for label data
magicNum = fread(fileID,1,'int32',0,'b');
if magicNum == 2051
    fprintf('\nRead MNIST image data...\n')
end
%%
% Then read the number of images, number of rows, and number of columns
numImages = fread(fileID,1,'int32',0,'b');
fprintf('Number of images in the dataset: %6d ...\n',numImages);
numRows = fread(fileID,1,'int32',0,'b');
numCols = fread(fileID,1,'int32',0,'b');
fprintf('Each image is of %2d by %2d pixels...\n',numRows,numCols);
%%
% Read the image data
X = fread(fileID,inf,'unsigned char');
%%
% Reshape the data to array X
X = reshape(X,numCols,numRows,numImages);
X = permute(X,[2 1 3]);
%%
% Then flatten each image data into a 1 by (numRows*numCols) vector, and
% store all the image data into a numImages by (numRows*numCols) array.
X = reshape(X,numRows*numCols,numImages)';
fprintf(['The image data is read to a matrix of dimensions: %6d by %4d...\n',...
    'End of reading image data.\n'],size(X,1),size(X,2));
%%
% Close the file
fclose(fileID);
%%
% Similarly, read the label data.
[fileID,errmsg] = fopen(labelFileName,'r','b');
if fileID < 0
    error(errmsg);
end
magicNum = fread(fileID,1,'int32',0,'b');
if magicNum == 2049
    fprintf('\nRead MNIST label data...\n')
end
numItems = fread(fileID,1,'int32',0,'b');
fprintf('Number of labels in the dataset: %6d ...\n',numItems);
L = fread(fileID,inf,'unsigned char');
fprintf(['The label data is read to a matrix of dimensions: %6d by %2d...\n',...
    'End of reading label data.\n'],size(L,1),size(L,2));
fclose(fileID);

References [1] Yann LeCun (Courant Institute, NYU) and Corinna Cortes (Google Labs, New York) hold the copyright of MNIST dataset, which is a derivative work from original NIST datasets. MNIST dataset is made available under the terms of the Creative Commons Attribution-Share Alike 3.0 license, https://creativecommons.org/licenses/by-sa/3.0/

See Also

Related Examples
• “tsne Settings” on page 15-117

More About
• “t-SNE” on page 15-104


tsne Settings

This example shows the effects of various tsne settings.

Obtain Data

Begin by obtaining the MNIST [1] image and label data from http://yann.lecun.com/exdb/mnist/

Unzip the files. For this example, use the t10k-images data.

imageFileName = 't10k-images.idx3-ubyte';
labelFileName = 't10k-labels.idx1-ubyte';

Process the files to load them in the workspace. The code for this processing function appears at the end of this example.

[X,L] = processMNISTdata(imageFileName,labelFileName);

Read MNIST image data...
Number of images in the dataset: 10000 ...
Each image is of 28 by 28 pixels...
The image data is read to a matrix of dimensions: 10000 by 784...
End of reading image data.

Read MNIST label data...
Number of labels in the dataset: 10000 ...
The label data is read to a matrix of dimensions: 10000 by 1...
End of reading label data.

Process Data Using t-SNE

Obtain two-dimensional analogs of the data clusters using t-SNE. Use the Barnes-Hut algorithm for better performance on this large data set. Use PCA to reduce the initial dimensions from 784 to 50.

rng default % for reproducibility
Y = tsne(X,'Algorithm','barneshut','NumPCAComponents',50);
figure
gscatter(Y(:,1),Y(:,2),L)
title('Default Figure')


t-SNE creates a figure with well separated clusters and relatively few data points that seem misplaced.

Perplexity

Try altering the perplexity setting to see the effect on the figure.

rng default % for fair comparison
Y100 = tsne(X,'Algorithm','barneshut','NumPCAComponents',50,'Perplexity',100);
figure
gscatter(Y100(:,1),Y100(:,2),L)
title('Perplexity 100')

rng default % for fair comparison
Y4 = tsne(X,'Algorithm','barneshut','NumPCAComponents',50,'Perplexity',4);
figure
gscatter(Y4(:,1),Y4(:,2),L)
title('Perplexity 4')
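Perplexity can be read as the effective number of nearest neighbors each point considers: it equals 2 raised to the Shannon entropy of the point's conditional neighbor distribution. The following plain-Python sketch (hypothetical helper name, not part of the toolbox) illustrates that reading:

```python
import math

def perplexity(p):
    """Perplexity of a discrete distribution p: 2**H(p), with H in bits.
    For a uniform distribution over k neighbors this returns k, which is
    why the setting behaves like an effective neighbor count."""
    entropy_bits = -sum(pi * math.log2(pi) for pi in p if pi > 0.0)
    return 2.0 ** entropy_bits

# A point that weights 4 neighbors uniformly has perplexity 4
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # → 4.0
```

Larger perplexity values therefore make each point attend to more neighbors, which tends to produce tighter, more global cluster structure, consistent with the comparison above.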


Setting the perplexity to 100 yields a figure that is largely similar to the default figure. The clusters are tighter than with the default setting. However, setting the perplexity to 4 gives a figure without well separated clusters. The clusters are looser than with the default setting.

Exaggeration

Try altering the exaggeration setting to see the effect on the figure.

rng default % for fair comparison
YEX0 = tsne(X,'Algorithm','barneshut','NumPCAComponents',50,'Exaggeration',20);
figure
gscatter(YEX0(:,1),YEX0(:,2),L)
title('Exaggeration 20')

rng default % for fair comparison
YEx15 = tsne(X,'Algorithm','barneshut','NumPCAComponents',50,'Exaggeration',1.5);
figure
gscatter(YEx15(:,1),YEx15(:,2),L)
title('Exaggeration 1.5')


While the exaggeration setting has an effect on the figure, it is not clear whether any nondefault setting gives a better picture than the default setting. The figure with an exaggeration of 20 is similar to the default figure. In general, a larger exaggeration creates more empty space between embedded clusters. An exaggeration of 1.5 causes the groups labeled 1 and 6 to split into two groups each, an undesirable outcome. Exaggerating the values in the joint distribution of X makes the values in the joint distribution of Y smaller. This makes it much easier for the embedded points to move relative to one another. The splitting of clusters 1 and 6 reflects this effect.

Learning Rate

Try altering the learning rate setting to see the effect on the figure.

rng default % for fair comparison
YL5 = tsne(X,'Algorithm','barneshut','NumPCAComponents',50,'LearnRate',5);
figure
gscatter(YL5(:,1),YL5(:,2),L)
title('Learning Rate 5')

rng default % for fair comparison
YL2000 = tsne(X,'Algorithm','barneshut','NumPCAComponents',50,'LearnRate',2000);
figure
gscatter(YL2000(:,1),YL2000(:,2),L)
title('Learning Rate 2000')


The figure with a learning rate of 5 has several clusters that split into two or more pieces. This shows that if the learning rate is too small, the minimization process can get stuck in a bad local minimum. A learning rate of 2000 gives a figure similar to the default figure.

Initial Behavior with Various Settings

Large learning rates or large exaggeration values can lead to undesirable initial behavior. To see this, set large values of these parameters and set NumPrint and Verbose to 1 to show all the iterations. Stop the iterations after 10, as the goal of this experiment is simply to look at the initial behavior. Begin by setting the exaggeration to 200.

rng default % for fair comparison
opts = statset('MaxIter',10);
YEX200 = tsne(X,'Algorithm','barneshut','NumPCAComponents',50,'Exaggeration',200,...
    'NumPrint',1,'Verbose',1,'Options',opts);

|==============================================|
|   ITER   | KL DIVERGENCE   | NORM GRAD USING |
|          | FUN VALUE USING | EXAGGERATED DIST|
|          | EXAGGERATED DIST| OF X            |
|          | OF X            |                 |
|==============================================|
|        1 |    2.190347e+03 |    6.078667e-05 |
|        2 |    2.190352e+03 |    4.769050e-03 |
|        3 |    2.204061e+03 |    9.423678e-02 |
|        4 |    2.464585e+03 |    2.113271e-02 |
|        5 |    2.501222e+03 |    2.616407e-02 |
|        6 |    2.529362e+03 |    3.022570e-02 |
|        7 |    2.553233e+03 |    3.108418e-02 |
|        8 |    2.562822e+03 |    3.278873e-02 |
|        9 |    2.538056e+03 |    3.222265e-02 |
|       10 |    2.504932e+03 |    3.671708e-02 |

The Kullback-Leibler divergence increases during the first few iterations, and the norm of the gradient increases as well. To see the final result of the embedding, allow the algorithm to run to completion using the default stopping criteria.

rng default % for fair comparison
YEX200 = tsne(X,'Algorithm','barneshut','NumPCAComponents',50,'Exaggeration',200);
figure
gscatter(YEX200(:,1),YEX200(:,2),L)
title('Exaggeration 200')
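tsne reports the Kullback-Leibler divergence because that is the quantity it minimizes: KL(P‖Q) between the joint neighbor distribution P of the input and the distribution Q induced by the embedding. As a minimal reminder of the quantity itself, here is a plain-Python sketch (hypothetical helper name):

```python
import math

def kl_divergence(p, q):
    """KL(P||Q) for two discrete distributions given as equal-length lists.
    Zero when the distributions match; grows as Q drifts away from P."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

p = [0.5, 0.3, 0.2]
print(kl_divergence(p, p))                        # → 0.0
print(kl_divergence(p, [1/3, 1/3, 1/3]) > 0.0)    # → True
```

A rising KL divergence in the trace above therefore means the embedding is temporarily moving away from the target distribution, which is the undesirable initial behavior this experiment demonstrates.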

This exaggeration value does not give a clean separation into clusters. Show the initial behavior when the learning rate is 100,000.

rng default % for fair comparison
YL100k = tsne(X,'Algorithm','barneshut','NumPCAComponents',50,'LearnRate',1e5,...
    'NumPrint',1,'Verbose',1,'Options',opts);


|==============================================|
|   ITER   | KL DIVERGENCE   | NORM GRAD USING |
|          | FUN VALUE USING | EXAGGERATED DIST|
|          | EXAGGERATED DIST| OF X            |
|          | OF X            |                 |
|==============================================|
|        1 |    2.815885e+01 |    1.024049e-06 |
|        2 |    2.816002e+01 |    2.902059e-04 |
|        3 |    3.195873e+01 |    7.355889e-04 |
|        4 |    3.348151e+01 |    3.958901e-04 |
|        5 |    3.365935e+01 |    2.876905e-04 |
|        6 |    3.342462e+01 |    3.906245e-04 |
|        7 |    3.303205e+01 |    4.037983e-04 |
|        8 |    3.263320e+01 |    5.665630e-04 |
|        9 |    3.235384e+01 |    4.319099e-04 |
|       10 |    3.211238e+01 |    4.803526e-04 |

Again, the Kullback-Leibler divergence increases during the first few iterations, and the norm of the gradient increases as well. To see the final result of the embedding, allow the algorithm to run to completion using the default stopping criteria.

rng default % for fair comparison
YL100k = tsne(X,'Algorithm','barneshut','NumPCAComponents',50,'LearnRate',1e5);
figure
gscatter(YL100k(:,1),YL100k(:,2),L)
title('Learning Rate 100,000')


The learning rate is far too large, and gives no useful embedding.

Conclusion

tsne with default settings does a good job of embedding the high-dimensional initial data into two-dimensional points that have well defined clusters. The effects of algorithm settings are difficult to predict. Sometimes they can improve the clustering, but for the most part the default settings seem good. While speed is not part of this investigation, settings can affect the speed of the algorithm. In particular, the Barnes-Hut algorithm is notably faster on this data.

Code to Process MNIST Data

Here is the code of the function that reads the data into the workspace.

function [X,L] = processMNISTdata(imageFileName,labelFileName)
[fileID,errmsg] = fopen(imageFileName,'r','b');
if fileID < 0
    error(errmsg);
end
%%
% First read the magic number. This number is 2051 for image data, and
% 2049 for label data
magicNum = fread(fileID,1,'int32',0,'b');
if magicNum == 2051
    fprintf('\nRead MNIST image data...\n')
end
%%
% Then read the number of images, number of rows, and number of columns
numImages = fread(fileID,1,'int32',0,'b');
fprintf('Number of images in the dataset: %6d ...\n',numImages);
numRows = fread(fileID,1,'int32',0,'b');
numCols = fread(fileID,1,'int32',0,'b');
fprintf('Each image is of %2d by %2d pixels...\n',numRows,numCols);
%%
% Read the image data
X = fread(fileID,inf,'unsigned char');
%%
% Reshape the data to array X
X = reshape(X,numCols,numRows,numImages);
X = permute(X,[2 1 3]);
%%
% Then flatten each image into a 1 by (numRows*numCols) vector, and
% store all the image data in a numImages by (numRows*numCols) array.
X = reshape(X,numRows*numCols,numImages)';
fprintf(['The image data is read to a matrix of dimensions: %6d by %4d...\n',...
    'End of reading image data.\n'],size(X,1),size(X,2));
%%
% Close the file
fclose(fileID);
%%
% Similarly, read the label data.
[fileID,errmsg] = fopen(labelFileName,'r','b');
if fileID < 0
    error(errmsg);
end
magicNum = fread(fileID,1,'int32',0,'b');
if magicNum == 2049
    fprintf('\nRead MNIST label data...\n')
end
numItems = fread(fileID,1,'int32',0,'b');
fprintf('Number of labels in the dataset: %6d ...\n',numItems);
L = fread(fileID,inf,'unsigned char');
fprintf(['The label data is read to a matrix of dimensions: %6d by %2d...\n',...
    'End of reading label data.\n'],size(L,1),size(L,2));
fclose(fileID);
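The IDX header layout that this function relies on, a big-endian 32-bit magic number (2051 for image files, 2049 for label files) followed by big-endian 32-bit counts, can also be checked outside MATLAB. Here is a hedged Python sketch with a hypothetical helper name:

```python
import struct

def parse_idx_header(buf):
    """Read the first two big-endian int32 fields of an IDX file:
    the magic number (2051 = images, 2049 = labels) and the item count."""
    magic, count = struct.unpack_from('>ii', buf, 0)
    kind = {2051: 'images', 2049: 'labels'}.get(magic, 'unknown')
    return kind, count

# Simulate the first 8 bytes of t10k-labels.idx1-ubyte: magic 2049, 10000 items
header = struct.pack('>ii', 2049, 10000)
print(parse_idx_header(header))  # → ('labels', 10000)
```

This mirrors the MATLAB calls fread(fileID,1,'int32',0,'b'), which likewise read big-endian 32-bit integers.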

References

[1] Yann LeCun (Courant Institute, NYU) and Corinna Cortes (Google Labs, New York) hold the copyright of the MNIST dataset, which is a derivative work from the original NIST datasets. The MNIST dataset is made available under the terms of the Creative Commons Attribution-ShareAlike 3.0 license, https://creativecommons.org/licenses/by-sa/3.0/.

See Also

Related Examples
• “Visualize High-Dimensional Data Using t-SNE” on page 15-113
• “t-SNE Output Function” on page 15-110

More About
• “t-SNE” on page 15-104

External Websites
• How to Use t-SNE Effectively


Feature Extraction

In this section...
“What Is Feature Extraction?” on page 15-130
“Sparse Filtering Algorithm” on page 15-130
“Reconstruction ICA Algorithm” on page 15-132

What Is Feature Extraction?

Feature extraction is a set of methods that map input features to new output features. Many feature extraction methods use unsupervised learning to extract features. Unlike some feature extraction methods such as PCA and NNMF, the methods described in this section can increase dimensionality (as well as decrease it). Internally, the methods involve optimizing nonlinear objective functions. For details, see “Sparse Filtering Algorithm” on page 15-130 or “Reconstruction ICA Algorithm” on page 15-132.

One typical use of feature extraction is finding features in images. Using these features can lead to improved classification accuracy. For an example, see “Feature Extraction Workflow” on page 15-135. Another typical use is extracting individual signals from superpositions, which is often termed blind source separation. For an example, see “Extract Mixed Signals” on page 15-164.

There are two feature extraction functions: rica and sparsefilt. Associated with these functions are the objects that they create: ReconstructionICA and SparseFiltering.

Sparse Filtering Algorithm

The sparse filtering algorithm begins with a data matrix X that has n rows and p columns. Each row represents one observation and each column represents one measurement. The columns are also called the features or predictors. The algorithm then takes either an initial random p-by-q weight matrix W or uses the weight matrix passed in the InitialTransformWeights name-value pair. q is the requested number of features that sparsefilt computes.

The algorithm attempts to minimize the “Sparse Filtering Objective Function” on page 15-131 by using a standard limited memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) quasi-Newton optimizer. See Nocedal and Wright [2]. This optimizer takes up to IterationLimit iterations. It stops iterating earlier when it takes a step whose norm is less than StepTolerance, or when it computes that the norm of the gradient at the current point is less than GradientTolerance times a scalar τ, where

τ = max(1, min(|f|, ‖g₀‖∞)).

|f| is the norm of the objective function, and ‖g₀‖∞ is the infinity norm of the initial gradient.

The objective function attempts to simultaneously obtain few nonzero features for each data point, and for each resulting feature to have nearly equal weight. To understand how the objective function attempts to achieve these goals, see Ngiam, Koh, Chen, Bhaskar, and Ng [1].

Frequently, you obtain good features by setting a relatively small value of IterationLimit, from as low as 5 to a few hundred. Allowing the optimizer to continue can result in overtraining, where the extracted features do not generalize well to new data.
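The relative gradient test described above can be sketched in a few lines. This plain-Python illustration of the stopping rule (hypothetical function name, not the toolbox code) shows how τ rescales GradientTolerance for large objective values:

```python
def lbfgs_should_stop(step_norm, grad_inf_norm, f_norm, g0_inf_norm,
                      step_tol=1e-6, grad_tol=1e-6):
    """Stop when the last step is tiny, or when the gradient is small
    relative to tau = max(1, min(|f|, ||g0||_inf))."""
    tau = max(1.0, min(f_norm, g0_inf_norm))
    return step_norm < step_tol or grad_inf_norm < grad_tol * tau

# Large objective and initial gradient: the gradient tolerance scales by tau = 3
print(lbfgs_should_stop(step_norm=1.0, grad_inf_norm=5e-7,
                        f_norm=100.0, g0_inf_norm=3.0))  # → True
```

The max with 1 prevents τ from shrinking the tolerance when both the objective and initial gradient are small.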


After constructing a SparseFiltering object, use the transform method to map input data to the new output features.

Sparse Filtering Objective Function

To compute an objective function, the sparse filtering algorithm uses the following steps. The objective function depends on the n-by-p data matrix X and a weight matrix W that the optimizer varies. The weight matrix W has dimensions p-by-q, where p is the number of original features and q is the number of requested features.

1. Compute the n-by-q matrix X*W. Apply the approximate absolute value function φ(u) = sqrt(u^2 + 10^-8) to each element of X*W to obtain the matrix F. φ is a smooth nonnegative symmetric function that closely approximates the absolute value function.

2. Normalize the columns of F by the approximate L2 norm. In other words, define the normalized matrix F̃(i, j) by

‖F(j)‖ = sqrt( Σ_{i=1}^n F(i, j)^2 + 10^-8 )
F̃(i, j) = F(i, j)/‖F(j)‖.

3. Normalize the rows of F̃(i, j) by the approximate L2 norm. In other words, define the normalized matrix F̂(i, j) by

‖F̃(i)‖ = sqrt( Σ_{j=1}^q F̃(i, j)^2 + 10^-8 )
F̂(i, j) = F̃(i, j)/‖F̃(i)‖.

The matrix F̂ is the matrix of converted features in X. Once sparsefilt finds the weights W that minimize the objective function h (see below), which the function stores in the output object Mdl in the Mdl.TransformWeights property, the transform function can follow the same transformation steps to convert new data to output features.

4. Compute the objective function h(W) as the 1-norm of the matrix F̂(i, j), meaning the sum of all the elements in the matrix (which are nonnegative by construction):

h(W) = Σ_{j=1}^q Σ_{i=1}^n F̂(i, j).

5. If you set the Lambda name-value pair to a strictly positive value, sparsefilt uses the following modified objective function:

h(W) = Σ_{j=1}^q Σ_{i=1}^n F̂(i, j) + λ Σ_{j=1}^q w_j^T w_j.

Here, w_j is the jth column of the matrix W and λ is the value of Lambda. The effect of this term is to shrink the weights W. If you plot the columns of W as images, with positive Lambda these images appear smooth compared to the same images with zero Lambda.


Reconstruction ICA Algorithm

The Reconstruction Independent Component Analysis (RICA) algorithm is based on minimizing an objective function. The algorithm maps input data to output features.

The ICA source model is the following. Each observation x is generated by a random vector s according to

x = μ + As.

• x is a column vector of length p.
• μ is a column vector of length p representing a constant term.
• s is a column vector of length q whose elements are zero mean, unit variance random variables that are statistically independent of each other.
• A is a mixing matrix of size p-by-q.

You can use this model in rica to estimate A from observations of x. See “Extract Mixed Signals” on page 15-164.

The RICA algorithm begins with a data matrix X that has n rows and p columns, consisting of the observations x_i stacked as rows:

X = [x_1^T; x_2^T; … ; x_n^T].

Each row represents one observation and each column represents one measurement. The columns are also called the features or predictors. The algorithm then takes either an initial random p-by-q weight matrix W or uses the weight matrix passed in the InitialTransformWeights name-value pair. q is the requested number of features that rica computes. The weight matrix W is composed of columns w_j of size p-by-1:

W = [w_1 w_2 … w_q].

The algorithm attempts to minimize the “Reconstruction ICA Objective Function” on page 15-133 by using a standard limited memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) quasi-Newton optimizer. See Nocedal and Wright [2]. This optimizer takes up to IterationLimit iterations. It stops iterating when it takes a step whose norm is less than StepTolerance, or when it computes that the norm of the gradient at the current point is less than GradientTolerance times a scalar τ, where

τ = max(1, min(|f|, ‖g₀‖∞)).

|f| is the norm of the objective function, and ‖g₀‖∞ is the infinity norm of the initial gradient.

The objective function attempts to obtain a nearly orthonormal weight matrix that minimizes the sum of elements of g(XW), where g is a function (described below) that is applied elementwise to XW. To understand how the objective function attempts to achieve these goals, see Le, Karpenko, Ngiam, and Ng [3].


After constructing a ReconstructionICA object, use the transform method to map input data to the new output features.

Reconstruction ICA Objective Function

The objective function uses a contrast function, which you specify by using the ContrastFcn name-value pair. The contrast function is a smooth convex function that is similar to an absolute value. By default, the contrast function is g(x) = (1/2) log(cosh(2x)). For other available contrast functions, see ContrastFcn.

For an n-by-p data matrix X and q output features, with a regularization parameter λ as the value of the Lambda name-value pair, the objective function in terms of the p-by-q matrix W is

h = (λ/n) Σ_{i=1}^n ‖W W^T x_i − x_i‖² + (1/n) Σ_{i=1}^n Σ_{j=1}^q σ_j g(w_j^T x_i).

The σ_j are known constants that are ±1. When σ_j = +1, minimizing the objective function h encourages the histogram of w_j^T x_i to be sharply peaked at 0 (super Gaussian). When σ_j = −1, minimizing the objective function h encourages the histogram of w_j^T x_i to be flatter near 0 (sub Gaussian). Specify the σ_j values using the rica NonGaussianityIndicator name-value pair.

The objective function h can have a spurious minimum of zero when λ is zero. Therefore, rica minimizes h over W that are normalized to 1. In other words, each column w_j of W is defined in terms of a column vector v_j by

w_j = v_j / sqrt(v_j^T v_j + 10^-8).

rica minimizes over the v_j. The resulting minimal matrix W provides the transformation from input data X to output features XW.
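The two terms of h, a reconstruction penalty plus a contrast penalty, can be evaluated directly on toy data. The following plain-Python function is a hedged illustration of the formula above (hypothetical names; the real rica also reparameterizes W through the normalized columns v_j):

```python
import math

def rica_objective(X, W, lam, sigma):
    """Evaluate h = (lam/n)*sum_i ||W W^T x_i - x_i||^2
                  + (1/n)*sum_i sum_j sigma_j * g(w_j^T x_i),
    with default contrast g(x) = 0.5*log(cosh(2x)). X is n-by-p and
    W is p-by-q as nested lists; sigma is a length-q list of +1/-1."""
    g = lambda x: 0.5 * math.log(math.cosh(2.0 * x))
    n, p, q = len(X), len(W), len(W[0])
    h = 0.0
    for xi in X:
        proj = [sum(W[k][j] * xi[k] for k in range(p)) for j in range(q)]     # W^T x_i
        recon = [sum(W[k][j] * proj[j] for j in range(q)) for k in range(p)]  # W W^T x_i
        h += (lam / n) * sum((recon[k] - xi[k]) ** 2 for k in range(p))
        h += (1.0 / n) * sum(sigma[j] * g(proj[j]) for j in range(q))
    return h

# With a square orthonormal (identity) W, the reconstruction term vanishes
print(rica_objective([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], lam=1.0, sigma=[1, 1]))  # → 0.0
```

For a square orthonormal W, the reconstruction term W W^T x_i − x_i is exactly zero, leaving only the contrast penalty, which is why the text calls near-orthonormality the goal of the reconstruction term.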

References

[1] Ngiam, Jiquan, Zhenghao Chen, Sonia A. Bhaskar, Pang W. Koh, and Andrew Y. Ng. “Sparse Filtering.” Advances in Neural Information Processing Systems. Vol. 24, 2011, pp. 1125–1133. https://papers.nips.cc/paper/4334-sparse-filtering.pdf.

[2] Nocedal, J. and S. J. Wright. Numerical Optimization, Second Edition. Springer Series in Operations Research, Springer Verlag, 2006.

[3] Le, Quoc V., Alexandre Karpenko, Jiquan Ngiam, and Andrew Y. Ng. “ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning.” Advances in Neural Information Processing Systems. Vol. 24, 2011, pp. 1017–1025. https://papers.nips.cc/paper/4467-ica-with-reconstruction-cost-for-efficient-overcomplete-feature-learning.pdf.

See Also
ReconstructionICA | SparseFiltering | rica | sparsefilt


Related Examples
• “Feature Extraction Workflow” on page 15-135
• “Extract Mixed Signals” on page 15-164


Feature Extraction Workflow

This example shows a complete workflow for feature extraction from image data.

Obtain Data

This example uses the MNIST image data [1], which consists of images of handwritten digits. The images are 28-by-28 pixels in gray scale. Each image has an associated label from 0 through 9, which is the digit that the image represents. Begin by obtaining image and label data from

http://yann.lecun.com/exdb/mnist/

Unzip the files. For better performance on this long example, use the test data as training data and the training data as test data.

imageFileName = 't10k-images.idx3-ubyte';
labelFileName = 't10k-labels.idx1-ubyte';

Process the files to load them in the workspace. The code for this processing function appears at the end of this example.

[Xtrain,LabelTrain] = processMNISTdata(imageFileName,labelFileName);

Read MNIST image data...
Number of images in the dataset:  10000 ...
Each image is of 28 by 28 pixels...
The image data is read to a matrix of dimensions:  10000 by  784...
End of reading image data.

Read MNIST label data...
Number of labels in the dataset:  10000 ...
The label data is read to a matrix of dimensions:  10000 by   1...
End of reading label data.

View a few of the images.

rng('default') % For reproducibility
numrows = size(Xtrain,1);
ims = randi(numrows,4,1);
imgs = Xtrain(ims,:);
for i = 1:4
    pp{i} = reshape(imgs(i,:),28,28);
end
ppf = [pp{1},pp{2};pp{3},pp{4}];
imshow(ppf);


Choose New Feature Dimensions

There are several considerations in choosing the number of features to extract:

• More features use more memory and computational time.
• Fewer features can produce a poor classifier.

For this example, choose 100 features.

q = 100;

Extract Features

There are two feature extraction functions, sparsefilt and rica. Begin with the sparsefilt function. Set the number of iterations to 10 so that the extraction does not take too long. Typically, you get good results by running the sparsefilt algorithm for a few iterations to a few hundred iterations. Running the algorithm for too many iterations can lead to decreased classification accuracy, a type of overfitting problem. Use sparsefilt to obtain the sparse filtering model while using 10 iterations.

Mdl = sparsefilt(Xtrain,q,'IterationLimit',10);

Warning: Solver LBFGS was not able to converge to a solution.

sparsefilt warns that the internal LBFGS optimizer did not converge. The optimizer did not converge because you set the iteration limit to 10. Nevertheless, you can use the result to train a classifier.

Create Classifier

Transform the original data into the new feature representation.

NewX = transform(Mdl,Xtrain);

Train a linear classifier based on the transformed data and the correct classification labels in LabelTrain. The accuracy of the learned model is sensitive to the fitcecoc regularization parameter Lambda. Try to find the best value for Lambda by using the OptimizeHyperparameters name-value pair. Be aware that this optimization takes time. If you have a Parallel Computing Toolbox™ license, use parallel computing for faster execution. If you don't have a parallel license, remove the UseParallel calls before running this script.

t = templateLinear('Solver','lbfgs');
options = struct('UseParallel',true);


Cmdl = fitcecoc(NewX,LabelTrain,'Learners',t, ...
    'OptimizeHyperparameters',{'Lambda'}, ...
    'HyperparameterOptimizationOptions',options);

Copying objective function to workers...
Done copying objective function to workers.

| Iter | Active  | Eval   | Objective | Objective | BestSoFar  | BestSoFar | Lambda     |
|      | workers | result |           | runtime   | (observed) | (estim.)  |            |
|======|=========|========|===========|===========|============|===========|============|
|    1 |       6 | Best   |    0.5777 |    8.5334 |     0.5777 |    0.5777 |    0.20606 |
|    2 |       5 | Accept |    0.8865 |    8.9062 |     0.2041 |   0.27206 |     8.8234 |
|    3 |       5 | Best   |    0.2041 |    9.7024 |     0.2041 |   0.27206 |   0.026804 |
|    4 |       6 | Best   |    0.1077 |    14.629 |     0.1077 |   0.10773 | 1.7309e-09 |
|    5 |       6 | Best   |    0.0962 |    15.767 |     0.0962 |  0.096203 |  0.0002442 |
|    6 |       6 | Accept |    0.1999 |    6.4363 |     0.0962 |   0.09622 |   0.024862 |
|    7 |       6 | Accept |    0.2074 |    6.4171 |     0.0962 |  0.096222 |   0.029034 |
|    8 |       6 | Accept |    0.1065 |    12.974 |     0.0962 |  0.096222 |  2.037e-08 |
|    9 |       6 | Accept |    0.0977 |    22.976 |     0.0962 |  0.096216 | 8.0495e-06 |
|   10 |       6 | Accept |    0.1237 |    8.5033 |     0.0962 |  0.096199 |  0.0029745 |
|   11 |       6 | Accept |    0.1076 |    10.653 |     0.0962 |  0.096208 | 0.00080903 |
|   12 |       6 | Accept |    0.1034 |    16.761 |     0.0962 |    0.0962 | 3.2145e-07 |
|   13 |       6 | Best   |    0.0933 |    16.715 |     0.0933 |  0.093293 | 6.3327e-05 |
|   14 |       6 | Accept |     0.109 |    12.946 |     0.0933 |   0.09328 | 5.7887e-09 |
|   15 |       6 | Accept |    0.0994 |    18.805 |     0.0933 |  0.093312 | 1.8981e-06 |
|   16 |       6 | Accept |     0.106 |    15.088 |     0.0933 |  0.093306 | 7.4684e-08 |
|   17 |       6 | Accept |    0.0952 |    20.372 |     0.0933 |  0.093285 | 2.2831e-05 |
|   18 |       6 | Accept |    0.0933 |    14.528 |     0.0933 |  0.093459 | 0.00013097 |
|   19 |       6 | Accept |    0.1082 |    12.764 |     0.0933 |  0.093458 | 1.0001e-09 |
|   20 |       6 | Best   |    0.0915 |    16.157 |     0.0915 |  0.092391 | 8.3234e-05 |
|   21 |       6 | Accept |    0.8865 |    6.6373 |     0.0915 |  0.092387 |     1.6749 |
|   22 |       6 | Accept |    0.0929 |    17.306 |     0.0915 |  0.092457 | 0.00010668 |
|   23 |       6 | Accept |    0.0937 |    19.046 |     0.0915 |  0.092535 | 5.0962e-05 |
|   24 |       6 | Accept |    0.0916 |    17.932 |     0.0915 |  0.092306 |  9.023e-05 |
|   25 |       6 | Accept |    0.0935 |     17.53 |     0.0915 |  0.092431 | 0.00011726 |
|   26 |       6 | Accept |    0.1474 |    8.3795 |     0.0915 |  0.092397 |   0.006997 |
|   27 |       6 | Accept |    0.0939 |    19.188 |     0.0915 |  0.092427 | 5.2557e-05 |
|   28 |       6 | Accept |    0.1147 |    10.686 |     0.0915 |  0.092432 |  0.0015036 |
|   29 |       6 | Accept |    0.1049 |    16.609 |     0.0915 |  0.092434 | 1.4871e-07 |
|   30 |       6 | Accept |    0.1069 |    13.929 |     0.0915 |  0.092435 | 1.0899e-08 |

__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 83.1976 seconds.
Total objective function evaluation time: 416.8767

Best observed feasible point:
    Lambda
    __________
    8.3234e-05


Observed objective function value = 0.0915
Estimated objective function value = 0.09245
Function evaluation time = 16.1569

Best estimated feasible point (according to models):
    Lambda
    _________
    9.023e-05

Estimated objective function value = 0.092435
Estimated function evaluation time = 17.0972

Evaluate Classifier

Check the error of the classifier when applied to test data. First, load the test data.

imageFileName = 'train-images.idx3-ubyte';
labelFileName = 'train-labels.idx1-ubyte';
[Xtest,LabelTest] = processMNISTdata(imageFileName,labelFileName);

Read MNIST image data...
Number of images in the dataset:  60000 ...
Each image is of 28 by 28 pixels...
The image data is read to a matrix of dimensions:  60000 by  784...
End of reading image data.

Read MNIST label data...
Number of labels in the dataset:  60000 ...
The label data is read to a matrix of dimensions:  60000 by   1...
End of reading label data.

Calculate the classification loss when applying the classifier to the test data.

TestX = transform(Mdl,Xtest);
Loss = loss(Cmdl,TestX,LabelTest)

Loss = 0.1009

Did this transformation result in a better classifier than one trained on the original data? Create a classifier based on the original training data and evaluate its loss.

Omdl = fitcecoc(Xtrain,LabelTrain,'Learners',t, ...
    'OptimizeHyperparameters',{'Lambda'}, ...
    'HyperparameterOptimizationOptions',options);
Losso = loss(Omdl,Xtest,LabelTest)

Copying objective function to workers...
Done copying objective function to workers.

| Iter | Active  | Eval   | Objective | Objective | BestSoFar  | BestSoFar | Lambda     |
|      | workers | result |           | runtime   | (observed) | (estim.)  |            |
|======|=========|========|===========|===========|============|===========|============|
|    1 |       5 | Best   |    0.0779 |    46.965 |     0.0779 |    0.0779 | 5.7933e-08 |
|    2 |       5 | Accept |    0.0779 |    47.003 |     0.0779 |    0.0779 | 3.8643e-09 |
|    3 |       5 | Accept |    0.0779 |    47.068 |     0.0779 |    0.0779 | 1.3269e-06 |
|    4 |       6 | Accept |     0.078 |    60.714 |     0.0779 |  0.077925 | 3.0332e-05 |
|    5 |       6 | Accept |    0.0787 |    133.21 |     0.0779 |    0.0779 |   0.011605 |
|    6 |       6 | Best   |    0.0775 |    135.97 |     0.0775 |  0.077983 | 0.00020291 |
|    7 |       6 | Accept |    0.0779 |    44.642 |     0.0775 |  0.077971 |  5.735e-08 |
|    8 |       6 | Accept |    0.0785 |    123.19 |     0.0775 |    0.0775 |   0.024589 |
|    9 |       6 | Accept |    0.0779 |    43.574 |     0.0775 |    0.0775 | 1.0042e-09 |
|   10 |       6 | Accept |    0.0779 |    43.038 |     0.0775 |    0.0775 | 4.7227e-06 |
|   11 |       6 | Best   |    0.0774 |    137.51 |     0.0774 |  0.077451 | 0.00021639 |
|   12 |       6 | Accept |    0.0779 |     44.07 |     0.0774 |  0.077452 | 6.7132e-09 |
|   13 |       6 | Accept |    0.0779 |    44.822 |     0.0774 |  0.077453 |  2.873e-07 |
|   14 |       6 | Best   |    0.0744 |    233.12 |     0.0744 |  0.074402 |      6.805 |
|   15 |       6 | Accept |    0.0778 |    140.49 |     0.0744 |  0.074406 |    0.66889 |
|   16 |       6 | Accept |    0.0774 |    149.32 |     0.0744 |  0.074405 |  0.0002769 |
|   17 |       6 | Accept |    0.0774 |       155 |     0.0744 |  0.074404 | 0.00046083 |
|   18 |       6 | Accept |    0.0765 |    152.63 |     0.0744 |  0.074687 | 0.00027101 |
|   19 |       6 | Accept |    0.0768 |    156.32 |     0.0744 |  0.077558 | 0.00026573 |
|   20 |       6 | Best   |    0.0725 |    255.51 |     0.0725 |  0.073249 |     9.9961 |
|   21 |       6 | Best   |    0.0723 |     221.5 |     0.0723 |  0.073161 |      4.212 |
|   22 |       6 | Accept |    0.0732 |    259.51 |     0.0723 |  0.073166 |     9.9916 |
|   23 |       6 | Best   |     0.072 |    261.94 |      0.072 |  0.072848 |     9.9883 |


|   24 |       6 | Accept |    0.0778 |    122.56 |      0.072 |  0.072854 |    0.13413 |
|   25 |       6 | Accept |    0.0733 |    258.54 |      0.072 |  0.072946 |     9.9904 |
|   26 |       6 | Accept |    0.0746 |    244.53 |      0.072 |  0.073144 |     7.0911 |
|   27 |       6 | Accept |    0.0779 |    44.573 |      0.072 |  0.073134 | 2.1183e-08 |
|   28 |       6 | Accept |     0.078 |    45.478 |      0.072 |  0.073126 | 1.1663e-05 |
|   29 |       6 | Accept |    0.0779 |    43.954 |      0.072 |  0.073118 |  1.336e-07 |
|   30 |       6 | Accept |    0.0779 |    44.574 |      0.072 |  0.073112 | 1.7282e-09 |

__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 690.8688 seconds.
Total objective function evaluation time: 3741.3176

Best observed feasible point:
    Lambda
    ______
    9.9883

Observed objective function value = 0.072
Estimated objective function value = 0.073112
Function evaluation time = 261.9357

Best estimated feasible point (according to models):
    Lambda
    ______
    9.9961

Estimated objective function value = 0.073112
Estimated function evaluation time = 257.9556

Losso = 0.0865


The classifier based on sparse filtering has a somewhat higher loss than the classifier based on the original data. However, the classifier uses only 100 features rather than the 784 features in the original data, and is much faster to create. Try to make a better sparse filtering classifier by increasing q from 100 to 200, which is still far less than 784.

q = 200;
Mdl2 = sparsefilt(Xtrain,q,'IterationLimit',10);
NewX = transform(Mdl2,Xtrain);
TestX = transform(Mdl2,Xtest);
Cmdl = fitcecoc(NewX,LabelTrain,'Learners',t, ...
    'OptimizeHyperparameters',{'Lambda'}, ...
    'HyperparameterOptimizationOptions',options);
Loss2 = loss(Cmdl,TestX,LabelTest)


Warning: Solver LBFGS was not able to converge to a solution.
Copying objective function to workers...
Done copying objective function to workers.

| Iter | Active  | Eval   | Objective | Objective | BestSoFar  | BestSoFar | Lambda     |
|      | workers | result |           | runtime   | (observed) | (estim.)  |            |
|======|=========|========|===========|===========|============|===========|============|
|    1 |       5 | Best   |    0.8865 |    7.3578 |     0.8865 |    0.8865 |       1.93 |
|    2 |       5 | Accept |    0.8865 |    7.3408 |     0.8865 |    0.8865 |     2.5549 |
|    3 |       6 | Best   |    0.0693 |    9.0077 |     0.0693 |  0.069376 | 9.9515e-09 |
|    4 |       5 | Accept |    0.0705 |    9.1067 |     0.0693 |  0.069374 | 1.2123e-08 |
|    5 |       5 | Accept |    0.1489 |    9.5685 |     0.0693 |  0.069374 |   0.015542 |
|    6 |       6 | Accept |    0.8865 |    7.5032 |     0.0693 |   0.06943 |     4.7067 |
|    7 |       6 | Accept |     0.071 |    8.8044 |     0.0693 |  0.069591 | 5.0861e-09 |
|    8 |       6 | Accept |    0.0715 |    8.9517 |     0.0693 |  0.070048 |  1.001e-09 |
|    9 |       6 | Accept |    0.0833 |    14.393 |     0.0693 |  0.069861 |  0.0014191 |
|   10 |       6 | Best   |    0.0594 |    25.565 |     0.0594 |  0.059458 |  6.767e-05 |
|   11 |       6 | Accept |    0.0651 |    20.074 |     0.0594 |  0.059463 |  8.078e-07 |
|   12 |       6 | Accept |    0.0695 |    14.495 |     0.0594 |  0.059473 | 1.0381e-07 |
|   13 |       6 | Accept |    0.1042 |    12.085 |     0.0594 |  0.059386 |  0.0039745 |
|   14 |       6 | Accept |     0.065 |    20.235 |     0.0594 |  0.059416 | 0.00031759 |
|   15 |       6 | Accept |    0.0705 |    10.929 |     0.0594 |  0.059416 | 3.6503e-08 |
|   16 |       6 | Accept |    0.0637 |    30.593 |     0.0594 |  0.059449 | 8.8718e-06 |
|   17 |       6 | Accept |     0.064 |    25.084 |     0.0594 |  0.059464 | 2.6286e-06 |
|   18 |       6 | Accept |    0.0605 |    31.964 |     0.0594 |  0.059387 |  2.459e-05 |
|   19 |       6 | Accept |    0.0606 |    23.149 |     0.0594 |  0.059312 |  0.0001464 |
|   20 |       6 | Accept |    0.0602 |    32.178 |     0.0594 |  0.059874 | 4.1437e-05 |
|   21 |       6 | Accept |    0.0594 |    27.686 |     0.0594 |  0.059453 | 8.0717e-05 |
|   22 |       6 | Accept |    0.0612 |    33.427 |     0.0594 |  0.059476 | 1.6878e-05 |
|   23 |       6 | Accept |    0.0673 |    17.444 |     0.0594 |  0.059475 | 3.1788e-07 |
|   24 |       6 | Best   |    0.0593 |    26.262 |     0.0593 |   0.05944 | 7.8179e-05 |
|   25 |       6 | Accept |     0.248 |    7.6345 |     0.0593 |  0.059409 |   0.095654 |
|   26 |       6 | Accept |    0.0598 |    28.536 |     0.0593 |  0.059465 | 5.0819e-05 |
|   27 |       6 | Accept |    0.0701 |    9.0545 |     0.0593 |  0.059466 | 1.8937e-09 |
|   28 |       5 | Accept |    0.7081 |    7.1176 |     0.0593 |  0.059372 |    0.30394 |
|   29 |       5 | Accept |    0.0676 |    11.782 |     0.0593 |  0.059372 | 6.1136e-08 |
|   30 |       3 | Accept |      0.06 |    23.556 |     0.0593 |  0.059422 | 0.00010144 |
|   31 |       3 | Accept |    0.0725 |    16.069 |     0.0593 |  0.059422 | 0.00069403 |
|   32 |       3 | Accept |    0.1928 |    8.3732 |     0.0593 |  0.059422 |   0.040402 |

__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 32
Total elapsed time: 97.7946 seconds.
Total objective function evaluation time: 545.3255

Best observed feasible point:
    Lambda
    __________
    7.8179e-05

Observed objective function value = 0.0593


Feature Extraction Workflow

Estimated objective function value = 0.059422
Function evaluation time = 26.2624

Best estimated feasible point (according to models):
      Lambda
    __________
    7.8179e-05

Estimated objective function value = 0.059422
Estimated function evaluation time = 26.508

Loss2 = 0.0682


Multivariate Methods

This time the classification loss is lower than that of the original data classifier.

Try RICA
Try the other feature extraction function, rica. Extract 200 features, create a classifier, and examine its loss on the test data. Use more iterations for the rica function, because rica can perform better with more iterations than sparsefilt uses.

Often, prior to feature extraction, you "prewhiten" the input data as a preprocessing step. Prewhitening includes two transforms, decorrelation and standardization, which give the predictors zero mean and identity covariance. rica supports only the standardization transform. You use the Standardize name-value pair argument to give the predictors zero mean and unit variance. Alternatively, you can transform the images for contrast normalization individually by applying the zscore transformation before calling sparsefilt or rica.

Mdl3 = rica(Xtrain,q,'IterationLimit',400,'Standardize',true);
NewX = transform(Mdl3,Xtrain);
TestX = transform(Mdl3,Xtest);
Cmdl = fitcecoc(NewX,LabelTrain,'Learners',t, ...
    'OptimizeHyperparameters',{'Lambda'}, ...
    'HyperparameterOptimizationOptions',options);
Loss3 = loss(Cmdl,TestX,LabelTest)

Warning: Solver LBFGS was not able to converge to a solution.
Copying objective function to workers...
Done copying objective function to workers.
|=========================================================================================================|
| Iter | Active workers | Eval result | Objective | Objective runtime | BestSoFar (observed) | BestSoFar (estim.) | Lambda |
|=========================================================================================================|
|  1 | 6 | Best   | 0.1179 | 12.012 | 0.1179 | 0.1179   | 8.4727     |
|  2 | 6 | Best   | 0.082  | 13.384 | 0.082  | 0.083897 | 4.3291e-09 |
|  3 | 6 | Best   | 0.0809 | 18.917 | 0.0809 | 0.080902 | 1.738e-05  |
|  4 | 6 | Accept | 0.0821 | 19.172 | 0.0809 | 0.08091  | 3.8101e-06 |
|  5 | 6 | Accept | 0.0921 | 14.445 | 0.0809 | 0.086349 | 2.3753     |
|  6 | 6 | Accept | 0.0809 | 13.393 | 0.0809 | 0.083836 | 1.3757e-08 |
|  7 | 6 | Best   | 0.076  | 28.075 | 0.076  | 0.081808 | 0.00027773 |
|  8 | 6 | Best   | 0.0758 | 29.686 | 0.0758 | 0.078829 | 0.00068195 |
|  9 | 6 | Accept | 0.0829 | 13.373 | 0.0758 | 0.078733 | 1.7543e-07 |
| 10 | 6 | Accept | 0.0826 | 14.031 | 0.0758 | 0.078512 | 1.0045e-09 |
| 11 | 6 | Accept | 0.0817 | 13.662 | 0.0758 | 0.078077 | 2.4568e-08 |
| 12 | 6 | Accept | 0.0799 | 19.311 | 0.0758 | 0.077658 | 1.4061e-05 |
| 13 | 6 | Best   | 0.065  | 25.148 | 0.065  | 0.064974 | 0.060326   |
| 14 | 6 | Accept | 0.0787 | 23.434 | 0.065  | 0.064947 | 0.00012407 |
| 15 | 6 | Accept | 0.072  | 19.167 | 0.065  | 0.064997 | 0.43899    |
| 16 | 6 | Accept | 0.073  | 28.39  | 0.065  | 0.065053 | 0.0023721  |
| 17 | 6 | Accept | 0.0787 | 29.887 | 0.065  | 0.064928 | 0.00042914 |
| 18 | 6 | Accept | 0.0662 | 26.374 | 0.065  | 0.064295 | 0.0077638  |
| 19 | 6 | Accept | 0.0652 | 24.937 | 0.065  | 0.064502 | 0.087389   |
| 20 | 6 | Accept | 0.0655 | 25.416 | 0.065  | 0.064762 | 0.072931   |
| 21 | 6 | Best   | 0.0645 | 25.529 | 0.0645 | 0.064691 | 0.059245   |
| 22 | 6 | Accept | 0.065  | 23.832 | 0.0645 | 0.06474  | 0.025521   |
| 23 | 6 | Accept | 0.0819 | 20.343 | 0.0645 | 0.064732 | 7.2593e-07 |
| 24 | 6 | Accept | 0.0664 | 23.732 | 0.0645 | 0.064718 | 0.1534     |
| 25 | 6 | Accept | 0.0651 | 24.796 | 0.0645 | 0.064693 | 0.038371   |
| 26 | 6 | Accept | 0.0651 | 25.449 | 0.0645 | 0.064613 | 0.014318   |
| 27 | 6 | Accept | 0.0652 | 25.092 | 0.0645 | 0.064713 | 0.037107   |
| 28 | 6 | Accept | 0.0645 | 24.404 | 0.0645 | 0.0647   | 0.042959   |
| 29 | 6 | Accept | 0.0649 | 24.704 | 0.0645 | 0.064729 | 0.042776   |
| 30 | 6 | Accept | 0.0652 | 24.341 | 0.0645 | 0.064786 | 0.035788   |
|=========================================================================================================|

Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 124.9755 seconds.


Total objective function evaluation time: 654.4364

Best observed feasible point:
     Lambda
    ________
    0.059245

Observed objective function value = 0.0645
Estimated objective function value = 0.064932
Function evaluation time = 25.5294

Best estimated feasible point (according to models):
     Lambda
    ________
    0.042776

Estimated objective function value = 0.064786
Estimated function evaluation time = 24.7849

Loss3 = 0.0749
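As an aside on the preprocessing mentioned before the rica call: the per-image zscore alternative could look like the following sketch. This code is not part of the original example, and it assumes that each row of Xtrain and Xtest stores one flattened image.

```matlab
% Hypothetical per-image contrast normalization: standardize each row
% (one flattened image per observation) to zero mean and unit variance.
XtrainNorm = zscore(Xtrain,0,2);   % operate along dimension 2 (pixels)
XtestNorm  = zscore(Xtest,0,2);

% You would then extract features from the normalized images instead,
% for example: Mdl3 = rica(XtrainNorm,q,'IterationLimit',400);
```

Note that a constant (zero-variance) image row would produce NaN values under this transformation, so this approach assumes every image has some contrast.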


The rica-based classifier has somewhat higher test loss compared to the sparse filtering classifier.

Try More Features
The feature extraction functions have few tuning parameters. One parameter that can affect results is the number of requested features. See how well classifiers work when based on 1000 features, rather than the 200 features previously tried, or the 784 features in the original data. Using more features than appear in the original data is called "overcomplete" learning. Conversely, using fewer features is called "undercomplete" learning. Overcomplete learning can lead to increased classification accuracy, while undercomplete learning can save memory and time.

q = 1000;
Mdl4 = sparsefilt(Xtrain,q,'IterationLimit',10);
NewX = transform(Mdl4,Xtrain);
TestX = transform(Mdl4,Xtest);
Cmdl = fitcecoc(NewX,LabelTrain,'Learners',t, ...
    'OptimizeHyperparameters',{'Lambda'}, ...
    'HyperparameterOptimizationOptions',options);
Loss4 = loss(Cmdl,TestX,LabelTest)

Warning: Solver LBFGS was not able to converge to a solution.
Copying objective function to workers...
Done copying objective function to workers.
|=========================================================================================================|
| Iter | Active workers | Eval result | Objective | Objective runtime | BestSoFar (observed) | BestSoFar (estim.) | Lambda |
|=========================================================================================================|
|  1 | 6 | Best   | 0.5293 | 39.885 | 0.5293 | 0.5293   | 0.20333    |
|  2 | 6 | Accept | 0.8022 | 43.475 | 0.5293 | 0.66575  | 0.77337    |
|  3 | 6 | Best   | 0.0406 | 52.594 | 0.0406 | 0.11113  | 9.1082e-09 |
|  4 | 6 | Best   | 0.0403 | 54.73  | 0.0403 | 0.060037 | 2.3947e-09 |
|  5 | 6 | Accept | 0.0695 | 124.96 | 0.0403 | 0.040319 | 0.001361   |
|  6 | 6 | Accept | 0.0406 | 53.691 | 0.0403 | 0.040207 | 1.0005e-09 |
|  7 | 6 | Best   | 0.0388 | 178.69 | 0.0388 | 0.038811 | 1.4358e-06 |
|  8 | 6 | Accept | 0.0615 | 138.53 | 0.0388 | 0.038817 | 0.00088731 |
|  9 | 6 | Best   | 0.0385 | 61.81  | 0.0385 | 0.038557 | 7.4709e-08 |
| 10 | 6 | Accept | 0.0399 | 54.198 | 0.0385 | 0.038555 | 2.1909e-08 |
| 11 | 6 | Accept | 0.0402 | 234.55 | 0.0385 | 0.038639 | 0.000101   |
| 12 | 6 | Accept | 0.0431 | 198.09 | 0.0385 | 0.038636 | 0.00018896 |
| 13 | 6 | Accept | 0.0393 | 75.811 | 0.0385 | 0.039016 | 1.1597e-07 |
| 14 | 6 | Accept | 0.0387 | 61.281 | 0.0385 | 0.038908 | 7.0518e-08 |
| 15 | 6 | Accept | 0.0393 | 125.73 | 0.0385 | 0.038931 | 2.8429e-07 |
| 16 | 6 | Accept | 0.0397 | 89.804 | 0.0385 | 0.039106 | 1.4603e-07 |
| 17 | 6 | Accept | 0.0391 | 126.88 | 0.0385 | 0.039081 | 3.0065e-07 |
| 18 | 6 | Accept | 0.0398 | 56.157 | 0.0385 | 0.039123 | 4.1563e-08 |
| 19 | 6 | Accept | 0.0406 | 55.25  | 0.0385 | 0.039122 | 1.0014e-09 |
| 20 | 6 | Accept | 0.0385 | 272.92 | 0.0385 | 0.039127 | 9.568e-06  |
| 21 | 6 | Accept | 0.0412 | 55.191 | 0.0385 | 0.039124 | 3.3737e-09 |
| 22 | 6 | Accept | 0.0394 | 229.72 | 0.0385 | 0.039117 | 3.2757e-06 |
| 23 | 6 | Best   | 0.0379 | 295.55 | 0.0379 | 0.039116 | 2.8439e-05 |
| 24 | 6 | Accept | 0.0394 | 168.74 | 0.0379 | 0.039111 | 9.778e-07  |
| 25 | 6 | Accept | 0.039  | 281.91 | 0.0379 | 0.039112 | 8.0694e-06 |
| 26 | 6 | Accept | 0.8865 | 54.865 | 0.0379 | 0.038932 | 9.9885     |
| 27 | 6 | Accept | 0.0381 | 300.7  | 0.0379 | 0.037996 | 2.6027e-05 |
| 28 | 6 | Accept | 0.0406 | 54.611 | 0.0379 | 0.037996 | 1.6057e-09 |
| 29 | 6 | Accept | 0.1272 | 76.648 | 0.0379 | 0.037997 | 0.012507   |
| 30 | 6 | Accept | 0.0403 | 57.931 | 0.0379 | 0.037997 | 4.9907e-08 |
|=========================================================================================================|

Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 724.6036 seconds.
Total objective function evaluation time: 3674.8899

Best observed feasible point:


      Lambda
    __________
    2.8439e-05

Observed objective function value = 0.0379
Estimated objective function value = 0.03801
Function evaluation time = 295.5515

Best estimated feasible point (according to models):
      Lambda
    __________
    2.6027e-05

Estimated objective function value = 0.037997
Estimated function evaluation time = 297.6756

Loss4 = 0.0440


The classifier based on overcomplete sparse filtering with 1000 extracted features has the lowest test loss of any classifier yet tested.

Mdl5 = rica(Xtrain,q,'IterationLimit',400,'Standardize',true);
NewX = transform(Mdl5,Xtrain);
TestX = transform(Mdl5,Xtest);
Cmdl = fitcecoc(NewX,LabelTrain,'Learners',t, ...
    'OptimizeHyperparameters',{'Lambda'}, ...
    'HyperparameterOptimizationOptions',options);
Loss5 = loss(Cmdl,TestX,LabelTest)

Warning: Solver LBFGS was not able to converge to a solution.
Copying objective function to workers...
Done copying objective function to workers.


|=========================================================================================================|
| Iter | Active workers | Eval result | Objective | Objective runtime | BestSoFar (observed) | BestSoFar (estim.) | Lambda |
|=========================================================================================================|
|  1 | 6 | Best   | 0.0764 | 46.206 | 0.0764 | 0.0764   | 8.4258e-09 |
|  2 | 6 | Accept | 0.077  | 141.95 | 0.0764 | 0.0767   | 6.9536e-06 |
|  3 | 6 | Accept | 0.0771 | 146.87 | 0.0764 | 0.076414 | 7.3378e-06 |
|  4 | 6 | Best   | 0.0709 | 182.51 | 0.0709 | 0.0709   | 0.48851    |
|  5 | 6 | Accept | 0.0764 | 46.923 | 0.0709 | 0.070903 | 5.0695e-09 |
|  6 | 6 | Best   | 0.068  | 294.89 | 0.068  | 0.068004 | 0.0029652  |
|  7 | 6 | Accept | 0.125  | 99.095 | 0.068  | 0.068001 | 9.9814     |
|  8 | 6 | Accept | 0.0693 | 321.66 | 0.068  | 0.067999 | 0.0015167  |
|  9 | 6 | Accept | 0.0882 | 138.03 | 0.068  | 0.068    | 1.8203     |
| 10 | 6 | Accept | 0.0753 | 285.07 | 0.068  | 0.067991 | 0.00042423 |
| 11 | 6 | Accept | 0.0764 | 47.704 | 0.068  | 0.067984 | 1.6326e-07 |
| 12 | 6 | Accept | 0.0763 | 46.514 | 0.068  | 0.06798  | 1.0048e-09 |
| 13 | 6 | Best   | 0.0643 | 252.2  | 0.0643 | 0.0643   | 0.095965   |
| 14 | 6 | Accept | 0.0766 | 168.37 | 0.0643 | 0.0643   | 9.1336e-07 |
| 15 | 6 | Accept | 0.0753 | 153.29 | 0.0643 | 0.064301 | 4.8641e-05 |
| 16 | 6 | Accept | 0.0662 | 256.65 | 0.0643 | 0.064298 | 0.0093576  |
| 17 | 6 | Best   | 0.0632 | 224.2  | 0.0632 | 0.063226 | 0.031314   |
| 18 | 6 | Accept | 0.0673 | 219.59 | 0.0632 | 0.063201 | 0.20528    |
| 19 | 6 | Accept | 0.0637 | 244.17 | 0.0632 | 0.063208 | 0.075001   |
| 20 | 6 | Accept | 0.064  | 234.85 | 0.0632 | 0.06321  | 0.081232   |
| 21 | 6 | Accept | 0.0646 | 242.2  | 0.0632 | 0.063315 | 0.078081   |
| 22 | 6 | Accept | 0.0633 | 217.97 | 0.0632 | 0.063233 | 0.039495   |
| 23 | 6 | Accept | 0.0643 | 224.22 | 0.0632 | 0.063496 | 0.052107   |
| 24 | 6 | Accept | 0.0761 | 45.102 | 0.0632 | 0.063509 | 4.3946e-08 |
| 25 | 6 | Accept | 0.0645 | 221.24 | 0.0632 | 0.063778 | 0.044455   |
| 26 | 6 | Accept | 0.0763 | 44.572 | 0.0632 | 0.063778 | 1.9139e-09 |
| 27 | 6 | Accept | 0.0639 | 216.9  | 0.0632 | 0.063791 | 0.041759   |
| 28 | 6 | Accept | 0.0766 | 45.609 | 0.0632 | 0.06379  | 2.0642e-08 |
| 29 | 6 | Accept | 0.0765 | 121.35 | 0.0632 | 0.063789 | 3.5882e-07 |
| 30 | 6 | Accept | 0.0636 | 215.47 | 0.0632 | 0.063755 | 0.038062   |
|=========================================================================================================|

Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 952.7987 seconds.
Total objective function evaluation time: 5145.3787

Best observed feasible point:
     Lambda
    ________
    0.031314

Observed objective function value = 0.0632
Estimated objective function value = 0.063828
Function evaluation time = 224.2018

Best estimated feasible point (according to models):


     Lambda
    ________
    0.044455

Estimated objective function value = 0.063755
Estimated function evaluation time = 219.4845

Loss5 = 0.0748


The classifier based on RICA with 1000 extracted features has a similar test loss to the RICA classifier based on 200 extracted features.

Optimize Hyperparameters by Using bayesopt
Feature extraction functions have these tuning parameters:
• Iteration limit
• Function, either rica or sparsefilt
• Parameter Lambda
• Number of learned features q


The fitcecoc regularization parameter also affects the accuracy of the learned classifier. Include that parameter in the list of hyperparameters as well. To search among the available parameters effectively, try bayesopt. Use the following objective function, which includes parameters passed from the workspace.

function objective = filterica(x,Xtrain,Xtest,LabelTrain,LabelTest,winit)
initW = winit(1:size(Xtrain,2),1:x.q);
if char(x.solver) == 'r'
    Mdl = rica(Xtrain,x.q,'Lambda',x.lambda,'IterationLimit',x.iterlim, ...
        'InitialTransformWeights',initW,'Standardize',true);
else
    Mdl = sparsefilt(Xtrain,x.q,'Lambda',x.lambda,'IterationLimit',x.iterlim, ...
        'InitialTransformWeights',initW);
end
NewX = transform(Mdl,Xtrain);
TestX = transform(Mdl,Xtest);
t = templateLinear('Lambda',x.lambdareg,'Solver','lbfgs');
Cmdl = fitcecoc(NewX,LabelTrain,'Learners',t);
objective = loss(Cmdl,TestX,LabelTest);

To remove sources of variation, fix an initial transform weight matrix.

W = randn(1e4,1e3);

Create hyperparameters for the objective function.

iterlim = optimizableVariable('iterlim',[5,500],'Type','integer');
lambda = optimizableVariable('lambda',[0,10]);
solver = optimizableVariable('solver',{'r','s'},'Type','categorical');
qvar = optimizableVariable('q',[10,1000],'Type','integer');
lambdareg = optimizableVariable('lambdareg',[1e-6,1],'Transform','log');
vars = [iterlim,lambda,solver,qvar,lambdareg];

Run the optimization without the warnings that occur when the internal optimizations do not run to completion. Run for 60 iterations instead of the default 30 to give the optimization a better chance of locating a good value.

warning('off','stats:classreg:learning:fsutils:Solver:LBFGSUnableToConverge');
results = bayesopt(@(x) filterica(x,Xtrain,Xtest,LabelTrain,LabelTest,W),vars, ...
    'UseParallel',true,'MaxObjectiveEvaluations',60);
warning('on','stats:classreg:learning:fsutils:Solver:LBFGSUnableToConverge');

Copying objective function to workers...
Done copying objective function to workers.
|=========================================================================================================|
| Iter | Active workers | Eval result | Objective | Objective runtime | BestSoFar (observed) | BestSoFar (estim.) | iterlim |
|=========================================================================================================|
|  1 | 6 | Best   | 0.16408   | 33.743 | 0.16408   | 0.16408  | 140 |
|  2 | 6 | Best   | 0.079213  | 51.975 | 0.079213  | 0.09064  |  10 |
|  3 | 6 | Best   | 0.074897  | 82.031 | 0.074897  | 0.074983 |  32 |
|  4 | 6 | Accept | 0.07546   | 93.221 | 0.074897  | 0.075073 | 178 |
|  5 | 6 | Accept | 0.13924   | 30.444 | 0.074897  | 0.074933 | 282 |


|  6 | 6 | Accept | 0.083964  | 133    | 0.074897  | 0.074933 |  58 |
|  7 | 6 | Accept | 0.08128   | 33.609 | 0.074897  | 0.074957 |   8 |
|  8 | 6 | Accept | 0.090751  | 203.96 | 0.074897  | 0.074913 | 131 |
|  9 | 6 | Accept | 0.090001  | 172.38 | 0.074897  | 0.074904 | 146 |
| 10 | 6 | Accept | 0.080191  | 316.8  | 0.074897  | 0.074897 | 164 |
| 11 | 6 | Best   | 0.060472  | 40.777 | 0.060472  | 0.060731 |   5 |
| 12 | 6 | Accept | 0.079027  | 45.841 | 0.060472  | 0.060632 |   8 |
| 13 | 6 | Accept | 0.074823  | 237.43 | 0.060472  | 0.06067  | 109 |
| 14 | 6 | Accept | 0.84009   | 85.121 | 0.060472  | 0.060468 | 306 |
| 15 | 6 | Accept | 0.15637   | 200.13 | 0.060472  | 0.060451 |  90 |
| 16 | 6 | Accept | 0.69006   | 14.273 | 0.060472  | 0.06047  |   6 |
| 17 | 6 | Accept | 0.093035  | 205.83 | 0.060472  | 0.060469 | 263 |
| 18 | 6 | Accept | 0.18753   | 6.0238 | 0.060472  | 0.060527 |  36 |
| 19 | 6 | Accept | 0.119     | 749.98 | 0.060472  | 0.060751 | 482 |
| 20 | 6 | Accept | 0.076414  | 751.21 | 0.060472  | 0.060754 | 387 |
| 21 | 6 | Accept | 0.099332  | 7.2298 | 0.060472  | 0.060828 |  20 |
| 22 | 6 | Accept | 0.090139  | 7.9815 | 0.060472  | 0.060858 |  11 |
| 23 | 6 | Accept | 0.076696  | 323.64 | 0.060472  | 0.060872 | 120 |
| 24 | 6 | Accept | 0.098003  | 50.544 | 0.060472  | 0.060876 | 492 |
| 25 | 6 | Accept | 0.10383   | 56.568 | 0.060472  | 0.06101  |  11 |
| 26 | 6 | Accept | 0.14405   | 30.426 | 0.060472  | 0.060797 | 477 |
| 27 | 6 | Accept | 0.09046   | 53.398 | 0.060472  | 0.060815 |  13 |
| 28 | 6 | Best   | 0.051641  | 99.452 | 0.051641  | 0.051368 |  23 |
| 29 | 6 | Accept | 0.10016   | 6.4162 | 0.051641  | 0.051365 |   6 |
| 30 | 6 | Accept | 0.10943   | 40.676 | 0.051641  | 0.051391 | 488 |
| 31 | 6 | Accept | 0.086761  | 7.8419 | 0.051641  | 0.051393 |  24 |
| 32 | 6 | Best   | 0.0504    | 96.816 | 0.0504    | 0.050526 |  14 |
| 33 | 6 | Accept | 0.088789  | 81.158 | 0.0504    | 0.050525 |  14 |
| 34 | 6 | Accept | 0.083083  | 887.17 | 0.0504    | 0.05052  | 351 |
| 35 | 6 | Best   | 0.050023  | 99.493 | 0.050023  | 0.050372 |  19 |
| 36 | 6 | Accept | 0.053338  | 113.36 | 0.050023  | 0.050499 |   7 |
| 37 | 6 | Accept | 0.089024  | 70.047 | 0.050023  | 0.0505   |  15 |
| 38 | 6 | Accept | 0.052029  | 95.822 | 0.050023  | 0.050551 |   7 |
| 39 | 6 | Accept | 0.085992  | 73.422 | 0.050023  | 0.050528 |   5 |
| 40 | 6 | Accept | 0.091159  | 5.8348 | 0.050023  | 0.05052  |  15 |
| 41 | 6 | Best   | 0.046444  | 152.93 | 0.046444  | 0.047062 |  30 |
| 42 | 6 | Accept | 0.052712  | 58.107 | 0.046444  | 0.04698  |  12 |
| 43 | 6 | Accept | 0.058005  | 91.928 | 0.046444  | 0.047263 |  10 |
| 44 | 6 | Accept | 0.055413  | 103.25 | 0.046444  | 0.047306 |   7 |
| 45 | 6 | Accept | 0.052517  | 96.201 | 0.046444  | 0.049604 |  10 |
| 46 | 6 | Accept | 0.089527  | 76.617 | 0.046444  | 0.046888 |  20 |
| 47 | 6 | Accept | 0.050062  | 99.709 | 0.046444  | 0.046735 |  12 |
| 48 | 6 | Accept | 0.21166   | 90.117 | 0.046444  | 0.049716 | 495 |
| 49 | 6 | Accept | 0.054535  | 79.1   | 0.046444  | 0.046679 |   6 |
| 50 | 6 | Accept | 0.12385   | 964.74 | 0.046444  | 0.049963 | 474 |
| 51 | 6 | Accept | 0.052016  | 76.098 | 0.046444  | 0.049914 |  10 |
| 52 | 6 | Accept | 0.048984  | 95.054 | 0.046444  | 0.049891 |  12 |
| 53 | 6 | Accept | 0.1948    | 889.11 | 0.046444  | 0.047903 | 466 |
| 54 | 6 | Accept | 0.10652   | 5.076  | 0.046444  | 0.047961 |  10 |
| 55 | 6 | Accept | 0.074194  | 319.41 | 0.046444  | 0.04981  | 130 |

| 56 | 6 | Accept | 0.1014    | 45.184 | 0.046444  | 0.049828 | 480 |
| 57 | 6 | Accept | 0.33214   | 3.1996 | 0.046444  | 0.049785 |  12 |
| 58 | 6 | Accept | 0.054348  | 96.616 | 0.046444  | 0.050832 |  12 |
| 59 | 6 | Accept | 0.71471   | 3.0555 | 0.046444  | 0.050852 |  10 |
| 60 | 6 | Accept | 0.074353  | 67.118 | 0.046444  | 0.05084  |   8 |
|=========================================================================================================|

Optimization completed.
MaxObjectiveEvaluations of 60 reached.
Total function evaluations: 60
Total elapsed time: 1921.1117 seconds.
Total objective function evaluation time: 9107.7006

Best observed feasible point:
    iterlim    lambda    solver     q     lambdareg
    _______    ______    ______    ___    _________
      30       4.0843      s       997    7.279e-06

Observed objective function value = 0.046444
Estimated objective function value = 0.053743
Function evaluation time = 152.932

Best estimated feasible point (according to models):
    iterlim    lambda    solver     q     lambdareg
    _______    ______    ______    ___    _________
      10       1.0798      s       922    1.133e-06

Estimated objective function value = 0.05084
Estimated function evaluation time = 90.9315

The resulting classifier does not have better (lower) loss than the classifier using sparsefilt for 1000 features, trained for 10 iterations.

View the filter coefficients for the best hyperparameters that bayesopt found. The resulting images show the shapes of the extracted features. These shapes are recognizable as portions of handwritten digits.

Xtbl = results.XAtMinObjective;
Q = Xtbl.q;
initW = W(1:size(Xtrain,2),1:Q);
if char(Xtbl.solver) == 'r'
    Mdl = rica(Xtrain,Q,'Lambda',Xtbl.lambda,'IterationLimit',Xtbl.iterlim, ...
        'InitialTransformWeights',initW,'Standardize',true);
else
    Mdl = sparsefilt(Xtrain,Q,'Lambda',Xtbl.lambda,'IterationLimit',Xtbl.iterlim, ...
        'InitialTransformWeights',initW);
end
Wts = Mdl.TransformWeights;
Wts = reshape(Wts,[28,28,Q]);
[dx,dy,~,~] = size(Wts);
for f = 1:Q
    Wvec = Wts(:,:,f);
    Wvec = Wvec(:);
    Wvec = (Wvec - min(Wvec))/(max(Wvec) - min(Wvec));
    Wts(:,:,f) = reshape(Wvec,dx,dy);
end
m = ceil(sqrt(Q));


n = m;
img = zeros(m*dx,n*dy);
f = 1;
for i = 1:m
    for j = 1:n
        if (f <= Q)
            img((i-1)*dx+1:i*dx,(j-1)*dy+1:j*dy) = Wts(:,:,f);
            f = f + 1;
        end
    end
end
imshow(img);

function Z = prewhiten(X)
% X = N-by-P matrix for N observations and P predictors
% Z = N-by-P prewhitened matrix

% 1. Size of X.
[N,P] = size(X);
assert(N >= P);

% 2. SVD of covariance of X.
[U,Sig] = svd(cov(X));
Sig = diag(Sig);
Sig = Sig(:)';

% 3. Figure out which values of Sig are non-zero.
tol = eps(class(X));
idx = (Sig > max(Sig)*tol);
assert(~all(idx == 0));

% 4. Get the non-zero elements of Sig and corresponding columns of U.
Sig = Sig(idx);
U = U(:,idx);

% 5. Compute prewhitened data.
mu = mean(X,1);
Z = bsxfun(@minus,X,mu);
Z = bsxfun(@times,Z*U,1./sqrt(Sig));
end
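The per-filter rescaling loop shown earlier (the for f = 1:Q loop) can also be written without an explicit loop. This sketch is not part of the original example; it assumes R2018b or later for the vector-dimension form of min and max, and R2016b or later for implicit expansion.

```matlab
% Vectorized equivalent of the rescaling loop: map each 28-by-28 filter
% to the range [0,1] independently of the other filters.
lo = min(Wts,[],[1 2]);        % per-filter minimum, size 1-by-1-by-Q
hi = max(Wts,[],[1 2]);        % per-filter maximum
Wts = (Wts - lo)./(hi - lo);   % implicit expansion over the first two dims
```

As with the loop version, a filter whose coefficients are all equal would produce a divide-by-zero here, so this assumes every filter has some variation.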

See Also
ReconstructionICA | SparseFiltering | rica | sparsefilt

Related Examples
• “Feature Extraction Workflow” on page 15-135

More About
• “Feature Extraction” on page 15-130

16 Cluster Analysis
• “Choose Cluster Analysis Method” on page 16-2
• “Hierarchical Clustering” on page 16-6
• “DBSCAN” on page 16-19
• “Partition Data Using Spectral Clustering” on page 16-26
• “k-Means Clustering” on page 16-33
• “Cluster Using Gaussian Mixture Model” on page 16-39
• “Cluster Gaussian Mixture Data Using Hard Clustering” on page 16-46
• “Cluster Gaussian Mixture Data Using Soft Clustering” on page 16-52
• “Tune Gaussian Mixture Models” on page 16-57
• “Cluster Evaluation” on page 16-63


Choose Cluster Analysis Method
This topic provides a brief overview of the available clustering methods in Statistics and Machine Learning Toolbox.

Clustering Methods
Cluster analysis, also called segmentation analysis or taxonomy analysis, is a common unsupervised learning method. Unsupervised learning is used to draw inferences from data sets consisting of input data without labeled responses. For example, you can use cluster analysis for exploratory data analysis to find hidden patterns or groupings in unlabeled data.

Cluster analysis creates groups, or clusters, of data. Objects that belong to the same cluster are similar to one another and distinct from objects that belong to different clusters. To quantify "similar" and "distinct," you can use a dissimilarity measure (or distance metric on page 18-11) that is specific to the domain of your application and your data set. Also, depending on your application, you might consider scaling (or standardizing) the variables in your data to give them equal importance during clustering.

Statistics and Machine Learning Toolbox provides functionality for these clustering methods:
• “Hierarchical Clustering” on page 16-2
• “k-Means and k-Medoids Clustering” on page 16-2
• “Density-Based Spatial Clustering of Applications with Noise (DBSCAN)” on page 16-3
• “Gaussian Mixture Model” on page 16-3
• “k-Nearest Neighbor Search and Radius Search” on page 16-3
• “Spectral Clustering” on page 16-3

Hierarchical Clustering
Hierarchical clustering groups data over a variety of scales by creating a cluster tree, or dendrogram. The tree is not a single set of clusters, but rather a multilevel hierarchy, where clusters at one level combine to form clusters at the next level. This multilevel hierarchy allows you to choose the level, or scale, of clustering that is most appropriate for your application. Hierarchical clustering assigns every point in your data to a cluster.

Use clusterdata to perform hierarchical clustering on input data. clusterdata incorporates the pdist, linkage, and cluster functions, which you can use separately for more detailed analysis. The dendrogram function plots the cluster tree. For more information, see “Introduction to Hierarchical Clustering” on page 16-6.

k-Means and k-Medoids Clustering
k-means clustering and k-medoids clustering partition data into k mutually exclusive clusters. These clustering methods require that you specify the number of clusters k. Both k-means and k-medoids clustering assign every point in your data to a cluster; however, unlike hierarchical clustering, these methods operate on actual observations (rather than dissimilarity measures) and create a single level of clusters. Therefore, k-means or k-medoids clustering is often more suitable than hierarchical clustering for large amounts of data.
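The two workflows above can be sketched as follows. This is an illustrative example, not from the original text; the data and variable names are hypothetical.

```matlab
rng('default')                      % for reproducibility
X = [randn(20,2); randn(20,2)+4];   % two well-separated groups

% Hierarchical clustering: pairwise distances -> cluster tree -> clusters
Y = pdist(X);                       % pairwise distances between observations
Z = linkage(Y,'ward');              % build the multilevel cluster tree
dendrogram(Z)                       % plot the tree
Thier = cluster(Z,'Maxclust',2);    % cut the tree into two clusters

% k-means clustering operates on the observations themselves,
% with the number of clusters k specified up front
Tkm = kmeans(X,2);
```

Because hierarchical clustering builds the whole tree first, you can re-cut it at a different level (for example, 'Maxclust',3) without recomputing the distances, whereas kmeans must be rerun for each choice of k.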


Use kmeans and kmedoids to implement k-means clustering and k-medoids clustering, respectively. For more information, see “Introduction to k-Means Clustering” on page 16-33 and “k-Medoids Clustering” on page 32-2911.

Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
DBSCAN is a density-based algorithm that identifies arbitrarily shaped clusters and outliers (noise) in data. During clustering, DBSCAN identifies points that do not belong to any cluster, which makes this method useful for density-based outlier detection. Unlike k-means and k-medoids clustering, DBSCAN does not require prior knowledge of the number of clusters. Use dbscan to perform clustering on an input data matrix or on pairwise distances between observations. For more information, see “Introduction to DBSCAN” on page 16-19.

Gaussian Mixture Model
A Gaussian mixture model (GMM) forms clusters as a mixture of multivariate normal density components. For a given observation, the GMM assigns posterior probabilities to each component density (or cluster). The posterior probabilities indicate that the observation has some probability of belonging to each cluster. A GMM can perform hard clustering by selecting the component that maximizes the posterior probability as the assigned cluster for the observation. You can also use a GMM to perform soft, or fuzzy, clustering by assigning the observation to multiple clusters based on the scores or posterior probabilities of the observation for the clusters. A GMM can be a more appropriate method than k-means clustering when clusters have different sizes and different correlation structures within them.

Use fitgmdist to fit a gmdistribution object to your data. You can also use gmdistribution to create a GMM object by specifying the distribution parameters. When you have a fitted GMM, you can cluster query data by using the cluster function. For more information, see “Cluster Using Gaussian Mixture Model” on page 16-39.
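As a minimal sketch of the two methods above (illustrative data only; the variable names and parameter values are hypothetical choices, not recommendations):

```matlab
rng('default')                      % for reproducibility
X = [randn(50,2); randn(50,2)+5];   % two groups of points

% DBSCAN: no cluster count required; observations labeled -1 are noise
idx = dbscan(X,1,5);                % epsilon = 1, minpts = 5

% Gaussian mixture model: fit two components, then cluster
GMModel = fitgmdist(X,2);           % fitted gmdistribution object
idxHard = cluster(GMModel,X);       % hard assignments
P = posterior(GMModel,X);           % posterior probabilities (soft clustering)
```

The epsilon and minpts values here are guesses for this toy data; in practice you choose them from the density of your own data set.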
k-Nearest Neighbor Search and Radius Search
k-nearest neighbor search finds the k closest points in your data to a query point or set of query points. In contrast, radius search finds all points in your data that are within a specified distance from a query point or set of query points. The results of these methods depend on the distance metric on page 18-11 that you specify.

Use the knnsearch function to find k-nearest neighbors or the rangesearch function to find all neighbors within a specified distance of your input data. You can also create a searcher object using a training data set, and pass the object and query data sets to the object functions (knnsearch and rangesearch). For more information, see “Classification Using Nearest Neighbors” on page 18-11.

Spectral Clustering
Spectral clustering is a graph-based algorithm for finding k arbitrarily shaped clusters in data. The technique involves representing the data in a low dimension. In the low dimension, clusters in the data are more widely separated, enabling you to use algorithms such as k-means or k-medoids clustering. This low dimension is based on eigenvectors of a Laplacian matrix. A Laplacian matrix is one way of representing a similarity graph that models the local neighborhood relationships between data points as an undirected graph.

Use spectralcluster to perform spectral clustering on an input data matrix or on a similarity matrix of a similarity graph. spectralcluster requires that you specify the number of clusters.


However, the algorithm for spectral clustering also provides a way to estimate the number of clusters in your data. For more information, see “Partition Data Using Spectral Clustering” on page 16-26.
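The search and spectral-clustering calls described above can be sketched as follows. This is an illustrative example, not from the original text; it assumes a release that includes spectralcluster, and the data and variable names are hypothetical.

```matlab
rng('default')                      % for reproducibility
X = randn(100,2);                   % data
Y = [0 0; 1 1];                     % query points

% k-nearest neighbor search and radius search
idxNN = knnsearch(X,Y,'K',3);       % 3 nearest neighbors of each query point
idxRad = rangesearch(X,Y,0.5);      % all neighbors within distance 0.5
                                    % (cell array, one cell per query point)

% Spectral clustering requires the number of clusters
idxSpec = spectralcluster(X,2);
```

rangesearch returns a cell array because each query point can have a different number of neighbors within the radius, whereas knnsearch with 'K' returns a fixed-width index matrix.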

Comparison of Clustering Methods

This table compares the features of available clustering methods in Statistics and Machine Learning Toolbox.

| Method | Basis of Algorithm | Input to Algorithm | Requires Specified Number of Clusters | Cluster Shapes Identified | Useful for Outlier Detection |
| --- | --- | --- | --- | --- | --- |
| “Hierarchical Clustering” on page 16-6 | Distance between objects | Pairwise distances between observations | No | Arbitrarily shaped clusters, depending on the specified 'Linkage' algorithm | No |
| “k-Means Clustering” on page 16-33 and k-Medoids Clustering on page 32-2911 | Distance between objects and centroids | Actual observations | Yes | Spheroidal clusters with equal diagonal covariance | No |
| Density-Based Spatial Clustering of Algorithms with Noise (“DBSCAN” on page 16-19) | Density of regions in the data | Actual observations or pairwise distances between observations | No | Arbitrarily shaped clusters | Yes |
| “Gaussian Mixture Models” | Mixture of Gaussian distributions | Actual observations | Yes | Spheroidal clusters with different covariance structures | Yes |
| Nearest Neighbors | Distance between objects | Actual observations | No | Arbitrarily shaped clusters | Yes, depending on the specified number of neighbors |
| Spectral Clustering (“Partition Data Using Spectral Clustering” on page 16-26) | Graph representing connections between data points | Actual observations or similarity matrix | Yes, but the algorithm also provides a way to estimate the number of clusters | Arbitrarily shaped clusters | No |

See Also

More About
• “Hierarchical Clustering” on page 16-6
• “k-Means Clustering” on page 16-33
• “DBSCAN” on page 16-19
• “Cluster Using Gaussian Mixture Model” on page 16-39
• “Partition Data Using Spectral Clustering” on page 16-26


Hierarchical Clustering

In this section...
“Introduction to Hierarchical Clustering” on page 16-6
“Algorithm Description” on page 16-6
“Similarity Measures” on page 16-7
“Linkages” on page 16-8
“Dendrograms” on page 16-9
“Verify the Cluster Tree” on page 16-10
“Create Clusters” on page 16-15

Introduction to Hierarchical Clustering

Hierarchical clustering groups data over a variety of scales by creating a cluster tree, or dendrogram. The tree is not a single set of clusters, but rather a multilevel hierarchy, where clusters at one level are joined as clusters at the next level. This allows you to decide the level or scale of clustering that is most appropriate for your application. The Statistics and Machine Learning Toolbox function clusterdata supports agglomerative clustering and performs all of the necessary steps for you. It incorporates the pdist, linkage, and cluster functions, which you can use separately for more detailed analysis. The dendrogram function plots the cluster tree.
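As a quick sketch of the one-call workflow, clusterdata can produce cluster assignments directly. The data matrix and the 'Linkage' and 'Maxclust' values below are placeholders, chosen only for illustration.

```matlab
rng default
X = rand(30,2);    % placeholder data; substitute your own observations

% One call runs the pdist, linkage, and cluster steps internally
T = clusterdata(X,'Linkage','ward','Maxclust',3);

% T contains a cluster index (1, 2, or 3) for each row of X
```

Use the separate pdist, linkage, and cluster functions instead when you want to inspect the intermediate distance vector or the cluster tree.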

Algorithm Description

To perform agglomerative hierarchical cluster analysis on a data set using Statistics and Machine Learning Toolbox functions, follow this procedure:

1. Find the similarity or dissimilarity between every pair of objects in the data set. In this step, you calculate the distance between objects using the pdist function. The pdist function supports many different ways to compute this measurement. See “Similarity Measures” on page 16-7 for more information.

2. Group the objects into a binary, hierarchical cluster tree. In this step, you link pairs of objects that are in close proximity using the linkage function. The linkage function uses the distance information generated in step 1 to determine the proximity of objects to each other. As objects are paired into binary clusters, the newly formed clusters are grouped into larger clusters until a hierarchical tree is formed. See “Linkages” on page 16-8 for more information.

3. Determine where to cut the hierarchical tree into clusters. In this step, you use the cluster function to prune branches off the bottom of the hierarchical tree and assign all the objects below each cut to a single cluster. This creates a partition of the data. The cluster function can create these clusters by detecting natural groupings in the hierarchical tree or by cutting off the hierarchical tree at an arbitrary point.

The following sections provide more information about each of these steps.

Note  The Statistics and Machine Learning Toolbox function clusterdata performs all of the necessary steps for you. You do not need to execute the pdist, linkage, or cluster functions separately.
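The three steps above can be sketched as a short pipeline. The data matrix is a placeholder and the cutoff value is illustrative, not a recommendation.

```matlab
rng default
X = rand(30,2);               % placeholder data

Y = pdist(X);                 % step 1: pairwise distances between observations
Z = linkage(Y);               % step 2: build the hierarchical tree (single linkage by default)
T = cluster(Z,'cutoff',1.2);  % step 3: cut the tree using the inconsistency coefficient
```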


Similarity Measures

You use the pdist function to calculate the distance between every pair of objects in a data set. For a data set made up of m objects, there are m*(m – 1)/2 pairs in the data set. The result of this computation is commonly known as a distance or dissimilarity matrix.

There are many ways to calculate this distance information. By default, the pdist function calculates the Euclidean distance between objects; however, you can specify one of several other options. See pdist for more information.

Note  You can optionally normalize the values in the data set before calculating the distance information. In a real-world data set, variables can be measured against different scales. For example, one variable can measure Intelligence Quotient (IQ) test scores and another variable can measure head circumference. These discrepancies can distort the proximity calculations. Using the zscore function, you can convert all the values in the data set to use the same proportional scale. See zscore for more information.

For example, consider a data set, X, made up of five objects where each object is a set of x,y coordinates.

• Object 1: 1, 2
• Object 2: 2.5, 4.5
• Object 3: 2, 2
• Object 4: 4, 1.5
• Object 5: 4, 2.5

You can define this data set as a matrix

rng default; % For reproducibility
X = [1 2;2.5 4.5;2 2;4 1.5;4 2.5];

and pass it to pdist. The pdist function calculates the distance between object 1 and object 2, object 1 and object 3, and so on until the distances between all the pairs have been calculated. The following figure plots these objects in a graph. The Euclidean distance between object 2 and object 3 is shown to illustrate one interpretation of distance.
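As the preceding note suggests, when variables are on very different scales you can standardize them before computing distances. A minimal sketch, using invented data (IQ-like scores and head circumferences in centimeters) that is not part of this guide's examples:

```matlab
rng default
data = [100 + 15*randn(10,1), 56 + 2*randn(10,1)];  % hypothetical mixed-scale variables

Zdata = zscore(data);   % each column now has mean 0 and standard deviation 1
Y = zscore_dists(Zdata);
```

Here zscore_dists is not a toolbox function; the line simply stands for the subsequent call Y = pdist(Zdata). After standardization, the distances are no longer dominated by the variable with the larger numeric range.

```matlab
Y = pdist(Zdata);       % distances on the standardized data
```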


Distance Information

The pdist function returns this distance information in a vector, Y, where each element contains the distance between a pair of objects.

Y = pdist(X)

Y = 1×10

    2.9155    1.0000    3.0414    3.0414    2.5495    3.3541    2.5000    2.0616    2.0616    1.0000

To make it easier to see the relationship between the distance information generated by pdist and the objects in the original data set, you can reformat the distance vector into a matrix using the squareform function. In this matrix, element i,j corresponds to the distance between object i and object j in the original data set. In the following example, element 1,1 represents the distance between object 1 and itself (which is zero). Element 1,2 represents the distance between object 1 and object 2, and so on.

squareform(Y)

ans = 5×5

         0    2.9155    1.0000    3.0414    3.0414
    2.9155         0    2.5495    3.3541    2.5000
    1.0000    2.5495         0    2.0616    2.0616
    3.0414    3.3541    2.0616         0    1.0000
    3.0414    2.5000    2.0616    1.0000         0

Linkages

Once the proximity between objects in the data set has been computed, you can determine how objects in the data set should be grouped into clusters, using the linkage function. The linkage function takes the distance information generated by pdist and links pairs of objects that are close together into binary clusters (clusters made up of two objects). The linkage function then links these newly formed clusters to each other and to other objects to create bigger clusters until all the objects in the original data set are linked together in a hierarchical tree.

For example, given the distance vector Y generated by pdist from the sample data set of x- and y-coordinates, the linkage function generates a hierarchical cluster tree, returning the linkage information in a matrix, Z.

Z = linkage(Y)

Z = 4×3

    4.0000    5.0000    1.0000
    1.0000    3.0000    1.0000
    6.0000    7.0000    2.0616
    2.0000    8.0000    2.5000

In this output, each row identifies a link between objects or clusters. The first two columns identify the objects that have been linked. The third column contains the distance between these objects. For the sample data set of x- and y-coordinates, the linkage function begins by grouping objects 4


and 5, which have the closest proximity (distance value = 1.0000). The linkage function continues by grouping objects 1 and 3, which also have a distance value of 1.0000. The third row indicates that the linkage function grouped objects 6 and 7. If the original sample data set contained only five objects, what are objects 6 and 7? Object 6 is the newly formed binary cluster created by the grouping of objects 4 and 5. When the linkage function groups two objects into a new cluster, it must assign the cluster a unique index value, starting with the value m + 1, where m is the number of objects in the original data set. (Values 1 through m are already used by the original data set.) Similarly, object 7 is the cluster formed by grouping objects 1 and 3. linkage uses distances to determine the order in which it clusters objects. The distance vector Y contains the distances between the original objects 1 through 5. But linkage must also be able to determine distances involving clusters that it creates, such as objects 6 and 7. By default, linkage uses a method known as single linkage. However, there are a number of different methods available. See the linkage reference page for more information. As the final cluster, the linkage function grouped object 8, the newly formed cluster made up of objects 6 and 7, with object 2 from the original data set. The following figure graphically illustrates the way linkage groups the objects into a hierarchy of clusters.

Dendrograms

The hierarchical, binary cluster tree created by the linkage function is most easily understood when viewed graphically. The Statistics and Machine Learning Toolbox function dendrogram plots the tree as follows.

dendrogram(Z)


In the figure, the numbers along the horizontal axis represent the indices of the objects in the original data set. The links between objects are represented as upside-down U-shaped lines. The height of the U indicates the distance between the objects. For example, the link representing the cluster containing objects 1 and 3 has a height of 1. The link representing the cluster that groups object 2 together with objects 1, 3, 4, and 5 (which are already clustered as object 8) has a height of 2.5. The height represents the distance linkage computes between objects 2 and 8. For more information about creating a dendrogram diagram, see the dendrogram reference page.

Verify the Cluster Tree

After linking the objects in a data set into a hierarchical cluster tree, you might want to verify that the distances (that is, heights) in the tree reflect the original distances accurately. In addition, you might want to investigate natural divisions that exist among links between objects. Statistics and Machine Learning Toolbox functions are available for both of these tasks, as described in the following sections.

• “Verify Dissimilarity” on page 16-10
• “Verify Consistency” on page 16-11

Verify Dissimilarity

In a hierarchical cluster tree, any two objects in the original data set are eventually linked together at some level. The height of the link represents the distance between the two clusters that contain those two objects. This height is known as the cophenetic distance between the two objects. One way to measure how well the cluster tree generated by the linkage function reflects your data is to compare the cophenetic distances with the original distance data generated by the pdist function. If the clustering is valid, the linking of objects in the cluster tree should have a strong correlation with the distances between objects in the distance vector. The cophenet function compares these two sets of values and computes their correlation, returning a value called the cophenetic correlation coefficient. The closer the value of the cophenetic correlation coefficient is to 1, the more accurately the clustering solution reflects your data.

You can use the cophenetic correlation coefficient to compare the results of clustering the same data set using different distance calculation methods or clustering algorithms. For example, you can use the cophenet function to evaluate the clusters created for the sample data set.

c = cophenet(Z,Y)

c = 0.8615

Z is the matrix output by the linkage function and Y is the distance vector output by the pdist function.

Execute pdist again on the same data set, this time specifying the city block metric. After running the linkage function on this new pdist output using the average linkage method, call cophenet to evaluate the clustering solution.

Y = pdist(X,'cityblock');
Z = linkage(Y,'average');
c = cophenet(Z,Y)

c = 0.9047
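One possible way to extend this comparison is to loop over several linkage methods and report the cophenetic correlation for each. This sketch assumes the sample matrix X defined earlier; the method list is illustrative ('ward' and 'centroid' are omitted here because they assume Euclidean distances, and this Y uses the city block metric).

```matlab
Y = pdist(X,'cityblock');
methods = {'single','complete','average','weighted'};
for i = 1:numel(methods)
    Zi = linkage(Y,methods{i});
    fprintf('%-9s cophenetic correlation: %.4f\n',methods{i},cophenet(Zi,Y));
end
```

The method with the correlation closest to 1 is the one whose tree most faithfully preserves the original pairwise distances.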

The cophenetic correlation coefficient shows that using a different distance and linkage method creates a tree that represents the original distances slightly better.

Verify Consistency

One way to determine the natural cluster divisions in a data set is to compare the height of each link in a cluster tree with the heights of neighboring links below it in the tree. A link that is approximately the same height as the links below it indicates that there are no distinct divisions between the objects joined at this level of the hierarchy. These links are said to exhibit a high level of consistency, because the distance between the objects being joined is approximately the same as the distances between the objects they contain. On the other hand, a link whose height differs noticeably from the height of the links below it indicates that the objects joined at this level in the cluster tree are much farther apart from each other than their components were when they were joined. This link is said to be inconsistent with the links below it. In cluster analysis, inconsistent links can indicate the border of a natural division in a data set. The cluster function uses a quantitative measure of inconsistency to determine where to partition your data set into clusters.

The following dendrogram illustrates inconsistent links. Note how the objects in the dendrogram fall into two groups that are connected by links at a much higher level in the tree. These links are inconsistent when compared with the links below them in the hierarchy.


The relative consistency of each link in a hierarchical cluster tree can be quantified and expressed as the inconsistency coefficient. This value compares the height of a link in a cluster hierarchy with the average height of links below it. Links that join distinct clusters have a high inconsistency coefficient; links that join indistinct clusters have a low inconsistency coefficient.

To generate a listing of the inconsistency coefficient for each link in the cluster tree, use the inconsistent function. By default, the inconsistent function compares each link in the cluster hierarchy with adjacent links that are less than two levels below it in the cluster hierarchy. This is called the depth of the comparison. You can also specify other depths. The objects at the bottom of the cluster tree, called leaf nodes, that have no further objects below them, have an inconsistency coefficient of zero. Clusters that join two leaves also have a zero inconsistency coefficient.

For example, you can use the inconsistent function to calculate the inconsistency values for the links created by the linkage function in “Linkages” on page 16-8. First, recompute the distance and linkage values using the default settings.

Y = pdist(X);
Z = linkage(Y);

Next, use inconsistent to calculate the inconsistency values.


I = inconsistent(Z)

I = 4×4

    1.0000         0    1.0000         0
    1.0000         0    1.0000         0
    1.3539    0.6129    3.0000    1.1547
    2.2808    0.3100    2.0000    0.7071

The inconsistent function returns data about the links in an (m-1)-by-4 matrix, whose columns are described in the following table.

| Column | Description |
| --- | --- |
| 1 | Mean of the heights of all the links included in the calculation |
| 2 | Standard deviation of all the links included in the calculation |
| 3 | Number of links included in the calculation |
| 4 | Inconsistency coefficient |

In the sample output, the first row represents the link between objects 4 and 5. This cluster is assigned the index 6 by the linkage function. Because both 4 and 5 are leaf nodes, the inconsistency coefficient for the cluster is zero. The second row represents the link between objects 1 and 3, both of which are also leaf nodes. This cluster is assigned the index 7 by the linkage function.

The third row evaluates the link that connects these two clusters, objects 6 and 7. (This new cluster is assigned index 8 in the linkage output.) Column 3 indicates that three links are considered in the calculation: the link itself and the two links directly below it in the hierarchy. Column 1 represents the mean of the heights of these links. The inconsistent function uses the height information output by the linkage function to calculate the mean. Column 2 represents the standard deviation between the links. The last column contains the inconsistency value for these links, 1.1547. It is the difference between the current link height and the mean, normalized by the standard deviation.

(2.0616 - 1.3539) / .6129

ans = 1.1547

The following figure illustrates the links and heights included in this calculation.


Note  In the preceding figure, the lower limit on the y-axis is set to 0 to show the heights of the links. To set the lower limit to 0, select Axes Properties from the Edit menu, click the Y Axis tab, and enter 0 in the field immediately to the right of Y Limits.

Row 4 in the output matrix describes the link between object 8 and object 2. Column 3 indicates that two links are included in this calculation: the link itself and the link directly below it in the hierarchy. The inconsistency coefficient for this link is 0.7071.

The following figure illustrates the links and heights included in this calculation.


Create Clusters

After you create the hierarchical tree of binary clusters, you can prune the tree to partition your data into clusters using the cluster function. The cluster function lets you create clusters in two ways, as discussed in the following sections:

• “Find Natural Divisions in Data” on page 16-15
• “Specify Arbitrary Clusters” on page 16-16

Find Natural Divisions in Data

The hierarchical cluster tree may naturally divide the data into distinct, well-separated clusters. This can be particularly evident in a dendrogram diagram created from data where groups of objects are densely packed in certain areas and not in others. The inconsistency coefficient of the links in the cluster tree can identify these divisions where the similarities between objects change abruptly. (See “Verify the Cluster Tree” on page 16-10 for more information about the inconsistency coefficient.) You can use this value to determine where the cluster function creates cluster boundaries.

For example, if you use the cluster function to group the sample data set into clusters, specifying an inconsistency coefficient threshold of 1.2 as the value of the cutoff argument, the cluster function groups all the objects in the sample data set into one cluster. In this case, none of the links in the cluster hierarchy has an inconsistency coefficient greater than 1.2.

T = cluster(Z,'cutoff',1.2)


T = 5×1

     1
     1
     1
     1
     1

The cluster function outputs a vector, T, that is the same size as the original data set. Each element in this vector contains the number of the cluster into which the corresponding object from the original data set was placed.

If you lower the inconsistency coefficient threshold to 0.8, the cluster function divides the sample data set into three separate clusters.

T = cluster(Z,'cutoff',0.8)

T = 5×1

     3
     2
     3
     1
     1

This output indicates that objects 1 and 3 are in one cluster, objects 4 and 5 are in another cluster, and object 2 is in its own cluster.

When clusters are formed in this way, the cutoff value is applied to the inconsistency coefficient. These clusters may, but do not necessarily, correspond to a horizontal slice across the dendrogram at a certain height. If you want clusters corresponding to a horizontal slice of the dendrogram, you can either use the criterion option to specify that the cutoff should be based on distance rather than inconsistency, or you can specify the number of clusters directly, as described in the following section.

Specify Arbitrary Clusters

Instead of letting the cluster function create clusters determined by the natural divisions in the data set, you can specify the number of clusters you want created. For example, you can specify that you want the cluster function to partition the sample data set into two clusters. In this case, the cluster function creates one cluster containing objects 1, 3, 4, and 5 and another cluster containing object 2.

T = cluster(Z,'maxclust',2)

T = 5×1

     2
     1
     2
     2
     2
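As an aside on the distance-based cutoff mentioned above, a possible sketch looks like this. The threshold of 1.5 is illustrative; it assumes the Z matrix computed earlier for the sample data set.

```matlab
% Cut the tree at a height (distance) of 1.5 instead of using the inconsistency coefficient
T = cluster(Z,'cutoff',1.5,'criterion','distance');
```

This corresponds to a horizontal slice of the dendrogram at height 1.5. For the sample data set, links at heights 1.0 fall below the slice and the links at 2.0616 and 2.5 fall above it, so the slice yields three clusters: objects 1 and 3, objects 4 and 5, and object 2.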

To help you visualize how the cluster function determines these clusters, the following figure shows the dendrogram of the hierarchical cluster tree. The horizontal dashed line intersects two lines of the dendrogram, corresponding to setting 'maxclust' to 2. These two lines partition the objects into two clusters: the objects below the left-hand line, namely 1, 3, 4, and 5, belong to one cluster, while the object below the right-hand line, namely 2, belongs to the other cluster.

On the other hand, if you set 'maxclust' to 3, the cluster function groups objects 4 and 5 in one cluster, objects 1 and 3 in a second cluster, and object 2 in a third cluster. The following command illustrates this.

T = cluster(Z,'maxclust',3)

T = 5×1

     2
     3
     2
     1
     1

This time, the cluster function cuts off the hierarchy at a lower point, corresponding to the horizontal line that intersects three lines of the dendrogram in the following figure.


See Also

More About
• “Choose Cluster Analysis Method” on page 16-2


DBSCAN

In this section...
“Introduction to DBSCAN” on page 16-19
“Algorithm Description” on page 16-19
“Determine Values for DBSCAN Parameters” on page 16-20

Introduction to DBSCAN

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) identifies arbitrarily shaped clusters and noise (outliers) in data. The Statistics and Machine Learning Toolbox function dbscan performs clustering on an input data matrix or on pairwise distances between observations. dbscan returns the cluster indices and a vector indicating the observations that are core points (points inside clusters). Unlike k-means clustering, the DBSCAN algorithm does not require prior knowledge of the number of clusters, and clusters are not necessarily spheroidal. DBSCAN is also useful for density-based outlier detection, because it identifies points that do not belong to any cluster.

For a point to be assigned to a cluster, it must satisfy the condition that its epsilon neighborhood (epsilon) contains at least a minimum number of neighbors (minpts). Or, the point can lie within the epsilon neighborhood of another point that satisfies the epsilon and minpts conditions. The DBSCAN algorithm identifies three kinds of points:

• Core point — A point in a cluster that has at least minpts neighbors in its epsilon neighborhood
• Border point — A point in a cluster that has fewer than minpts neighbors in its epsilon neighborhood
• Noise point — An outlier that does not belong to any cluster

DBSCAN works with a wide range of distance metrics on page 32-0, and you can define a custom distance metric for your particular application. The choice of a distance metric determines the shape of the neighborhood.
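A minimal usage sketch of the two dbscan outputs, using synthetic placeholder data; the epsilon and minpts values are illustrative only.

```matlab
rng default
X = [randn(50,2); randn(50,2)+6];   % two well-separated synthetic groups

[idx,corepts] = dbscan(X,1,5);      % epsilon = 1, minpts = 5

% idx contains a cluster index for each observation; -1 marks noise points
% corepts is a logical vector that is true for core points
noise = (idx == -1);
```

Border points are the observations whose idx value is positive but whose corepts value is false.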

Algorithm Description

For specified values of the epsilon neighborhood epsilon and the minimum number of neighbors minpts required for a core point, the dbscan function implements DBSCAN as follows:

1. From the input data set X, select the first unlabeled observation x1 as the current point, and initialize the first cluster label C to 1.

2. Find the set of points within the epsilon neighborhood epsilon of the current point. These points are the neighbors.

   a. If the number of neighbors is less than minpts, then label the current point as a noise point (or an outlier). Go to step 4.

      Note  dbscan can reassign noise points to clusters if the noise points later satisfy the constraints set by epsilon and minpts from some other point in X. This process of reassigning points happens for border points of a cluster.

   b. Otherwise, label the current point as a core point belonging to cluster C.

3. Iterate over each neighbor (new current point) and repeat step 2 until no new neighbors are found that can be labeled as belonging to the current cluster C.

4. Select the next unlabeled point in X as the current point, and increase the cluster count by 1.

5. Repeat steps 2–4 until all points in X are labeled.

Determine Values for DBSCAN Parameters

This example shows how to select values for the epsilon and minpts parameters of dbscan. The data set is a Lidar scan, stored as a collection of 3-D points, that contains the coordinates of objects surrounding a vehicle.

Load, preprocess, and visualize the data set.

Load the x, y, z coordinates of the objects.

load('lidar_subset.mat')
X = lidar_subset;

To highlight the environment around the vehicle, set the region of interest to span 20 meters to the left and right of the vehicle, 20 meters in front and back of the vehicle, and the area above the surface of the road.

xBound = 20; % in meters
yBound = 20; % in meters
zLowerBound = 0; % in meters

Crop the data to contain only points within the specified region.

indices = X(:,1) <= xBound & X(:,1) >= -xBound ...
    & X(:,2) <= yBound & X(:,2) >= -yBound ...
    & X(:,3) > zLowerBound;
X = X(indices,:);

Visualize the data as a 2-D scatter plot. Annotate the plot to highlight the vehicle.

scatter(X(:,1),X(:,2),'.');
annotation('ellipse',[0.48 0.48 .1 .1],'Color','red')


The center of the set of points (circled in red) contains the roof and hood of the vehicle. All other points are obstacles.

Select a value for minpts. To select a value for minpts, consider a value greater than or equal to one plus the number of dimensions of the input data [1]. For example, for an n-by-p matrix X, set the value of 'minpts' greater than or equal to p+1. For the given data set, specify a minpts value greater than or equal to 4, specifically the value 50.

minpts = 50; % Minimum number of neighbors for a core point

Select a value for epsilon. One strategy for estimating a value for epsilon is to generate a k-distance graph for the input data X. For each point in X, find the distance to the kth nearest point, and plot sorted points against this distance. The graph contains a knee. The distance that corresponds to the knee is generally a good choice for epsilon, because it is the region where points start tailing off into outlier (noise) territory [1].

Before plotting the k-distance graph, first find the minpts smallest pairwise distances for observations in X, in ascending order.

kD = pdist2(X,X,'euc','Smallest',minpts); % The minpts smallest pairwise distances

Plot the k-distance graph.


plot(sort(kD(end,:)));
title('k-distance graph')
xlabel('Points sorted with 50th nearest distances')
ylabel('50th nearest distances')
grid

The knee appears to be around 2; therefore, set the value of epsilon to 2.

epsilon = 2;

Cluster using dbscan. Use dbscan with the values of minpts and epsilon that were determined in the previous steps.

labels = dbscan(X,epsilon,minpts);

Visualize the clustering and annotate the figure to highlight specific clusters.

gscatter(X(:,1),X(:,2),labels);
title('epsilon = 2 and minpts = 50')
grid
annotation('ellipse',[0.54 0.41 .07 .07],'Color','red')
annotation('ellipse',[0.53 0.85 .07 .07],'Color','blue')
annotation('ellipse',[0.39 0.85 .07 .07],'Color','black')


dbscan identifies 11 clusters and a set of noise points. The algorithm also identifies the vehicle at the center of the set of points as a distinct cluster.

dbscan identifies some distinct clusters, such as the cluster circled in black (and centered around (–6,18)) and the cluster circled in blue (and centered around (2.5,18)). The function also assigns the group of points circled in red (and centered around (3,–4)) to the same cluster (group 7) as the group of points in the southeast quadrant of the plot. The expectation is that these groups should be in separate clusters.

Use a smaller value for epsilon to split up large clusters and further partition the points.

epsilon2 = 1;
labels2 = dbscan(X,epsilon2,minpts);

Visualize the clustering and annotate the figure to highlight specific clusters.

gscatter(X(:,1),X(:,2),labels2);
title('epsilon = 1 and minpts = 50')
grid
annotation('ellipse',[0.54 0.41 .07 .07],'Color','red')
annotation('ellipse',[0.53 0.85 .07 .07],'Color','blue')
annotation('ellipse',[0.39 0.85 .07 .07],'Color','black')


By using a smaller epsilon value, dbscan is able to assign the group of points circled in red to a distinct cluster (group 13). However, some clusters that dbscan correctly identified before are now split between cluster points and outliers. For example, see cluster group 2 (circled in black) and cluster group 3 (circled in blue). The correct epsilon value is somewhere between 1 and 2.

Use an epsilon value of 1.55 to cluster the data.

epsilon3 = 1.55;
labels3 = dbscan(X,epsilon3,minpts);

Visualize the clustering and annotate the figure to highlight specific clusters.

gscatter(X(:,1),X(:,2),labels3);
title('epsilon = 1.55 and minpts = 50')
grid
annotation('ellipse',[0.54 0.41 .07 .07],'Color','red')
annotation('ellipse',[0.53 0.85 .07 .07],'Color','blue')
annotation('ellipse',[0.39 0.85 .07 .07],'Color','black')


dbscan does a better job of identifying the clusters when epsilon is set to 1.55. For example, the function identifies the distinct clusters circled in red, black, and blue (with centers around (3,–4), (–6,18), and (2.5,18), respectively).

References

[1] Ester, M., H.-P. Kriegel, J. Sander, and X. Xu. “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise.” In Proceedings of the Second International Conference on Knowledge Discovery in Databases and Data Mining, 226–231. Portland, OR: AAAI Press, 1996.

See Also

More About
• “Choose Cluster Analysis Method” on page 16-2


Partition Data Using Spectral Clustering

This topic provides an introduction to spectral clustering and an example that estimates the number of clusters and performs spectral clustering.

Introduction to Spectral Clustering

Spectral clustering is a graph-based algorithm for partitioning data points, or observations, into k clusters. The Statistics and Machine Learning Toolbox function spectralcluster performs clustering on an input data matrix or on a similarity matrix on page 32-5178 of a similarity graph derived from the data. spectralcluster returns the cluster indices, a matrix containing k eigenvectors of the Laplacian matrix on page 32-5179, and a vector of eigenvalues corresponding to the eigenvectors.

spectralcluster requires you to specify the number of clusters k. However, you can verify that your estimate for k is correct by using one of these methods:

• Count the number of zero eigenvalues of the Laplacian matrix. The multiplicity of the zero eigenvalues is an indicator of the number of clusters in your data.
• Find the number of connected components in your similarity matrix by using the MATLAB function conncomp.
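The first verification method can be sketched as follows, using synthetic placeholder data. The tolerance for "zero" is an assumption, because computed eigenvalues are rarely exactly zero, and the count depends on how connected the constructed similarity graph is.

```matlab
rng default
X = [randn(20,2); randn(20,2)+8; randn(20,2)-8];   % three well-separated synthetic groups

k = 3;
[idx,V,D] = spectralcluster(X,k);   % D holds the k smallest eigenvalues of the Laplacian matrix

tol = 1e-6;                         % assumed numerical tolerance
numClusters = sum(abs(D) < tol);    % multiplicity of (near-)zero eigenvalues
```

If numClusters differs from your choice of k, consider rerunning spectralcluster with the estimated value.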

Algorithm Description

Spectral clustering is a graph-based algorithm for finding k arbitrarily shaped clusters in data. The technique involves representing the data in a low dimension. In the low dimension, clusters in the data are more widely separated, enabling you to use algorithms such as k-means or k-medoids clustering. This low dimension is based on the eigenvectors corresponding to the k smallest eigenvalues of a Laplacian matrix. A Laplacian matrix is one way of representing a similarity graph that models the local neighborhood relationships between data points as an undirected graph. The spectral clustering algorithm derives a similarity matrix of a similarity graph from your data, finds the Laplacian matrix, and uses the Laplacian matrix to find k eigenvectors for splitting the similarity graph into k partitions. You can use spectral clustering when you know the number of clusters, but the algorithm also provides a way to estimate the number of clusters in your data.

By default, the algorithm for spectralcluster computes the normalized random-walk Laplacian matrix using the method described by Shi and Malik [1]. spectralcluster also supports the unnormalized Laplacian matrix and the normalized symmetric Laplacian matrix, which uses the Ng-Jordan-Weiss method [2]. The spectralcluster function implements clustering as follows:

1. For each data point in X, define a local neighborhood using either the radius search method or the nearest neighbor method, as specified by the 'SimilarityGraph' name-value pair argument (see “Similarity Graph” on page 32-5178). Then, find the pairwise distances Dist_{i,j} for all points i and j in the neighborhood.

2. Convert the distances to similarity measures using the kernel transformation S_{i,j} = exp(–(Dist_{i,j}/σ)^2). The matrix S is the similarity matrix on page 32-5178, and σ is the scale factor for the kernel, as specified using the 'KernelScale' name-value pair argument.

3. Calculate the unnormalized Laplacian matrix on page 32-5179 L, the normalized random-walk Laplacian matrix L_rw, or the normalized symmetric Laplacian matrix L_s, depending on the value of the 'LaplacianNormalization' name-value pair argument.

4. Create a matrix V ∈ ℝ^(n×k) containing columns v_1, …, v_k, where the columns are the k eigenvectors that correspond to the k smallest eigenvalues of the Laplacian matrix. If using L_s, normalize each row of V to have unit length.

5. Treating each row of V as a point, cluster the n points using k-means clustering (default) or k-medoids clustering, as specified by the 'ClusterMethod' name-value pair argument.

6. Assign the original points in X to the same clusters as their corresponding rows in V.
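The steps above can be sketched in a few lines of MATLAB (an illustrative sketch of the unnormalized variant, not the spectralcluster implementation; sigma plays the role of 'KernelScale' and is an assumed value):

sigma = 1;                       % kernel scale (assumed value)
D2 = pdist2(X,X).^2;             % squared pairwise distances (step 1)
S = exp(-D2./sigma^2);           % similarity matrix (step 2)
L = diag(sum(S,2)) - S;          % unnormalized Laplacian (step 3)
[V,~] = eigs(L,k,'smallestabs'); % k smallest eigenvectors (step 4)
idx = kmeans(V,k);               % cluster rows of V (steps 5 and 6)

This sketch builds a fully connected similarity graph rather than a radius or nearest-neighbor graph, so it matches the steps only up to the choice of neighborhood.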

Estimate Number of Clusters and Perform Spectral Clustering

This example demonstrates two approaches for performing spectral clustering.

• The first approach estimates the number of clusters using the eigenvalues of the Laplacian matrix and performs spectral clustering on the data set.

• The second approach estimates the number of clusters using the similarity graph and performs spectral clustering on the similarity matrix.

Generate Sample Data

Randomly generate a sample data set with three well-separated clusters, each containing 20 points.

rng('default'); % For reproducibility
n = 20;
X = [randn(n,2)*0.5+3;
     randn(n,2)*0.5;
     randn(n,2)*0.5-3];

Perform Spectral Clustering on Data

Estimate the number of clusters in the data by using the eigenvalues of the Laplacian matrix, and perform spectral clustering on the data set.

Compute the five smallest eigenvalues (in magnitude) of the Laplacian matrix by using the spectralcluster function. By default, the function uses the normalized random-walk Laplacian matrix.

[~,V_temp,D_temp] = spectralcluster(X,5)

V_temp = 60×5

   -0.0000    0.2236   -0.0000   -0.1534   -0.0000
   -0.0000    0.2236   -0.0000    0.3093    0.0000
   -0.0000    0.2236   -0.0000   -0.2225   -0.0000
   -0.0000    0.2236   -0.0000   -0.1776   -0.0000
   -0.0000    0.2236   -0.0000   -0.1331   -0.0000
   -0.0000    0.2236   -0.0000   -0.2176   -0.0000
   -0.0000    0.2236   -0.0000   -0.1967    0.0000
   -0.0000    0.2236   -0.0000    0.0088    0.0000
   -0.0000    0.2236   -0.0000    0.2844    0.0000
   -0.0000    0.2236   -0.0000    0.3275    0.0000
      ⋮

D_temp = 5×1

   -0.0000
   -0.0000
   -0.0000
    0.0876
    0.1653

Only the first three eigenvalues are approximately zero. The number of zero eigenvalues is a good indicator of the number of connected components in a similarity graph and, therefore, is a good estimate of the number of clusters in your data. So, k=3 is a good estimate of the number of clusters in X.

Perform spectral clustering on observations by using the spectralcluster function. Specify k=3 clusters.

k = 3;
[idx1,V,D] = spectralcluster(X,k)

idx1 = 60×1

     1
     1
     1
     1
     1
     1
     1
     1
     1
     1
     ⋮

V = 60×3

   -0.0000   -0.2236    0.0000
   -0.0000   -0.2236    0.0000
   -0.0000   -0.2236    0.0000
   -0.0000   -0.2236    0.0000
   -0.0000   -0.2236    0.0000
   -0.0000   -0.2236    0.0000
   -0.0000   -0.2236    0.0000
   -0.0000   -0.2236    0.0000
   -0.0000   -0.2236    0.0000
   -0.0000   -0.2236    0.0000
      ⋮

D = 3×1
10^-16 ×

   -0.1031
   -0.1601
   -0.3754

Elements of D correspond to the three smallest eigenvalues of the Laplacian matrix. The columns of V contain the eigenvectors corresponding to the eigenvalues in D. For well-separated clusters, the eigenvectors are indicator vectors. The eigenvectors have values of zero (or close to zero) for points that do not belong to a particular cluster, and nonzero values for points that belong to a particular cluster.

Visualize the result of clustering.

gscatter(X(:,1),X(:,2),idx1);

The spectralcluster function correctly identifies the three clusters in the data set.

Instead of using the spectralcluster function again, you can pass V_temp to the kmeans function to cluster the data points.

idx2 = kmeans(V_temp(:,1:3),3);
gscatter(X(:,1),X(:,2),idx2);


The order of cluster assignments in idx1 and idx2 is different even though the data points are clustered in the same way.

Perform Spectral Clustering on Similarity Matrix

Estimate the number of clusters using the similarity graph and perform spectral clustering on the similarity matrix.

Find the distance between each pair of observations in X by using the pdist and squareform functions with the default Euclidean distance metric.

dist_temp = pdist(X);
dist = squareform(dist_temp);

Construct the similarity matrix from the pairwise distances and confirm that the similarity matrix is symmetric.

S = exp(-dist.^2);
issymmetric(S)

ans = logical
   1
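This construction uses the kernel transformation from the algorithm description with a scale factor of 1. For any other scale, the same formula applies (sketch; sigma is an assumed value, not part of this example):

sigma = 2;                  % assumed kernel scale
S = exp(-(dist./sigma).^2); % S(i,j) = exp(-(Dist(i,j)/sigma)^2)

Larger values of sigma produce a smoother similarity matrix in which more distant points still receive appreciable similarity.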

Limit the similarity values to 0.5 so that the similarity graph connects only points whose pairwise distances are smaller than the search radius.

S_eps = S;
S_eps(S_eps < 0.5) = 0;

Test Ensemble Quality

Generate an artificial data set of 2000 observations on 20 uniformly distributed predictors, with Y = 1 when the sum of the first five predictors exceeds 2.5 and Y = 0 otherwise.

rng(1,'twister') % For reproducibility
X = rand(2000,20);
Y = sum(X(:,1:5),2) > 2.5;

In addition, to add noise to the results, randomly switch 10% of the classifications.

idx = randsample(2000,200);
Y(idx) = ~Y(idx);

Independent Test Set

Create independent training and test sets of data. Use 70% of the data for a training set by calling cvpartition using the holdout option.

cvpart = cvpartition(Y,'holdout',0.3);
Xtrain = X(training(cvpart),:);
Ytrain = Y(training(cvpart),:);
Xtest = X(test(cvpart),:);
Ytest = Y(test(cvpart),:);

Create a bagged classification ensemble of 200 trees from the training data.

t = templateTree('Reproducible',true); % For reproducibility of random predictor selections
bag = fitcensemble(Xtrain,Ytrain,'Method','Bag','NumLearningCycles',200,'Learners',t)

bag = 
  classreg.learning.classif.ClassificationBaggedEnsemble
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: [0 1]
           ScoreTransform: 'none'
          NumObservations: 1400
               NumTrained: 200
                   Method: 'Bag'
             LearnerNames: {'Tree'}
     ReasonForTermination: 'Terminated normally after completing the requested number of training cycles.'
                  FitInfo: []
       FitInfoDescription: 'None'
                FResample: 1
                  Replace: 1
         UseObsForLearner: [1400x200 logical]

  Properties, Methods

Plot the loss (misclassification) of the test data as a function of the number of trained trees in the ensemble.

figure
plot(loss(bag,Xtest,Ytest,'mode','cumulative'))
xlabel('Number of trees')
ylabel('Test classification error')

Cross Validation

Generate a five-fold cross-validated bagged ensemble.

cv = fitcensemble(X,Y,'Method','Bag','NumLearningCycles',200,'Kfold',5,'Learners',t)

cv = 
  classreg.learning.partition.ClassificationPartitionedEnsemble
    CrossValidatedModel: 'Bag'
         PredictorNames: {1x20 cell}
           ResponseName: 'Y'
        NumObservations: 2000
                  KFold: 5
              Partition: [1x1 cvpartition]
      NumTrainedPerFold: [200 200 200 200 200]
             ClassNames: [0 1]
         ScoreTransform: 'none'

  Properties, Methods

Examine the cross-validation loss as a function of the number of trees in the ensemble.

figure
plot(loss(bag,Xtest,Ytest,'mode','cumulative'))
hold on
plot(kfoldLoss(cv,'mode','cumulative'),'r.')
hold off
xlabel('Number of trees')
ylabel('Classification error')
legend('Test','Cross-validation','Location','NE')

Cross validating gives comparable estimates to those of the independent set.

Out-of-Bag Estimates

Generate the loss curve for out-of-bag estimates, and plot it along with the other curves.

figure
plot(loss(bag,Xtest,Ytest,'mode','cumulative'))
hold on
plot(kfoldLoss(cv,'mode','cumulative'),'r.')
plot(oobLoss(bag,'mode','cumulative'),'k--')
hold off
xlabel('Number of trees')
ylabel('Classification error')
legend('Test','Cross-validation','Out of bag','Location','NE')

The out-of-bag estimates are again comparable to those of the other methods.

See Also
cvpartition | fitcensemble | fitrensemble | kfoldLoss | loss | oobLoss | resubLoss

Related Examples

• “Framework for Ensemble Learning” on page 18-30

• “Ensemble Algorithms” on page 18-38

• “Bootstrap Aggregation (Bagging) of Classification Trees Using TreeBagger” on page 18-119

Ensemble Regularization

Regularization is a process of choosing fewer weak learners for an ensemble in a way that does not diminish predictive performance. Currently you can regularize regression ensembles. (You can also regularize a discriminant analysis classifier in a non-ensemble context; see “Regularize Discriminant Analysis Classifier” on page 20-21.)

The regularize method finds an optimal set of learner weights αt that minimize

    Σ_{n=1}^{N} wn g( Σ_{t=1}^{T} αt ht(xn), yn ) + λ Σ_{t=1}^{T} |αt|.

Here:

• λ ≥ 0 is a parameter you provide, called the lasso parameter.

• ht is a weak learner in the ensemble trained on N observations with predictors xn, responses yn, and weights wn.

• g(f,y) = (f – y)^2 is the squared error.

The ensemble is regularized on the same (xn,yn,wn) data used for training, so

    Σ_{n=1}^{N} wn g( Σ_{t=1}^{T} αt ht(xn), yn )

is the ensemble resubstitution error. The error is measured by mean squared error (MSE).

If you use λ = 0, regularize finds the weak learner weights by minimizing the resubstitution MSE. Ensembles tend to overtrain. In other words, the resubstitution error is typically smaller than the true generalization error. By making the resubstitution error even smaller, you are likely to make the ensemble accuracy worse instead of improving it. On the other hand, positive values of λ push the magnitude of the αt coefficients to 0. This often improves the generalization error. Of course, if you choose λ too large, all the optimal coefficients are 0, and the ensemble does not have any accuracy. Usually you can find an optimal range for λ in which the accuracy of the regularized ensemble is better than or comparable to that of the full ensemble without regularization.

A nice feature of lasso regularization is its ability to drive the optimized coefficients precisely to 0. If a learner's weight αt is 0, this learner can be excluded from the regularized ensemble. In the end, you get an ensemble with improved accuracy and fewer learners.
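As an illustrative sketch (not the regularize implementation), the objective above can be evaluated directly for a candidate weight vector alpha, where column t of the hypothetical N-by-T matrix H holds the weak learner predictions ht(xn):

f = H*alpha;                                       % weighted ensemble prediction
obj = sum(w.*(f - y).^2) + lambda*sum(abs(alpha)); % weighted squared error + lasso penalty

The first term is the weighted resubstitution error and the second is the lasso penalty, whose nondifferentiability at zero is what drives some coefficients alpha(t) exactly to 0.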

Regularize a Regression Ensemble

This example uses data for predicting the insurance risk of a car based on its many attributes.

Load the imports-85 data into the MATLAB workspace.

load imports-85;

Look at a description of the data to find the categorical variables and predictor names.

Description


Description = 9x79 char array
    '1985 Auto Imports Database from the UCI repository'
    'http://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.names'
    'Variables have been reordered to place variables with numeric values (referred'
    'to as "continuous" on the UCI site) to the left and categorical values to the'
    'right. Specifically, variables 1:16 are: symboling, normalized-losses,'
    'wheel-base, length, width, height, curb-weight, engine-size, bore, stroke,'
    'compression-ratio, horsepower, peak-rpm, city-mpg, highway-mpg, and price.'
    'Variables 17:26 are: make, fuel-type, aspiration, num-of-doors, body-style,'
    'drive-wheels, engine-location, engine-type, num-of-cylinders, and fuel-system.'

The objective of this process is to predict the "symboling," the first variable in the data, from the other predictors. "symboling" is an integer from -3 (good insurance risk) to 3 (poor insurance risk). You could use a classification ensemble to predict this risk instead of a regression ensemble. When you have a choice between regression and classification, you should try regression first.

Prepare the data for ensemble fitting.

Y = X(:,1);
X(:,1) = [];
VarNames = {'normalized-losses' 'wheel-base' 'length' 'width' 'height' ...
    'curb-weight' 'engine-size' 'bore' 'stroke' 'compression-ratio' ...
    'horsepower' 'peak-rpm' 'city-mpg' 'highway-mpg' 'price' 'make' ...
    'fuel-type' 'aspiration' 'num-of-doors' 'body-style' 'drive-wheels' ...
    'engine-location' 'engine-type' 'num-of-cylinders' 'fuel-system'};
catidx = 16:25; % indices of categorical predictors

Create a regression ensemble from the data using 300 trees.

ls = fitrensemble(X,Y,'Method','LSBoost','NumLearningCycles',300, ...
    'LearnRate',0.1,'PredictorNames',VarNames, ...
    'ResponseName','Symboling','CategoricalPredictors',catidx)

ls = 
  classreg.learning.regr.RegressionEnsemble
           PredictorNames: {1x25 cell}
             ResponseName: 'Symboling'
    CategoricalPredictors: [16 17 18 19 20 21 22 23 24 25]
        ResponseTransform: 'none'
          NumObservations: 205
               NumTrained: 300
                   Method: 'LSBoost'
             LearnerNames: {'Tree'}
     ReasonForTermination: 'Terminated normally after completing the requested number of training cycles.'
                  FitInfo: [300x1 double]
       FitInfoDescription: {2x1 cell}
           Regularization: []

  Properties, Methods

The final line, Regularization, is empty ([]). To regularize the ensemble, you have to use the regularize method.

cv = crossval(ls,'KFold',5);
figure;


plot(kfoldLoss(cv,'Mode','Cumulative'));
xlabel('Number of trees');
ylabel('Cross-validated MSE');
ylim([0.2,2])

It appears you might obtain satisfactory performance from a smaller ensemble, perhaps one containing from 50 to 100 trees. Call the regularize method to try to find trees that you can remove from the ensemble. By default, regularize examines 10 values of the lasso (Lambda) parameter spaced exponentially.

ls = regularize(ls)

ls = 
  classreg.learning.regr.RegressionEnsemble
           PredictorNames: {1x25 cell}
             ResponseName: 'Symboling'
    CategoricalPredictors: [16 17 18 19 20 21 22 23 24 25]
        ResponseTransform: 'none'
          NumObservations: 205
               NumTrained: 300
                   Method: 'LSBoost'
             LearnerNames: {'Tree'}
     ReasonForTermination: 'Terminated normally after completing the requested number of training cycles.'
                  FitInfo: [300x1 double]
       FitInfoDescription: {2x1 cell}
           Regularization: [1x1 struct]

  Properties, Methods

The Regularization property is no longer empty.

Plot the resubstitution mean-squared error (MSE) and number of learners with nonzero weights against the lasso parameter. Separately plot the value at Lambda = 0. Use a logarithmic scale because the values of Lambda are exponentially spaced.

figure;
semilogx(ls.Regularization.Lambda,ls.Regularization.ResubstitutionMSE, ...
    'bx-','Markersize',10);
line([1e-3 1e-3],[ls.Regularization.ResubstitutionMSE(1) ...
    ls.Regularization.ResubstitutionMSE(1)],...
    'Marker','x','Markersize',10,'Color','b');
r0 = resubLoss(ls);
line([ls.Regularization.Lambda(2) ls.Regularization.Lambda(end)],...
    [r0 r0],'Color','r','LineStyle','--');
xlabel('Lambda');
ylabel('Resubstitution MSE');
annotation('textbox',[0.5 0.22 0.5 0.05],'String','unregularized ensemble', ...
    'Color','r','FontSize',14,'LineStyle','none');

figure;
loglog(ls.Regularization.Lambda,sum(ls.Regularization.TrainedWeights>0,1));
line([1e-3 1e-3],...
    [sum(ls.Regularization.TrainedWeights(:,1)>0) ...
    sum(ls.Regularization.TrainedWeights(:,1)>0)],...
    'marker','x','markersize',10,'color','b');
line([ls.Regularization.Lambda(2) ls.Regularization.Lambda(end)],...
    [ls.NTrained ls.NTrained],...
    'color','r','LineStyle','--');
xlabel('Lambda');
ylabel('Number of learners');
annotation('textbox',[0.3 0.8 0.5 0.05],'String','unregularized ensemble',...
    'color','r','FontSize',14,'LineStyle','none');

The resubstitution MSE values are likely to be overly optimistic. To obtain more reliable estimates of the error associated with various values of Lambda, cross validate the ensemble using cvshrink. Plot the resulting cross-validation loss (MSE) and number of learners against Lambda.

rng(0,'Twister') % for reproducibility
[mse,nlearn] = cvshrink(ls,'Lambda',ls.Regularization.Lambda,'KFold',5);

Warning: Some folds do not have any trained weak learners.

figure;
semilogx(ls.Regularization.Lambda,ls.Regularization.ResubstitutionMSE, ...
    'bx-','Markersize',10);
hold on;
semilogx(ls.Regularization.Lambda,mse,'ro-','Markersize',10);
hold off;
xlabel('Lambda');


ylabel('Mean squared error');
legend('resubstitution','cross-validation','Location','NW');
line([1e-3 1e-3],[ls.Regularization.ResubstitutionMSE(1) ...
    ls.Regularization.ResubstitutionMSE(1)],...
    'Marker','x','Markersize',10,'Color','b','HandleVisibility','off');
line([1e-3 1e-3],[mse(1) mse(1)],'Marker','o',...
    'Markersize',10,'Color','r','LineStyle','--','HandleVisibility','off');

figure;
loglog(ls.Regularization.Lambda,sum(ls.Regularization.TrainedWeights>0,1));
hold;

Current plot held

loglog(ls.Regularization.Lambda,nlearn,'r--');
hold off;
xlabel('Lambda');
ylabel('Number of learners');
legend('resubstitution','cross-validation','Location','NE');
line([1e-3 1e-3],...
    [sum(ls.Regularization.TrainedWeights(:,1)>0) ...
    sum(ls.Regularization.TrainedWeights(:,1)>0)],...
    'Marker','x','Markersize',10,'Color','b','HandleVisibility','off');
line([1e-3 1e-3],[nlearn(1) nlearn(1)],'marker','o',...
    'Markersize',10,'Color','r','LineStyle','--','HandleVisibility','off');


Examining the cross-validated error shows that the cross-validation MSE is almost flat for Lambda up to a bit over 1e-2. Examine ls.Regularization.Lambda to find the highest value that gives MSE in the flat region.

jj = 1:length(ls.Regularization.Lambda);
[jj;ls.Regularization.Lambda]

ans = 2×10

    1.0000    2.0000    3.0000    4.0000    5.0000    6.0000    7.0000    8.0000    9.0000   10.0000
         0    0.0019    0.0045    0.0107    0.0254    0.0602    0.1428    0.3387    0.8033       1.9

Element 5 of ls.Regularization.Lambda has value 0.0254, the largest in the flat range. Reduce the ensemble size using the shrink method. shrink returns a compact ensemble with no training data. The generalization error for the new compact ensemble was already estimated by cross validation in mse(5).

cmp = shrink(ls,'weightcolumn',5)

cmp = 
  classreg.learning.regr.CompactRegressionEnsemble
           PredictorNames: {1x25 cell}
             ResponseName: 'Symboling'
    CategoricalPredictors: [16 17 18 19 20 21 22 23 24 25]


        ResponseTransform: 'none'
               NumTrained: 8

  Properties, Methods

The number of trees in the new ensemble is notably reduced from the 300 in ls. Compare the sizes of the ensembles.

sz(1) = whos('cmp');
sz(2) = whos('ls');
[sz(1).bytes sz(2).bytes]

ans = 1×2

       89489     3175917

The size of the reduced ensemble is a fraction of the size of the original. Note that your ensemble sizes can vary depending on your operating system.

Compare the MSE of the reduced ensemble to that of the original ensemble.

figure;
plot(kfoldLoss(cv,'mode','cumulative'));
hold on
plot(cmp.NTrained,mse(5),'ro','MarkerSize',10);
xlabel('Number of trees');
ylabel('Cross-validated MSE');
legend('unregularized ensemble','regularized ensemble',...
    'Location','NE');
hold off


The reduced ensemble gives low loss while using many fewer trees.

See Also crossval | cvshrink | fitrensemble | kfoldLoss | regularize | resubLoss | shrink

Related Examples

• “Regularize Discriminant Analysis Classifier” on page 20-21

• “Framework for Ensemble Learning” on page 18-30

• “Ensemble Algorithms” on page 18-38


Classification with Imbalanced Data

This example shows how to perform classification when one class has many more observations than another. You use the RUSBoost algorithm first, because it is designed to handle this case. Another way to handle imbalanced data is to use the name-value pair arguments 'Prior' or 'Cost'. For details, see “Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles” on page 18-83.

This example uses the "Cover type" data from the UCI machine learning archive, described in https://archive.ics.uci.edu/ml/datasets/Covertype. The data classifies types of forest (ground cover), based on predictors such as elevation, soil type, and distance to water. The data has over 500,000 observations and over 50 predictors, so training and using a classifier is time consuming. Blackard and Dean [1] describe a neural net classification of this data. They quote a 70.6% classification accuracy. RUSBoost obtains over 81% classification accuracy.

Obtain the data

Import the data into your workspace. Extract the last data column into a variable named Y.

gunzip('https://archive.ics.uci.edu/ml/machine-learning-databases/covtype/covtype.data.gz')
load covtype.data
Y = covtype(:,end);
covtype(:,end) = [];

Examine the response data

tabulate(Y)

  Value     Count   Percent
      1    211840    36.46%
      2    283301    48.76%
      3     35754     6.15%
      4      2747     0.47%
      5      9493     1.63%
      6     17367     2.99%
      7     20510     3.53%

There are hundreds of thousands of data points. Those of class 4 are less than 0.5% of the total. This imbalance indicates that RUSBoost is an appropriate algorithm.

Partition the data for quality assessment

Use half the data to fit a classifier, and half to examine the quality of the resulting classifier.

rng(10,'twister') % For reproducibility
part = cvpartition(Y,'Holdout',0.5);
istrain = training(part); % Data for fitting
istest = test(part); % Data for quality assessment
tabulate(Y(istrain))

  Value     Count   Percent
      1    105919    36.46%
      2    141651    48.76%
      3     17877     6.15%
      4      1374     0.47%
      5      4747     1.63%
      6      8684     2.99%
      7     10254     3.53%

Create the ensemble

Use deep trees for higher ensemble accuracy. To do so, set the trees to have a maximal number of decision splits of N, where N is the number of observations in the training sample. Set LearnRate to 0.1 in order to achieve higher accuracy as well. The data is large, and, with deep trees, creating the ensemble is time consuming.

N = sum(istrain); % Number of observations in the training sample
t = templateTree('MaxNumSplits',N);
tic
rusTree = fitcensemble(covtype(istrain,:),Y(istrain),'Method','RUSBoost', ...
    'NumLearningCycles',1000,'Learners',t,'LearnRate',0.1,'nprint',100);

Training RUSBoost...
Grown weak learners: 100
Grown weak learners: 200
Grown weak learners: 300
Grown weak learners: 400
Grown weak learners: 500
Grown weak learners: 600
Grown weak learners: 700
Grown weak learners: 800
Grown weak learners: 900
Grown weak learners: 1000

toc

Elapsed time is 242.836734 seconds.

Inspect the classification error

Plot the classification error against the number of members in the ensemble.

figure;
tic
plot(loss(rusTree,covtype(istest,:),Y(istest),'mode','cumulative'));
toc

Elapsed time is 164.470086 seconds.

grid on;
xlabel('Number of trees');
ylabel('Test classification error');


The ensemble achieves a classification error of under 20% using 116 or more trees. For 500 or more trees, the classification error decreases at a slower rate.

Examine the confusion matrix for each class as a percentage of the true class.

tic
Yfit = predict(rusTree,covtype(istest,:));
toc

Elapsed time is 132.353489 seconds.

confusionchart(Y(istest),Yfit,'Normalization','row-normalized','RowSummary','row-normalized')


All classes except class 2 have over 90% classification accuracy. But class 2 makes up close to half the data, so the overall accuracy is not that high.

Compact the ensemble

The ensemble is large. Remove the data using the compact method.

cmpctRus = compact(rusTree);
sz(1) = whos('rusTree');
sz(2) = whos('cmpctRus');
[sz(1).bytes sz(2).bytes]

ans = 1×2
10^9 ×

    1.6579    0.9423

The compacted ensemble is about half the size of the original.

Remove half the trees from cmpctRus. This action is likely to have minimal effect on the predictive performance, based on the observation that 500 out of 1000 trees give nearly optimal accuracy.

cmpctRus = removeLearners(cmpctRus,[500:1000]);
sz(3) = whos('cmpctRus');
sz(3).bytes


ans = 452868660

The reduced compact ensemble takes about a quarter of the memory of the full ensemble. Its overall loss rate is under 19%:

L = loss(cmpctRus,covtype(istest,:),Y(istest))

L = 0.1833

The predictive accuracy on new data might differ, because the ensemble accuracy might be biased. The bias arises because the same data used for assessing the ensemble was used for reducing the ensemble size. To obtain an unbiased estimate of requisite ensemble size, you should use cross validation. However, that procedure is time consuming.

References

[1] Blackard, J. A. and D. J. Dean. "Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables". Computers and Electronics in Agriculture, Vol. 24, Issue 3, 1999, pp. 131–151.

See Also
compact | confusionchart | cvpartition | fitcensemble | loss | predict | removeLearners | tabulate | templateTree | test | training

Related Examples

• “Surrogate Splits” on page 18-90

• “Ensemble Algorithms” on page 18-38

• “Test Ensemble Quality” on page 18-65

• “Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles” on page 18-83

• “LPBoost and TotalBoost for Small Ensembles” on page 18-95

• “Tune RobustBoost” on page 18-100


Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles

In many applications, you might prefer to treat classes in your data asymmetrically. For example, the data might have many more observations of one class than any other. Or misclassifying observations of one class has more severe consequences than misclassifying observations of another class. In such situations, you can either use the RUSBoost algorithm (specify 'Method' as 'RUSBoost') or use the name-value pair argument 'Prior' or 'Cost' of fitcensemble.

If some classes are underrepresented or overrepresented in your training set, use either the 'Prior' name-value pair argument or the RUSBoost algorithm. For example, suppose you obtain your training data by simulation. Because simulating class A is more expensive than simulating class B, you choose to generate fewer observations of class A and more observations of class B. The expectation, however, is that class A and class B are mixed in a different proportion in real (nonsimulated) situations. In this case, use 'Prior' to set prior probabilities for class A and B approximately to the values you expect to observe in a real situation. The fitcensemble function normalizes prior probabilities to make them add up to 1. Multiplying all prior probabilities by the same positive factor does not affect the result of classification.

Another way to handle imbalanced data is to use the RUSBoost algorithm ('Method','RUSBoost'). You do not need to adjust the prior probabilities when using this algorithm. For details, see “Random Undersampling Boosting” on page 18-50 and “Classification with Imbalanced Data” on page 18-78.

If classes are adequately represented in the training data but you want to treat them asymmetrically, use the 'Cost' name-value pair argument. Suppose you want to classify benign and malignant tumors in cancer patients.
Failure to identify a malignant tumor (false negative) has far more severe consequences than misidentifying benign as malignant (false positive). You should assign high cost to misidentifying malignant as benign and low cost to misidentifying benign as malignant.

You must pass misclassification costs as a square matrix with nonnegative elements. Element C(i,j) of this matrix is the cost of classifying an observation into class j if the true class is i. The diagonal elements C(i,i) of the cost matrix must be 0. For the previous example, you can choose malignant tumor to be class 1 and benign tumor to be class 2. Then you can set the cost matrix to

    [0 c
     1 0]

where c > 1 is the cost of misidentifying a malignant tumor as benign. Costs are relative: multiplying all costs by the same positive factor does not affect the result of classification.

If you have only two classes, fitcensemble adjusts their prior probabilities using P̃i = C(i,j)·Pi for class i = 1,2 and j ≠ i. Pi are prior probabilities either passed into fitcensemble or computed from class frequencies in the training data, and P̃i are adjusted prior probabilities. Then fitcensemble uses the default cost matrix

    [0 1
     1 0]

and these adjusted probabilities for training its weak learners. Manipulating the cost matrix is thus equivalent to manipulating the prior probabilities.

If you have three or more classes, fitcensemble also converts input costs into adjusted prior probabilities. This conversion is more complex. First, fitcensemble attempts to solve a matrix equation described in Zhou and Liu [1]. If it fails to find a solution, fitcensemble applies the “average cost” adjustment described in Breiman et al. [2]. For more information, see Zadrozny, Langford, and Abe [3].
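The two-class adjustment described above can be sketched as follows (illustrative only; C is a hypothetical 2-by-2 cost matrix with zero diagonal and P a vector of prior probabilities):

Ptilde = [C(1,2)*P(1) C(2,1)*P(2)]; % adjusted priors: P~i = C(i,j)*Pi, j ~= i
Ptilde = Ptilde/sum(Ptilde);        % normalize to sum to 1

After this adjustment, training with the default cost matrix [0 1; 1 0] and priors Ptilde is equivalent to training with the original priors P and cost matrix C.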

Train Ensemble With Unequal Classification Costs

This example shows how to train an ensemble of classification trees with unequal classification costs. This example uses data on patients with hepatitis to see if they live or die as a result of the disease. The data set is described at UCI Machine Learning Data Repository.

Read the hepatitis data set from the UCI repository as a character array. Then convert the result to a cell array of character vectors using textscan. Specify a cell array of character vectors containing the variable names.

hepatitis = textscan(urlread(['https://archive.ics.uci.edu/ml/' ...
    'machine-learning-databases/hepatitis/hepatitis.data']),...
    '%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f','TreatAsEmpty','?',...
    'Delimiter',',');
size(hepatitis)

ans = 1×2

     1    20

VarNames = {'dieOrLive' 'age' 'sex' 'steroid' 'antivirals' 'fatigue' ...
    'malaise' 'anorexia' 'liverBig' 'liverFirm' 'spleen' ...
    'spiders' 'ascites' 'varices' 'bilirubin' 'alkPhosphate' 'sgot' ...
    'albumin' 'protime' 'histology'};

hepatitis is a 1-by-20 cell array of character vectors. The cells correspond to the response (dieOrLive) and 19 heterogeneous predictors.

Specify a numeric matrix containing the predictors and a cell vector containing 'Die' and 'Live', which are the response categories. The response contains two values: 1 indicates that a patient died, and 2 indicates that a patient lived. Specify a cell array of character vectors for the response using the response categories. The first variable in hepatitis contains the response.

X = cell2mat(hepatitis(2:end));
ClassNames = {'Die' 'Live'};
Y = ClassNames(hepatitis{:,1});

X is a numeric matrix containing the 19 predictors. Y is a cell array of character vectors containing the response.

Inspect the data for missing values.

figure
barh(sum(isnan(X),1)/size(X,1))
h = gca;
h.YTick = 1:numel(VarNames) - 1;
h.YTickLabel = VarNames(2:end);
ylabel('Predictor')
xlabel('Fraction of missing values')


Most predictors have missing values, and one has nearly 45% missing values. Therefore, use decision trees with surrogate splits for better accuracy. Because the data set is small, training time with surrogate splits should be tolerable.

Create a classification tree template that uses surrogate splits.

rng(0,'twister') % For reproducibility
t = templateTree('surrogate','all');

Examine the data or the description of the data to see which predictors are categorical.

X(1:5,:)

ans = 5×19

   30.0000    2.0000    1.0000    2.0000    2.0000    2.0000    2.0000    1.0000    2.0000   ...
   50.0000    1.0000    1.0000    2.0000    1.0000    2.0000    2.0000    1.0000    2.0000   ...
   78.0000    1.0000    2.0000    2.0000    1.0000    2.0000    2.0000    2.0000    2.0000   ...
   31.0000    1.0000       NaN    1.0000    2.0000    2.0000    2.0000    2.0000    2.0000   ...
   34.0000    1.0000    2.0000    2.0000    2.0000    2.0000    2.0000    2.0000    2.0000   ...

It appears that predictors 2 through 13 are categorical, as well as predictor 19. You can confirm this inference using the data set description at UCI Machine Learning Data Repository. List the categorical variables. catIdx = [2:13,19];

18-85

2.0 2.0 2.0 2.0 2.0


Create a cross-validated ensemble using 150 learners and the GentleBoost algorithm. Ensemble = fitcensemble(X,Y,'Method','GentleBoost', ... 'NumLearningCycles',150,'Learners',t,'PredictorNames',VarNames(2:end), ... 'LearnRate',0.1,'CategoricalPredictors',catIdx,'KFold',5); figure plot(kfoldLoss(Ensemble,'Mode','cumulative','LossFun','exponential')) xlabel('Number of trees') ylabel('Cross-validated exponential loss')

Inspect the confusion matrix to see which patients the ensemble predicts correctly. [yFit,sFit] = kfoldPredict(Ensemble); confusionchart(Y,yFit)


Of the 123 patients who live, the ensemble correctly predicts that 112 will live. But for the 32 patients who die of hepatitis, the ensemble correctly predicts that only about half will die. There are two types of error in the predictions of the ensemble: • Predicting that the patient lives, but the patient dies • Predicting that the patient dies, but the patient lives Suppose you believe that the first error is five times worse than the second. Create a new classification cost matrix that reflects this belief. cost.ClassNames = ClassNames; cost.ClassificationCosts = [0 5; 1 0];
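As a sketch of what the cost matrix encodes (not additional steps of the example): entry (i,j) is the cost of predicting class j for a point whose true class is i, with rows and columns ordered as in cost.ClassNames.

```matlab
% Interpretation of cost.ClassificationCosts, with classes ordered {'Die','Live'}:
C = [0 5;   % true 'Die':  predicting 'Die' costs 0, predicting 'Live' costs 5
     1 0];  % true 'Live': predicting 'Die' costs 1, predicting 'Live' costs 0
C(1,2)      % cost of predicting 'Live' for a patient who dies: 5
```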

Create a new cross-validated ensemble using cost as the misclassification cost, and inspect the resulting confusion matrix. EnsembleCost = fitcensemble(X,Y,'Method','GentleBoost', ... 'NumLearningCycles',150,'Learners',t,'PredictorNames',VarNames(2:end), ... 'LearnRate',0.1,'CategoricalPredictors',catIdx,'KFold',5,'Cost',cost); [yFitCost,sFitCost] = kfoldPredict(EnsembleCost); confusionchart(Y,yFitCost)


As expected, the new ensemble does a better job classifying the patients who die. Somewhat surprisingly, the new ensemble also does a better job classifying the patients who live, though the result is not statistically significantly better. The results of cross-validation are random, so this result may simply be a statistical fluctuation. The result seems to indicate that the classification of patients who live is not very sensitive to the cost.

References

[1] Zhou, Z.-H., and X.-Y. Liu. “On Multi-Class Cost-Sensitive Learning.” Computational Intelligence. Vol. 26, Issue 3, 2010, pp. 232–257.

[2] Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Boca Raton, FL: Chapman & Hall, 1984.

[3] Zadrozny, B., J. Langford, and N. Abe. “Cost-Sensitive Learning by Cost-Proportionate Example Weighting.” Third IEEE International Conference on Data Mining, 435–442. 2003.

See Also confusionchart | fitcensemble | kfoldLoss | kfoldPredict | templateTree

Related Examples
• “Surrogate Splits” on page 18-90
• “Ensemble Algorithms” on page 18-38
• “Test Ensemble Quality” on page 18-65
• “Classification with Imbalanced Data” on page 18-78
• “LPBoost and TotalBoost for Small Ensembles” on page 18-95
• “Tune RobustBoost” on page 18-100


Surrogate Splits When you specify surrogate splits and the value of the optimal split predictor for an observation is missing, the software sends the observation to the left or right child node using the best surrogate predictor. When you have missing data, trees and ensembles of trees with surrogate splits give better predictions. This example shows how to improve the accuracy of predictions for data with missing values by using decision trees with surrogate splits. Load Sample Data Load the ionosphere data set. load ionosphere

Partition the data set into training and test sets. Hold out 30% of the data for testing. rng('default') % For reproducibility cv = cvpartition(Y,'Holdout',0.3);

Identify the training and testing data. Xtrain = X(training(cv),:); Ytrain = Y(training(cv)); Xtest = X(test(cv),:); Ytest = Y(test(cv));

Suppose half of the values in the test set are missing. Set half of the values in the test set to NaN. Xtest(rand(size(Xtest))>0.5) = NaN;

Train Random Forest Train a random forest of 150 classification trees without surrogate splits. templ = templateTree('Reproducible',true); % For reproducibility of random predictor selections Mdl = fitcensemble(Xtrain,Ytrain,'Method','Bag','NumLearningCycles',150,'Learners',templ);

Create a decision tree template that uses surrogate splits. A tree using surrogate splits does not discard the entire observation when it includes missing data in some predictors. templS = templateTree('Surrogate','On','Reproducible',true);

Train a random forest using the template templS. Mdls = fitcensemble(Xtrain,Ytrain,'Method','Bag','NumLearningCycles',150,'Learners',templS);

Test Accuracy Test the accuracy of predictions with and without surrogate splits. Predict responses and create confusion matrix charts using both approaches. Ytest_pred = predict(Mdl,Xtest); figure cm = confusionchart(Ytest,Ytest_pred); cm.Title = 'Model Without Surrogates';


Ytest_preds = predict(Mdls,Xtest); figure cms = confusionchart(Ytest,Ytest_preds); cms.Title = 'Model with Surrogates';


All off-diagonal elements of the confusion matrix represent misclassified data. A good classifier yields a confusion matrix that looks dominantly diagonal. In this case, the classification error is lower for the model trained with surrogate splits. Estimate cumulative classification errors. Specify 'Mode','Cumulative' when estimating classification errors by using the loss function. The loss function returns a vector in which element J indicates the error using the first J learners. figure plot(loss(Mdl,Xtest,Ytest,'Mode','Cumulative')) hold on plot(loss(Mdls,Xtest,Ytest,'Mode','Cumulative'),'r--') legend('Trees without surrogate splits','Trees with surrogate splits') xlabel('Number of trees') ylabel('Test classification error')


The error value decreases as the number of trees increases, which indicates good performance. The classification error is lower for the model trained with surrogate splits. Check the statistical significance of the difference in results by using compareHoldout. This function uses the McNemar test. [~,p] = compareHoldout(Mdls,Mdl,Xtest,Xtest,Ytest,'Alternative','greater') p = 0.1509

This p-value is greater than 0.05, so the evidence that the ensemble with surrogate splits is better is not statistically significant at the 5% level, even though that model has the lower test error. Estimate Predictor Importance Predictor importance estimates can vary depending on whether or not a tree uses surrogate splits. After estimating predictor importance, you can exclude unimportant predictors and train a model again. Eliminating unimportant predictors saves time and memory for predictions, and makes predictions easier to understand. Estimate predictor importance measures by permuting out-of-bag observations. Then, find the five most important predictors. imp = oobPermutedPredictorImportance(Mdl); [~,ind] = maxk(imp,5) ind = 1×5

     3     5    27     7     8

imps = oobPermutedPredictorImportance(Mdls); [~,inds] = maxk(imps,5) inds = 1×5

     5     3     7    27     8

The five most important predictors are the same, but the orders of importance are different. If the training data includes many predictors and you want to analyze predictor importance, then specify 'NumVariablesToSample' of the templateTree function as 'all' for the tree learners of the ensemble. Otherwise, the software might not select some predictors, underestimating their importance. For an example, see “Select Predictors for Random Forests” on page 18-59.
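The advice above can be applied directly when constructing the tree template (a sketch; the ensemble-training call is abbreviated):

```matlab
% Sample all predictors at each split so that random predictor selection
% does not dilute the importance estimates of some predictors.
tAll = templateTree('NumVariablesToSample','all','Surrogate','all');
% Then pass tAll as the 'Learners' value in the fitcensemble call, e.g.:
% Mdl = fitcensemble(X,Y,'Method','Bag','NumLearningCycles',150,'Learners',tAll);
```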

See Also compareHoldout | fitcensemble | fitrensemble

Related Examples
• “Ensemble Algorithms” on page 18-38
• “Test Ensemble Quality” on page 18-65
• “Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles” on page 18-83
• “Classification with Imbalanced Data” on page 18-78
• “LPBoost and TotalBoost for Small Ensembles” on page 18-95
• “Tune RobustBoost” on page 18-100


LPBoost and TotalBoost for Small Ensembles This example shows how to obtain the benefits of the LPBoost and TotalBoost algorithms. These algorithms share two beneficial characteristics: • They are self-terminating, which means you do not have to figure out how many members to include. • They produce ensembles with some very small weights, enabling you to safely remove ensemble members. Load the data Load the ionosphere data set. load ionosphere

Create the classification ensembles Create ensembles for classifying the ionosphere data using the LPBoost, TotalBoost, and, for comparison, AdaBoostM1 algorithms. It is hard to know how many members to include in an ensemble. For LPBoost and TotalBoost, try using 500. For comparison, also use 500 for AdaBoostM1. The default weak learners for boosting methods are decision trees with the MaxNumSplits property set to 10. These trees tend to fit better than tree stumps (with 1 maximum split) and may overfit more. Therefore, to prevent overfitting, use tree stumps as weak learners for the ensembles. rng('default') % For reproducibility T = 500; treeStump = templateTree('MaxNumSplits',1); adaStump = fitcensemble(X,Y,'Method','AdaBoostM1','NumLearningCycles',T,'Learners',treeStump); totalStump = fitcensemble(X,Y,'Method','TotalBoost','NumLearningCycles',T,'Learners',treeStump); lpStump = fitcensemble(X,Y,'Method','LPBoost','NumLearningCycles',T,'Learners',treeStump); figure plot(resubLoss(adaStump,'Mode','Cumulative')); hold on plot(resubLoss(totalStump,'Mode','Cumulative'),'r'); plot(resubLoss(lpStump,'Mode','Cumulative'),'g'); hold off xlabel('Number of stumps'); ylabel('Training error'); legend('AdaBoost','TotalBoost','LPBoost','Location','NE');


All three algorithms achieve perfect prediction on the training data after a while. Examine the number of members in all three ensembles. [adaStump.NTrained totalStump.NTrained lpStump.NTrained] ans = 1×3

   500    52    76

AdaBoostM1 trained all 500 members. The other two algorithms stopped training early. Cross validate the ensembles Cross validate the ensembles to better determine ensemble accuracy. cvlp = crossval(lpStump,'KFold',5); cvtotal = crossval(totalStump,'KFold',5); cvada = crossval(adaStump,'KFold',5); figure plot(kfoldLoss(cvada,'Mode','Cumulative')); hold on plot(kfoldLoss(cvtotal,'Mode','Cumulative'),'r'); plot(kfoldLoss(cvlp,'Mode','Cumulative'),'g'); hold off xlabel('Ensemble size');


ylabel('Cross-validated error'); legend('AdaBoost','TotalBoost','LPBoost','Location','NE');

The results show that each boosting algorithm achieves a loss of 10% or lower with 50 ensemble members. Compact and remove ensemble members To reduce the ensemble sizes, compact them, and then use removeLearners. The question is, how many learners should you remove? The cross-validated loss curves give you one measure. For another, examine the learner weights for LPBoost and TotalBoost after compacting. cada = compact(adaStump); clp = compact(lpStump); ctotal = compact(totalStump); figure subplot(2,1,1) plot(clp.TrainedWeights) title('LPBoost weights') subplot(2,1,2) plot(ctotal.TrainedWeights) title('TotalBoost weights')


Both LPBoost and TotalBoost show clear points where the ensemble member weights become negligible. Remove the unimportant ensemble members. cada = removeLearners(cada,150:cada.NTrained); clp = removeLearners(clp,60:clp.NTrained); ctotal = removeLearners(ctotal,40:ctotal.NTrained);

Check that removing these learners does not affect ensemble accuracy on the training data. [loss(cada,X,Y) loss(clp,X,Y) loss(ctotal,X,Y)] ans = 1×3

     0     0     0

Check the resulting compact ensemble sizes. s(1) = whos('cada'); s(2) = whos('clp'); s(3) = whos('ctotal'); s.bytes ans = 565644 ans = 225950 ans = 150470

The sizes of the compact ensembles are approximately proportional to the number of members in each.
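You can check the proportionality claim with a quick sketch using the sizes reported above and the numbers of learners retained by removeLearners:

```matlab
bytes = [565644 225950 150470];   % whos sizes for cada, clp, ctotal
kept  = [149 59 39];              % learners retained after removeLearners
bytes ./ kept                     % roughly constant, about 3800 bytes per learner
```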

See Also compact | crossval | fitcensemble | kfoldLoss | loss | removeLearners | resubLoss

Related Examples
• “Surrogate Splits” on page 18-90
• “Ensemble Algorithms” on page 18-38
• “Test Ensemble Quality” on page 18-65
• “Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles” on page 18-83
• “Classification with Imbalanced Data” on page 18-78
• “Tune RobustBoost” on page 18-100


Tune RobustBoost The RobustBoost algorithm can make good classification predictions even when the training data has noise. However, the default RobustBoost parameters can produce an ensemble that does not predict well. This example shows one way of tuning the parameters for better predictive accuracy. Generate data with label noise. This example has twenty uniform random numbers per observation, and classifies the observation as 1 if the sum of the first five numbers exceeds 2.5 (so is larger than average), and 0 otherwise: rng(0,'twister') % for reproducibility Xtrain = rand(2000,20); Ytrain = sum(Xtrain(:,1:5),2) > 2.5;

To add noise, randomly switch 10% of the classifications: idx = randsample(2000,200); Ytrain(idx) = ~Ytrain(idx);

Create an ensemble with AdaBoostM1 for comparison purposes: ada = fitcensemble(Xtrain,Ytrain,'Method','AdaBoostM1', ... 'NumLearningCycles',300,'Learners','Tree','LearnRate',0.1);

Create an ensemble with RobustBoost. Because the data has 10% incorrect classification, perhaps an error goal of 15% is reasonable. rb1 = fitcensemble(Xtrain,Ytrain,'Method','RobustBoost', ... 'NumLearningCycles',300,'Learners','Tree','RobustErrorGoal',0.15, ... 'RobustMaxMargin',1);

Note that if you set the error goal to a high enough value, then the software returns an error. Create an ensemble with a very optimistic error goal of 0.01: rb2 = fitcensemble(Xtrain,Ytrain,'Method','RobustBoost', ... 'NumLearningCycles',300,'Learners','Tree','RobustErrorGoal',0.01);

Compare the resubstitution error of the three ensembles: figure plot(resubLoss(rb1,'Mode','Cumulative')); hold on plot(resubLoss(rb2,'Mode','Cumulative'),'r--'); plot(resubLoss(ada,'Mode','Cumulative'),'g.'); hold off; xlabel('Number of trees'); ylabel('Resubstitution error'); legend('ErrorGoal=0.15','ErrorGoal=0.01',... 'AdaBoostM1','Location','NE');


All the RobustBoost curves show lower resubstitution error than the AdaBoostM1 curve. The curve for the error goal of 0.01 shows the lowest resubstitution error over most of the range. To see how the ensembles generalize, generate test data with the same label noise and compare the test errors. Xtest = rand(2000,20); Ytest = sum(Xtest(:,1:5),2) > 2.5; idx = randsample(2000,200); Ytest(idx) = ~Ytest(idx); figure; plot(loss(rb1,Xtest,Ytest,'Mode','Cumulative')); hold on plot(loss(rb2,Xtest,Ytest,'Mode','Cumulative'),'r--'); plot(loss(ada,Xtest,Ytest,'Mode','Cumulative'),'g.'); hold off; xlabel('Number of trees'); ylabel('Test error'); legend('ErrorGoal=0.15','ErrorGoal=0.01',... 'AdaBoostM1','Location','NE');


The error curve for error goal 0.15 is lowest (best) in the plotted range. AdaBoostM1 has higher error than the curve for error goal 0.15. The curve for the too-optimistic error goal 0.01 remains substantially higher (worse) than the other algorithms for most of the plotted range.

See Also fitcensemble | loss | resubLoss

Related Examples
• “Surrogate Splits” on page 18-90
• “Ensemble Algorithms” on page 18-38
• “Test Ensemble Quality” on page 18-65
• “Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles” on page 18-83
• “Classification with Imbalanced Data” on page 18-78
• “LPBoost and TotalBoost for Small Ensembles” on page 18-95


Random Subspace Classification This example shows how to use a random subspace ensemble to increase the accuracy of classification. It also shows how to use cross validation to determine good parameters for both the weak learner template and the ensemble. Load the data Load the ionosphere data. This data has 351 binary responses to 34 predictors. load ionosphere; [N,D] = size(X) N = 351 D = 34 resp = unique(Y) resp = 2x1 cell {'b'} {'g'}

Choose the number of nearest neighbors Find a good choice for k, the number of nearest neighbors in the classifier, by cross validation. Choose the number of neighbors approximately evenly spaced on a logarithmic scale. rng(8000,'twister') % for reproducibility K = round(logspace(0,log10(N),10)); % number of neighbors cvloss = zeros(numel(K),1); for k=1:numel(K) knn = fitcknn(X,Y,... 'NumNeighbors',K(k),'CrossVal','On'); cvloss(k) = kfoldLoss(knn); end figure; % Plot the accuracy versus k semilogx(K,cvloss); xlabel('Number of nearest neighbors'); ylabel('10 fold classification error'); title('k-NN classification');


The lowest cross-validation error occurs for k = 2. Create the ensembles Create ensembles for 2-nearest neighbor classification with various numbers of dimensions, and examine the cross-validated loss of the resulting ensembles. This step takes a long time. To keep track of the progress, print a message as each dimension finishes. NPredToSample = round(linspace(1,D,10)); % linear spacing of dimensions cvloss = zeros(numel(NPredToSample),1); learner = templateKNN('NumNeighbors',2); for npred=1:numel(NPredToSample) subspace = fitcensemble(X,Y,'Method','Subspace','Learners',learner, ... 'NPredToSample',NPredToSample(npred),'CrossVal','On'); cvloss(npred) = kfoldLoss(subspace); fprintf('Random Subspace %i done.\n',npred); end

Random Subspace 1 done.
Random Subspace 2 done.
Random Subspace 3 done.
Random Subspace 4 done.
Random Subspace 5 done.
Random Subspace 6 done.
Random Subspace 7 done.
Random Subspace 8 done.
Random Subspace 9 done.
Random Subspace 10 done.

figure; % plot the accuracy versus dimension plot(NPredToSample,cvloss); xlabel('Number of predictors selected at random'); ylabel('10 fold classification error'); title('k-NN classification with Random Subspace');

The ensembles that use five and eight predictors per learner have the lowest cross-validated error. The error rate for these ensembles is about 0.06, while the other ensembles have cross-validated error rates that are approximately 0.1 or more. Find a good ensemble size Find the smallest number of learners in the ensemble that still give good classification. ens = fitcensemble(X,Y,'Method','Subspace','Learners',learner, ... 'NPredToSample',5,'CrossVal','on'); figure; % Plot the accuracy versus number in ensemble plot(kfoldLoss(ens,'Mode','Cumulative')) xlabel('Number of learners in ensemble'); ylabel('10 fold classification error'); title('k-NN classification with Random Subspace');


There seems to be no advantage in an ensemble with more than 50 or so learners. It is possible that 25 learners gives good predictions. Create a final ensemble Construct a final ensemble with 50 learners. Compact the ensemble and see if the compacted version saves an appreciable amount of memory. ens = fitcensemble(X,Y,'Method','Subspace','NumLearningCycles',50,... 'Learners',learner,'NPredToSample',5); cens = compact(ens); s1 = whos('ens'); s2 = whos('cens'); [s1.bytes s2.bytes] % si.bytes = size in bytes ans = 1×2

   1739556   1510252

The compact ensemble is about 13% smaller than the full ensemble. Both give the same predictions.

See Also compact | fitcensemble | fitcknn | kfoldLoss | templateKNN


Related Examples
• “Framework for Ensemble Learning” on page 18-30
• “Ensemble Algorithms” on page 18-38
• “Train Classification Ensemble” on page 18-53
• “Test Ensemble Quality” on page 18-65


Bootstrap Aggregation (Bagging) of Regression Trees Using TreeBagger Statistics and Machine Learning Toolbox™ offers two objects that support bootstrap aggregation (bagging) of regression trees: TreeBagger created by using TreeBagger and RegressionBaggedEnsemble created by using fitrensemble. See “Comparison of TreeBagger and Bagged Ensembles” on page 18-43 for differences between TreeBagger and RegressionBaggedEnsemble. This example shows the workflow for regression using the features in TreeBagger only. Use a database of 1985 car imports with 205 observations, 25 predictors, and 1 response, which is insurance risk rating, or "symboling." The first 15 variables are numeric and the last 10 are categorical. The symboling index takes integer values from -3 to 3. Load the data set and split it into predictor and response arrays. load imports-85 Y = X(:,1); X = X(:,2:end); isCategorical = [zeros(15,1);ones(size(X,2)-15,1)]; % Categorical variable flag

Because bagging uses randomized data drawings, its exact outcome depends on the initial random seed. To reproduce the results in this example, use the random stream settings. rng(1945,'twister')

Finding the Optimal Leaf Size For regression, the general rule is to set the leaf size to 5 and select one third of the input features for decision splits at random. In the following step, verify the optimal leaf size by comparing mean squared errors obtained by regression for various leaf sizes. oobError computes MSE versus the number of grown trees. You must set OOBPrediction to 'On' to obtain out-of-bag predictions later. leaf = [5 10 20 50 100]; col = 'rbcmy'; figure for i=1:length(leaf) b = TreeBagger(50,X,Y,'Method','R','OOBPrediction','On',... 'CategoricalPredictors',find(isCategorical == 1),... 'MinLeafSize',leaf(i)); plot(oobError(b),col(i)) hold on end xlabel('Number of Grown Trees') ylabel('Mean Squared Error') legend({'5' '10' '20' '50' '100'},'Location','NorthEast') hold off


The red curve (leaf size 5) yields the lowest MSE values. Estimating Feature Importance In practical applications, you typically grow ensembles with hundreds of trees. For example, the previous code block uses 50 trees for faster processing. Now that you have estimated the optimal leaf size, grow a larger ensemble with 100 trees and use it to estimate feature importance. b = TreeBagger(100,X,Y,'Method','R','OOBPredictorImportance','On',... 'CategoricalPredictors',find(isCategorical == 1),... 'MinLeafSize',5);

Inspect the error curve again to make sure nothing went wrong during training. figure plot(oobError(b)) xlabel('Number of Grown Trees') ylabel('Out-of-Bag Mean Squared Error')


Prediction ability should depend more on important features than unimportant features. You can use this idea to measure feature importance. For each feature, permute the values of this feature across every observation in the data set and measure how much worse the MSE becomes after the permutation. Plot the increase in MSE due to permuting out-of-bag observations across each input variable. The OOBPermutedPredictorDeltaError array stores the increase in MSE averaged over all trees in the ensemble and divided by the standard deviation taken over the trees, for each variable. The larger this value, the more important the variable. Imposing an arbitrary cutoff at 0.7, you can select the most important features. figure bar(b.OOBPermutedPredictorDeltaError) xlabel('Feature Number') ylabel('Out-of-Bag Feature Importance')
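As an illustrative sketch of the permutation idea described above (the example itself relies on the built-in OOBPermutedPredictorDeltaError property, which additionally restricts the comparison to out-of-bag observations):

```matlab
% Manual permutation importance for one predictor (illustrative sketch;
% the choice j = 16 is hypothetical).
j = 16;
mse0 = error(b,X,Y,'Mode','ensemble');       % baseline ensemble MSE
Xperm = X;
Xperm(:,j) = Xperm(randperm(size(X,1)),j);   % break the link between predictor j and Y
deltaJ = error(b,Xperm,Y,'Mode','ensemble') - mse0;
% A larger increase deltaJ indicates a more important predictor.
```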


idxvar = find(b.OOBPermutedPredictorDeltaError>0.7) idxvar = 1×2

    16    19

idxCategorical = find(isCategorical(idxvar)==1);

The OOBIndices property of TreeBagger tracks which observations are out of bag for which trees. Using this property, you can monitor the fraction of observations in the training data that are in bag for all trees. The curve starts at approximately 2/3, which is the fraction of unique observations selected by one bootstrap replica, and goes down to 0 at approximately 10 trees. finbag = zeros(1,b.NTrees); for t=1:b.NTrees finbag(t) = sum(all(~b.OOBIndices(:,1:t),2)); end finbag = finbag / size(X,1); figure plot(finbag) xlabel('Number of Grown Trees') ylabel('Fraction of In-Bag Observations')
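The approximate starting value of 2/3 and the rapid decay toward 0 can be checked analytically (a sketch, not part of the original example):

```matlab
% Probability that one observation appears in a single bootstrap replica of
% size N drawn with replacement from N observations: 1 - (1 - 1/N)^N.
N = size(X,1);
pInBag = 1 - (1 - 1/N)^N   % approaches 1 - 1/e, approximately 0.632, close to 2/3
pInBag .^ (1:10)           % expected fraction in bag for ALL of the first t trees
% By t = 10 this is about 0.632^10, roughly 0.01, matching the curve's decay.
```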


Growing Trees on a Reduced Set of Features Using just the most powerful features, determine if it is possible to obtain a similar predictive power. To begin, grow 100 trees on these features only. Both selected features have indices greater than 15, so they are categorical. b5v = TreeBagger(100,X(:,idxvar),Y,'Method','R',... 'OOBPredictorImportance','On','CategoricalPredictors',idxCategorical,... 'MinLeafSize',5); figure plot(oobError(b5v)) xlabel('Number of Grown Trees') ylabel('Out-of-Bag Mean Squared Error')


figure bar(b5v.OOBPermutedPredictorDeltaError) xlabel('Feature Index') ylabel('Out-of-Bag Feature Importance')


These most powerful features give approximately the same MSE as the full set, and the ensemble trained on the reduced set ranks these features similarly to each other. Finding Outliers To find outliers in the training data, compute the proximity matrix using fillProximities. b5v = fillProximities(b5v);

fillProximities also computes an outlier measure for each observation. The method normalizes this measure by subtracting the mean outlier measure for the entire sample. Then it takes the magnitude of this difference and divides the result by the median absolute deviation for the entire sample. figure histogram(b5v.OutlierMeasure) xlabel('Outlier Measure') ylabel('Number of Observations')


Discovering Clusters in the Data By applying multidimensional scaling to the computed matrix of proximities, you can inspect the structure of the input data and look for possible clusters of observations. The mdsProx method returns scaled coordinates and eigenvalues for the computed proximity matrix. If you run it with the Colors name-value pair argument, then this method creates a scatter plot of two scaled coordinates. figure(8) [~,e] = mdsProx(b5v,'Colors','K'); xlabel('First Scaled Coordinate') ylabel('Second Scaled Coordinate')


Assess the relative importance of the scaled axes by plotting the first 20 eigenvalues. figure bar(e(1:20)) xlabel('Scaled Coordinate Index') ylabel('Eigenvalue')


Saving the Ensemble Configuration for Future Use To use the trained ensemble for predicting the response on unseen data, store the ensemble to disk and retrieve it later. If you do not want to compute predictions for out-of-bag data or reuse training data in any other way, there is no need to store the ensemble object itself. Saving the compact version of the ensemble is enough in this case. Extract the compact object from the ensemble. c = compact(b5v) c = CompactTreeBagger Ensemble with 100 bagged decision trees: Method: regression NumPredictors: 2 Properties, Methods

You can save the resulting CompactTreeBagger model in a *.mat file.

See Also TreeBagger | compact | fillprox | fitrensemble | mdsprox | oobError

Related Examples
• “Bootstrap Aggregation (Bagging) of Classification Trees Using TreeBagger” on page 18-119
• “Comparison of TreeBagger and Bagged Ensembles” on page 18-43
• “Use Parallel Processing for Regression TreeBagger Workflow” on page 30-6


Bootstrap Aggregation (Bagging) of Classification Trees Using TreeBagger Statistics and Machine Learning Toolbox™ offers two objects that support bootstrap aggregation (bagging) of classification trees: TreeBagger created by using TreeBagger and ClassificationBaggedEnsemble created by using fitcensemble. See “Comparison of TreeBagger and Bagged Ensembles” on page 18-43 for differences between TreeBagger and ClassificationBaggedEnsemble. This example shows the workflow for classification using the features in TreeBagger only. Use ionosphere data with 351 observations and 34 real-valued predictors. The response variable is categorical with two levels: • 'g' represents good radar returns. • 'b' represents bad radar returns. The goal is to predict good or bad returns using a set of 34 measurements. Fix the initial random seed, grow 50 trees, inspect how the ensemble error changes with accumulation of trees, and estimate feature importance. For classification, it is best to set the minimal leaf size to 1 and select the square root of the total number of features for each decision split at random. These settings are defaults for TreeBagger used for classification. load ionosphere rng(1945,'twister') b = TreeBagger(50,X,Y,'OOBPredictorImportance','On'); figure plot(oobError(b)) xlabel('Number of Grown Trees') ylabel('Out-of-Bag Classification Error')


When the ensemble contains few trees, some observations are in bag for all trees. For such observations, it is impossible to compute the true out-of-bag prediction, and TreeBagger returns the most probable class for classification and the sample mean for regression. You can change the default value returned for in-bag observations using the DefaultYfit property. If you set the default value to an empty character vector for classification, the method excludes in-bag observations from computation of the out-of-bag error. In this case, the curve is more variable when the number of trees is small, either because some observations are never out of bag (and are therefore excluded) or because their predictions are based on few trees. b.DefaultYfit = ''; figure plot(oobError(b)) xlabel('Number of Grown Trees') ylabel('Out-of-Bag Error Excluding In-Bag Observations')


The OOBIndices property of TreeBagger tracks which observations are out of bag for which trees. Using this property, you can monitor the fraction of observations in the training data that are in bag for all trees. The curve starts at approximately 2/3, which is the fraction of unique observations selected by one bootstrap replica, and goes down to 0 at approximately 10 trees. finbag = zeros(1,b.NumTrees); for t=1:b.NumTrees finbag(t) = sum(all(~b.OOBIndices(:,1:t),2)); end finbag = finbag / size(X,1); figure plot(finbag) xlabel('Number of Grown Trees') ylabel('Fraction of In-Bag Observations')


Estimate feature importance. figure bar(b.OOBPermutedPredictorDeltaError) xlabel('Feature Index') ylabel('Out-of-Bag Feature Importance')


Select the features yielding an importance measure greater than 0.75. This threshold is chosen arbitrarily. idxvar = find(b.OOBPermutedPredictorDeltaError>0.75) idxvar = 1×6

     3     5     6     7     8    27

Having selected the most important features, grow a larger ensemble on the reduced feature set. Save time by not permuting out-of-bag observations to obtain new estimates of feature importance for the reduced feature set (set OOBPredictorImportance to 'off'). You would still be interested in obtaining out-of-bag estimates of classification error (set OOBPrediction to 'on').

b5v = TreeBagger(100,X(:,idxvar),Y,'OOBPredictorImportance','off','OOBPrediction','on');
figure
plot(oobError(b5v))
xlabel('Number of Grown Trees')
ylabel('Out-of-Bag Classification Error')


For classification ensembles, in addition to classification error (fraction of misclassified observations), you can also monitor the average classification margin. For each observation, the margin is defined as the difference between the score for the true class and the maximal score for the other classes predicted by this tree. The cumulative classification margin uses the scores averaged over all trees, and the mean cumulative classification margin is the cumulative margin averaged over all observations. The oobMeanMargin method with the 'mode' argument set to 'cumulative' (the default) shows how the mean cumulative margin changes as the ensemble grows: every new element in the returned array represents the cumulative margin obtained by including a new tree in the ensemble. If training is successful, you would expect to see a gradual increase in the mean classification margin.

For decision trees, a classification score is the probability of observing an instance of this class in this tree leaf. For example, if the leaf of a grown decision tree has five 'good' and three 'bad' training observations in it, the scores returned by this decision tree for any observation that falls on this leaf are 5/8 for the 'good' class and 3/8 for the 'bad' class. These probabilities are called 'scores' for consistency with other classifiers that might not have an obvious interpretation for numeric values of returned predictions.

figure
plot(oobMeanMargin(b5v));
xlabel('Number of Grown Trees')
ylabel('Out-of-Bag Mean Classification Margin')
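The per-observation margin described above can be written in a few lines. This is a language-neutral Python illustration, not part of the MATLAB example; the score vector reuses the 5/8 versus 3/8 leaf from the text, with classes ordered ['bad', 'good']:

```python
def margin(scores, true_idx):
    """Classification margin for one observation: score of the true class
    minus the largest score among the other classes."""
    other = max(s for i, s in enumerate(scores) if i != true_idx)
    return scores[true_idx] - other

# Leaf with 5 'good' and 3 'bad' training observations -> scores [3/8, 5/8].
scores = [3 / 8, 5 / 8]
print(margin(scores, true_idx=1))   # 0.25 for a true 'good' observation
print(margin(scores, true_idx=0))   # -0.25 for a true 'bad' observation
```

A positive margin means the tree scores the true class highest; averaging these margins over observations and trees gives the curves that oobMeanMargin plots.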


Compute the matrix of proximities and examine the distribution of outlier measures. Unlike regression, outlier measures for classification ensembles are computed within each class separately.

b5v = fillProximities(b5v);
figure
histogram(b5v.OutlierMeasure)
xlabel('Outlier Measure')
ylabel('Number of Observations')


Find the class of the extreme outliers.

extremeOutliers = b5v.Y(b5v.OutlierMeasure>40)

extremeOutliers = 3x1 cell
    {'g'}
    {'g'}
    {'g'}

percentGood = 100*sum(strcmp(extremeOutliers,'g'))/numel(extremeOutliers)

percentGood = 100

All of the extreme outliers are labeled 'good'. As for regression, you can plot scaled coordinates, displaying the two classes in different colors using the 'Colors' name-value pair argument of mdsProx. This argument takes a character vector in which every character represents a color. The software does not rank class names. Therefore, it is best practice to determine the position of the classes in the ClassNames property of the ensemble.

gPosition = find(strcmp('g',b5v.ClassNames))

gPosition = 2

The 'bad' class is first and the 'good' class is second. Display scaled coordinates using red for the 'bad' class and blue for the 'good' class observations.


figure
[s,e] = mdsProx(b5v,'Colors','rb');
xlabel('First Scaled Coordinate')
ylabel('Second Scaled Coordinate')

Plot the first 20 eigenvalues obtained by scaling. The first eigenvalue clearly dominates, and the first scaled coordinate is most important.

figure
bar(e(1:20))
xlabel('Scaled Coordinate Index')
ylabel('Eigenvalue')


Another way of exploring the performance of a classification ensemble is to plot its receiver operating characteristic (ROC) curve or another performance curve suitable for the current problem. Obtain predictions for out-of-bag observations. For a classification ensemble, the oobPredict method returns a cell array of classification labels as the first output argument and a numeric array of scores as the second output argument. The returned array of scores has two columns, one for each class. In this case, the first column is for the 'bad' class and the second column is for the 'good' class. One column in the score matrix is redundant because the scores represent class probabilities in tree leaves and by definition add up to 1.

[Yfit,Sfit] = oobPredict(b5v);

Use perfcurve to compute a performance curve. By default, perfcurve returns the standard ROC curve, which is the true positive rate versus the false positive rate. perfcurve requires true class labels, scores, and the positive class label for input. In this case, choose the 'good' class as positive.

[fpr,tpr] = perfcurve(b5v.Y,Sfit(:,gPosition),'g');
figure
plot(fpr,tpr)
xlabel('False Positive Rate')
ylabel('True Positive Rate')
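What perfcurve does for the standard ROC curve amounts to sweeping a threshold over the positive-class scores and counting true and false positives at each threshold. A minimal, language-neutral Python sketch (the labels and scores are made up for illustration; the real perfcurve handles ties, weights, and many more options):

```python
def roc_points(labels, scores, positive):
    """(FPR, TPR) pairs obtained by thresholding the positive-class score
    at every distinct score value, from highest to lowest."""
    pos = sum(1 for y in labels if y == positive)
    neg = len(labels) - pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == positive)
        fp = sum(1 for y, s in zip(labels, scores) if s >= thr and y != positive)
        points.append((fp / neg, tp / pos))
    return points

labels = ['g', 'g', 'b', 'g', 'b']   # hypothetical out-of-bag labels
scores = [0.9, 0.8, 0.7, 0.6, 0.2]   # hypothetical P('g') scores
print(roc_points(labels, scores, positive='g'))
```

The resulting points trace the same staircase that plot(fpr,tpr) draws from the perfcurve outputs.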


Instead of the standard ROC curve, you might want to plot, for example, ensemble accuracy versus threshold on the score for the 'good' class. The YCrit name-value pair argument of perfcurve lets you specify the criterion for the y-axis, and the third output argument of perfcurve returns an array of thresholds for the positive class score. Accuracy is the fraction of correctly classified observations, or equivalently, 1 minus the classification error.

[fpr,accu,thre] = perfcurve(b5v.Y,Sfit(:,gPosition),'g','YCrit','Accu');
figure(20)
plot(thre,accu)
xlabel('Threshold for ''good'' Returns')
ylabel('Classification Accuracy')
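The accuracy criterion has a simple form: for each threshold, assign an observation to the positive class when its positive-class score meets the threshold, and count correct labels. A small Python illustration (the labels and scores are hypothetical, not the data in this example):

```python
def accuracy_at_threshold(labels, scores, positive, thr):
    """Fraction of observations classified correctly when an observation is
    assigned to the positive class iff its positive-class score >= thr."""
    correct = sum(
        1 for y, s in zip(labels, scores)
        if (s >= thr) == (y == positive)
    )
    return correct / len(labels)

labels = ['g', 'g', 'b', 'g', 'b']   # hypothetical labels
scores = [0.9, 0.8, 0.7, 0.6, 0.2]   # hypothetical P('g') scores
for thr in (0.3, 0.5, 0.75):
    print(thr, accuracy_at_threshold(labels, scores, 'g', thr))
```

Evaluating this function over a grid of thresholds reproduces the kind of accuracy-versus-threshold curve that the perfcurve call above plots, including flat regions where moving the threshold does not change any classification.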


The curve shows a flat region indicating that any threshold from 0.2 to 0.6 is a reasonable choice. By default, perfcurve assigns classification labels using 0.5 as the boundary between the two classes. You can find exactly what accuracy this corresponds to.

accu(abs(thre-0.5)

> Advanced. The app opens a dialog box in which you can select Optimize check boxes for the hyperparameters that you want to optimize. Under Values, specify the fixed values for the hyperparameters that you do not want to optimize or that are not optimizable.

Hyperparameter Optimization in Classification Learner App

This table describes the hyperparameters that you can optimize for each type of model and the search range of each hyperparameter. It also includes the additional hyperparameters for which you can specify fixed values.

Optimizable Tree

• Optimizable hyperparameters:
  • Maximum number of splits – The software searches among integers log-scaled in the range [1,max(2,n–1)], where n is the number of observations.
  • Split criterion – The software searches among Gini's diversity index, Twoing rule, and Maximum deviance reduction.
• Additional hyperparameters: Surrogate decision splits, Maximum surrogates per node
• Notes: For more information, see “Advanced Tree Options” on page 23-25.

Optimizable Discriminant

• Optimizable hyperparameters:
  • Discriminant type – The software searches among Linear, Quadratic, Diagonal Linear, and Diagonal Quadratic.
• Notes: The Discriminant type optimizable hyperparameter combines the preset model types (Linear Discriminant and Quadratic Discriminant) with the Covariance structure advanced option of the preset models. For more information, see “Advanced Discriminant Options” on page 23-27.


Classification Learner


Optimizable Naive Bayes

• Optimizable hyperparameters:
  • Distribution names – The software searches between Gaussian and Kernel.
  • Kernel type – The software searches among Gaussian, Box, Epanechnikov, and Triangle.
• Additional hyperparameters: Support
• Notes: The Gaussian value of the Distribution names optimizable hyperparameter specifies a Gaussian Naive Bayes model. Similarly, the Kernel Distribution names value specifies a Kernel Naive Bayes model. For more information, see “Advanced Naive Bayes Options” on page 23-36.

Optimizable SVM

• Optimizable hyperparameters:
  • Kernel function – The software searches among Gaussian, Linear, Quadratic, and Cubic.
  • Box constraint level – The software searches among positive values log-scaled in the range [0.001,1000].
  • Kernel scale – The software searches among positive values log-scaled in the range [0.001,1000].
  • Multiclass method – The software searches between One-vs-One and One-vs-All.
  • Standardize data – The software searches between true and false.
• Notes:
  • The Kernel scale optimizable hyperparameter combines the Kernel scale mode and Manual kernel scale advanced options of the preset SVM models.
  • You can optimize the Kernel scale optimizable hyperparameter only when the Kernel function value is Gaussian. Unless you specify a value for Kernel scale by clearing the Optimize check box, the app uses the Manual value of 1 by default when the Kernel function has a value other than Gaussian.
  For more information, see “Advanced SVM Options” on page 23-29.

Optimizable KNN

• Optimizable hyperparameters:
  • Number of neighbors – The software searches among integers log-scaled in the range [1,max(2,round(n/2))], where n is the number of observations.
  • Distance metric – The software searches among: Euclidean, City block, Chebyshev, Minkowski (cubic), Mahalanobis, Cosine, Correlation, Spearman, Hamming, and Jaccard.
  • Distance weight – The software searches among Equal, Inverse, and Squared inverse.
  • Standardize – The software searches between true and false.
• Notes: For more information, see “Advanced KNN Options” on page 23-32.

Optimizable Ensemble

• Optimizable hyperparameters:
  • Ensemble method – The software searches among AdaBoost, RUSBoost, LogitBoost, GentleBoost, and Bag.
  • Maximum number of splits – The software searches among integers log-scaled in the range [1,max(2,n–1)], where n is the number of observations.
  • Number of learners – The software searches among integers log-scaled in the range [10,500].
  • Learning rate – The software searches among real values log-scaled in the range [0.001,1].
  • Number of predictors to sample – The software searches among integers in the range [1,max(2,p)], where p is the number of predictor variables.
• Additional hyperparameters: Learner type
• Notes:
  • The AdaBoost, LogitBoost, and GentleBoost values of the Ensemble method optimizable hyperparameter specify a Boosted Trees model. Similarly, the RUSBoost Ensemble method value specifies an RUSBoosted Trees model, and the Bag Ensemble method value specifies a Bagged Trees model.
  • The LogitBoost and GentleBoost values are available only for binary classification.
  • The Number of predictors to sample optimizable hyperparameter is not available in the advanced options of the preset ensemble models.
  • You can optimize the Number of predictors to sample optimizable hyperparameter only when the Ensemble method value is Bag. Unless you specify a value for Number of predictors to sample by clearing the Optimize check box, the app uses the default value of Select All when the Ensemble method has a value other than Bag.
  For more information, see “Advanced Ensemble Options” on page 23-34.

Optimization Options

By default, the Classification Learner app performs hyperparameter tuning by using Bayesian optimization. The goal of Bayesian optimization, and optimization in general, is to find a point that minimizes an objective function. In the context of hyperparameter tuning in the app, a point is a set of hyperparameter values, and the objective function is the loss function, or the classification error. For more information on the basics of Bayesian optimization, see “Bayesian Optimization Workflow” on page 10-25.

You can specify how the hyperparameter tuning is performed. For example, you can change the optimization method to grid search or limit the training time. On the Classification Learner tab, in the Model Type section, select Advanced > Optimizer Options. The app opens a dialog box in which you can select optimization options. This table describes the available optimization options and their default values.

• Optimizer – The optimizer values are:
  • Bayesopt (default) – Use Bayesian optimization. Internally, the app calls the bayesopt function.
  • Grid search – Use grid search with the number of values per dimension determined by the Number of grid divisions value. The app searches in a random order, using uniform sampling without replacement from the grid.
  • Random search – Search at random among points, where the number of points corresponds to the Iterations value.
• Acquisition function – When the app performs Bayesian optimization for hyperparameter tuning, it uses the acquisition function to determine the next set of hyperparameter values to try. The acquisition function values are:
  • Expected improvement per second plus (default)
  • Expected improvement
  • Expected improvement plus
  • Expected improvement per second
  • Lower confidence bound
  • Probability of improvement
  For details on how these acquisition functions work in the context of Bayesian optimization, see “Acquisition Function Types” on page 10-3.
• Iterations – Each iteration corresponds to a combination of hyperparameter values that the app tries. When you use Bayesian optimization or random search, specify a positive integer that sets the number of iterations. The default value is 30. When you use grid search, the app ignores the Iterations value and evaluates the loss at every point in the entire grid.
• Training time limit – You can set a training time limit to stop the optimization process prematurely. To set a training time limit, select this option and set the Maximum training time in seconds option. By default, the app does not have a training time limit.
• Maximum training time in seconds – Set the training time limit in seconds as a positive real number. The default value is 300. The run time can exceed the training time limit because this limit does not interrupt an iteration evaluation.
• Number of grid divisions – When you use grid search, set a positive integer as the number of values the app tries for each numeric hyperparameter. The app ignores this value for categorical hyperparameters. The default value is 10.
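Of the three optimizers, random search is conceptually the simplest: draw a hyperparameter point at random each iteration, evaluate the loss, and keep the best point seen. The following Python sketch illustrates the idea only; the search space, loss function, and names here are hypothetical stand-ins, not the app's internals (the app's actual search spaces are the log-scaled ranges listed in the table above):

```python
import random

def random_search(objective, space, iterations, seed=0):
    """Minimal random-search loop: draw `iterations` hyperparameter points
    uniformly from a discrete `space` and keep the lowest-loss point."""
    rng = random.Random(seed)
    best_point, best_loss = None, float('inf')
    for _ in range(iterations):
        point = {name: rng.choice(values) for name, values in space.items()}
        loss = objective(point)
        if loss < best_loss:
            best_point, best_loss = point, loss
    return best_point, best_loss

# Hypothetical discrete search space and a stand-in loss function.
space = {'box_constraint': [0.001, 0.1, 1, 10, 1000],
         'kernel_scale': [0.001, 0.1, 1, 10, 1000]}
loss = lambda p: abs(p['box_constraint'] - 1) + abs(p['kernel_scale'] - 10)
best, best_loss = random_search(loss, space, iterations=30)
print(best, best_loss)
```

Grid search differs only in that it enumerates every grid point instead of sampling, which is why the app ignores the Iterations value in that mode.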


Minimum Classification Error Plot

After specifying which model hyperparameters to optimize and setting any additional optimization options (optional), train your optimizable model. On the Classification Learner tab, in the Training section, click Train. The app creates a Minimum Classification Error Plot that it updates as the optimization runs.

Note When you train an optimizable model, the app disables the Use Parallel button. After the training is complete, the app makes the button available again when you select a nonoptimizable model. The button is off by default.


The minimum classification error plot displays the following information:

• Estimated minimum classification error – Each light blue point corresponds to an estimate of the minimum classification error computed by the optimization process when considering all the sets of hyperparameter values tried so far, including the current iteration. The estimate is based on an upper confidence interval of the current classification error objective model, as mentioned in the Bestpoint hyperparameters description. If you use grid search or random search to perform hyperparameter optimization, the app does not display these light blue points.
• Observed minimum classification error – Each dark blue point corresponds to the observed minimum classification error computed so far by the optimization process. For example, at the third iteration, the dark blue point corresponds to the minimum of the classification error observed in the first, second, and third iterations.
• Bestpoint hyperparameters – The red square indicates the iteration that corresponds to the optimized hyperparameters. You can find the values of the optimized hyperparameters listed in the upper right of the plot under Optimization Results. The optimized hyperparameters do not always provide the observed minimum classification error. When the app performs hyperparameter tuning by using Bayesian optimization (see “Optimization Options” on page 23-53 for a brief introduction), it chooses the set of hyperparameter values that minimizes an upper confidence interval of the classification error objective model, rather than the set that minimizes the classification error. For more information, see the 'Criterion','min-visited-upper-confidence-interval' name-value pair argument of bestPoint.
• Minimum error hyperparameters – The yellow point indicates the iteration that corresponds to the hyperparameters that yield the observed minimum classification error. For more information, see the 'Criterion','min-observed' name-value pair argument of bestPoint.

If you use grid search to perform hyperparameter optimization, the Bestpoint hyperparameters and the Minimum error hyperparameters are the same. Missing points in the plot correspond to NaN minimum classification error values.
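The observed minimum classification error curve (the dark blue points) is simply a running minimum over the per-iteration errors, as this small Python sketch shows (the error values are made up for illustration):

```python
def observed_minimum_trace(errors):
    """At iteration k, the minimum classification error observed over
    iterations 1..k -- the dark blue curve of the plot."""
    trace, best = [], float('inf')
    for e in errors:
        best = min(best, e)
        trace.append(best)
    return trace

errors = [0.30, 0.24, 0.27, 0.21, 0.25]   # hypothetical per-iteration errors
print(observed_minimum_trace(errors))     # [0.30, 0.24, 0.24, 0.21, 0.21]
```

The estimated minimum (light blue) curve has no such closed form: it depends on the Bayesian surrogate model's upper confidence interval at each iteration.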

Optimization Results

When the app finishes tuning model hyperparameters, it returns a model trained with the optimized hyperparameter values (Bestpoint hyperparameters). The model metrics, displayed plots, and exported model correspond to this trained model with fixed hyperparameter values. To inspect the optimization results of a trained optimizable model, select the model in the History list and look at the Current Model pane.


The Current Model pane includes these sections:

• Results – Shows the performance of the optimizable model
• Model Type – Displays the type of optimizable model and lists any fixed hyperparameter values
• Optimized Hyperparameters – Lists the values of the optimized hyperparameters
• Hyperparameter Search Range – Displays the search ranges for the optimized hyperparameters
• Optimizer Options – Shows the selected optimizer options

When you perform hyperparameter tuning using Bayesian optimization and you export the resulting trained optimizable model to the workspace as a structure, the structure includes a BayesianOptimization object in the HyperParameterOptimizationResult field. The object contains the results of the optimization performed in the app.


When you generate MATLAB code from a trained optimizable model, the generated code uses the fixed and optimized hyperparameter values of the model to train on new data. The generated code does not include the optimization process. For information on how to perform Bayesian optimization when you use a fit function, see “Bayesian Optimization Using a Fit Function” on page 10-26.

See Also

Related Examples
• “Train Classifier Using Hyperparameter Optimization in Classification Learner App” on page 23-116
• “Bayesian Optimization Workflow” on page 10-25
• “Train Classification Models in Classification Learner App” on page 23-10
• “Select Data and Validation for Classification Problem” on page 23-16
• “Choose Classifier Options” on page 23-20
• “Assess Classifier Performance in Classification Learner” on page 23-59
• “Export Classification Model to Predict New Data” on page 23-69


Assess Classifier Performance in Classification Learner

In this section...
“Check Performance in the History List” on page 23-59
“Plot Classifier Results” on page 23-60
“Check Performance Per Class in the Confusion Matrix” on page 23-61
“Check the ROC Curve” on page 23-62

After training classifiers in Classification Learner, you can compare models based on accuracy scores, visualize results by plotting class predictions, and check performance using a confusion matrix and ROC curve.

• If you use k-fold cross-validation, then the app computes the accuracy scores using the observations in the k validation folds and reports the average cross-validation error. It also makes predictions on the observations in these validation folds and computes the confusion matrix and ROC curve based on these predictions.

Note When you import data into the app, if you accept the defaults, the app automatically uses cross-validation. To learn more, see “Choose Validation Scheme” on page 23-18.

• If you use hold-out validation, the app computes the accuracy scores using the observations in the validation fold and makes predictions on these observations. The app also computes the confusion matrix and ROC curve based on these predictions.
• If you choose not to use a validation scheme, the score is the resubstitution accuracy based on all the training data, and the predictions are resubstitution predictions.
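Whichever validation scheme partitions the data, the reported accuracy reduces to comparing each observation's held-out prediction with its true label. A minimal, language-neutral Python sketch (the labels and predictions are hypothetical):

```python
def validation_accuracy(y_true, y_pred_heldout):
    """Validation accuracy as the app reports it: every observation is
    predicted exactly once, by a model trained without that observation's
    fold (cross-validation) or without the held-out set (holdout)."""
    correct = sum(1 for t, p in zip(y_true, y_pred_heldout) if t == p)
    return correct / len(y_true)

y_true = ['a', 'a', 'b', 'b', 'a', 'b']   # hypothetical true labels
y_pred = ['a', 'b', 'b', 'b', 'a', 'a']   # hypothetical held-out predictions
print(validation_accuracy(y_true, y_pred))   # 4/6 correct
```

For resubstitution (no validation scheme), the same formula applies but the predictions come from a model trained on all of the data, which is why that score is optimistic.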

Check Performance in the History List

After training a model in Classification Learner, check the History list to see which model has the best overall accuracy in percent. The best Accuracy score is highlighted in a box. This score is the validation accuracy (unless you opted for no validation scheme). The validation accuracy score estimates a model's performance on new data compared to the training data. Use the score to help you choose the best model.

• For cross-validation, the score is the accuracy on all observations, counting each observation when it was in a held-out fold.
• For holdout validation, the score is the accuracy on the held-out observations.
• For no validation, the score is the resubstitution accuracy against all the training data observations.

The best overall score might not be the best model for your goal; a model with a slightly lower overall accuracy might be the best classifier for you. For example, false positives in a particular class might be important to you. You might want to exclude some predictors where data collection is expensive or difficult. To find out how the classifier performed in each class, examine the confusion matrix.


Plot Classifier Results

In the scatter plot, view the classifier results. After you train a classifier, the scatter plot switches from displaying the data to showing model predictions. If you are using holdout or cross-validation, then these predictions are the predictions on the held-out observations. In other words, each prediction is obtained using a model that was trained without using the corresponding observation.

To investigate your results, use the controls on the right. You can:
• Choose whether to plot model predictions or the data alone.
• Show or hide correct or incorrect results using the check boxes under Model predictions.
• Choose features to plot using the X and Y lists under Predictors.
• Visualize results by class by showing or hiding specific classes using the check boxes under Show.
• Change the stacking order of the plotted classes by selecting a class under Classes and then clicking Move to Front.
• Zoom in and out, or pan across the plot. To enable zooming or panning, hover the mouse over the scatter plot and click the corresponding button on the toolbar that appears above the top right of the plot.

See also “Investigate Features in the Scatter Plot” on page 23-37. To export the scatter plots you create in the app to figures, see “Export Plots in Classification Learner App” on page 23-64.


Check Performance Per Class in the Confusion Matrix

Use the confusion matrix plot to understand how the currently selected classifier performed in each class. To view the confusion matrix after training a model, click Confusion Matrix in the Plots section of the Classification Learner tab. The confusion matrix helps you identify the areas where the classifier has performed poorly.

When you open the plot, the rows show the true class, and the columns show the predicted class. If you are using holdout or cross-validation, then the confusion matrix is calculated using the predictions on the held-out observations. The diagonal cells show where the true class and predicted class match. If these diagonal cells are blue, the classifier has classified observations of this true class correctly.

The default view shows the number of observations in each cell. To see how the classifier performed per class, under Plot, select the True Positive Rates (TPR), False Negative Rates (FNR) option. The TPR is the proportion of correctly classified observations per true class. The FNR is the proportion of incorrectly classified observations per true class. The plot shows summaries per true class in the last two columns on the right.

Tip Look for areas where the classifier performed poorly by examining cells off the diagonal that display high percentages and are orange. The higher the percentage, the darker the hue of the cell color. In these orange cells, the true class and the predicted class do not match. The data points are misclassified.


In this example, which uses the carsmall data set, the top row shows all cars with the true class France. The columns show the predicted classes. In the top row, 25% of the cars from France are correctly classified, so 25% is the true positive rate for correctly classified points in this class, shown in the blue cell in the TPR column. The other cars in the France row are misclassified: 25% of the cars are incorrectly classified as from Japan, and 50% are classified as from USA. The false negative rate for incorrectly classified points in this class is 75%, shown in the orange cell in the FNR column.

If you want to see numbers of observations (cars, in this example) instead of percentages, under Plot, select Number of observations.

If false positives are important in your classification problem, plot results per predicted class (instead of true class) to investigate false discovery rates. To see results per predicted class, under Plot, select the Positive Predictive Values (PPV), False Discovery Rates (FDR) option. The PPV is the proportion of correctly classified observations per predicted class. The FDR is the proportion of incorrectly classified observations per predicted class. With this option selected, the confusion matrix now includes summary rows below the table. Positive predictive values are shown in blue for the correctly predicted points in each class, and false discovery rates are shown in orange for the incorrectly predicted points in each class.

If you decide there are too many misclassified points in the classes of interest, try changing classifier settings or feature selection to search for a better model. To export the confusion matrix plots you create in the app to figures, see “Export Plots in Classification Learner App” on page 23-64.
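The TPR/FNR summary columns are computed row by row from the confusion matrix counts. A short Python sketch (the counts below are chosen only to reproduce the 25%/75% split quoted above for the France row; the actual carsmall counts may differ):

```python
def tpr_fnr(confusion_row, true_idx):
    """TPR and FNR for one true class, given its row of the confusion matrix
    (prediction counts per class) and the index of the diagonal cell."""
    total = sum(confusion_row)
    tpr = confusion_row[true_idx] / total
    return tpr, 1 - tpr

# Hypothetical France row: predicted [France, Japan, USA].
row = [1, 1, 2]
print(tpr_fnr(row, true_idx=0))   # (0.25, 0.75)
```

The PPV/FDR summary rows work the same way, except each computation runs down a predicted-class column instead of across a true-class row.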

Check the ROC Curve

To view the ROC curve after training a model, on the Classification Learner tab, in the Plots section, click ROC Curve. View the receiver operating characteristic (ROC) curve showing true and false positive rates. The ROC curve shows the true positive rate versus the false positive rate for the currently selected trained classifier. You can select different classes to plot.

The marker on the plot shows the values of the false positive rate (FPR) and the true positive rate (TPR) for the currently selected classifier. For example, a false positive rate of 0.2 indicates that the current classifier assigns 20% of the observations incorrectly to the positive class. A true positive rate of 0.9 indicates that the current classifier assigns 90% of the observations correctly to the positive class.

A perfect result with no misclassified points is a right angle to the top left of the plot. A poor result that is no better than random is a line at 45 degrees. The Area Under Curve number is a measure of the overall quality of the classifier. Larger Area Under Curve values indicate better classifier performance. Compare classes and trained models to see if they perform differently in the ROC curve. For more information, see perfcurve.

To export the ROC curve plots you create in the app to figures, see “Export Plots in Classification Learner App” on page 23-64.


See Also

Related Examples
• “Train Classification Models in Classification Learner App” on page 23-10
• “Select Data and Validation for Classification Problem” on page 23-16
• “Choose Classifier Options” on page 23-20
• “Feature Selection and Feature Transformation Using Classification Learner App” on page 23-37
• “Export Plots in Classification Learner App” on page 23-64
• “Export Classification Model to Predict New Data” on page 23-69
• “Train Decision Trees Using Classification Learner App” on page 23-74


Export Plots in Classification Learner App

After you create plots interactively in the Classification Learner app, you can export your app plots to MATLAB figures. You can then copy, save, or customize the new figures. Choose among the available plots: scatter plot on page 23-37, parallel coordinates plot on page 23-40, confusion matrix on page 23-61, ROC curve on page 23-62, and minimum classification error plot on page 23-55.

• Before exporting a plot, make sure the plot in the app displays the same data that you want in the new figure.
• On the Classification Learner tab, in the Export section, click Export Plot to Figure. The app creates a figure from the selected plot.
• The new figure might not have the same interactivity options as the plot in the Classification Learner app. For example, data tips for the exported scatter plot show only X,Y values for the selected point, not the detailed information displayed in the app.
• Additionally, the figure might have a different axes toolbar than the one in the app plot. For plots in Classification Learner, an axes toolbar appears above the top right of the plot. The buttons available on the toolbar depend on the contents of the plot. The toolbar can include buttons to export the plot as an image, add data tips, pan or zoom the data, and restore the view.

• Copy, save, or customize the new figure, which is displayed in the figure window.
  • To copy the figure, select Edit > Copy Figure. For more information, see “Copy Figure to Clipboard from Edit Menu” (MATLAB).
  • To save the figure, select File > Save As. Alternatively, you can follow the workflow described in “Customize Figure Before Saving” (MATLAB).
  • To customize the figure, click the Edit Plot button on the figure toolbar. Right-click the section of the plot that you want to edit. You can change the listed properties, which might include Color, Font, Line Style, and other properties. Or, you can use the Property Inspector to change the figure properties.

As an example, export a scatter plot in the app to a figure, customize the figure, and save the modified figure.

1 In the MATLAB Command Window, read the sample file fisheriris.csv into a table.

   fishertable = readtable('fisheriris.csv');

2 Click the Apps tab.
3 In the Apps section, click the arrow to open the gallery. Under Machine Learning and Deep Learning, click Classification Learner.
4 On the Classification Learner tab, in the File section, click New Session.
5 In the New Session dialog box, select the table fishertable from the Workspace Variable list.
6 Click Start Session. Classification Learner creates a scatter plot of the data by default.
7 Change the predictors in the scatter plot to PetalLength and PetalWidth.


8 On the Classification Learner tab, in the Export section, click Export Plot to Figure.
9 In the new figure, click the Edit Plot button on the figure toolbar. Right-click the points in the plot corresponding to the versicolor irises. In the context menu, select Color.


10 In the Color dialog box, select a new color and click OK.


11 To save the figure, select File > Save As. Specify the saved file location, name, and type.


See Also

Related Examples
• “Feature Selection and Feature Transformation Using Classification Learner App” on page 23-37
• “Assess Classifier Performance in Classification Learner” on page 23-59
• “Export Classification Model to Predict New Data” on page 23-69


Export Classification Model to Predict New Data

In this section...
“Export the Model to the Workspace to Make Predictions for New Data” on page 23-69
“Make Predictions for New Data” on page 23-69
“Generate MATLAB Code to Train the Model with New Data” on page 23-70
“Generate C Code for Prediction” on page 23-71
“Deploy Predictions Using MATLAB Compiler” on page 23-73

Export the Model to the Workspace to Make Predictions for New Data

After you create classification models interactively in Classification Learner, you can export your best model to the workspace. You can then use the trained model to make predictions using new data.

Note The final model Classification Learner exports is always trained using the full data set. The validation scheme that you use only affects the way that the app computes validation metrics. You can use the validation metrics and various plots that visualize results to pick the best model for your classification problem.

Here are the steps for exporting a model to the MATLAB workspace:

1 In Classification Learner, select the model you want to export in the History list.
2 On the Classification Learner tab, in the Export section, click one of the export options:
  • If you want to include the data used for training the model, then select Export Model. You export the trained model to the workspace as a structure containing a classification object, such as ClassificationTree, ClassificationDiscriminant, ClassificationSVM, ClassificationNaiveBayes, ClassificationKNN, or ClassificationEnsemble.
  • If you do not want to include the training data, select Export Compact Model. This option exports the model with unnecessary data removed where possible. For some classifiers this is a compact classification object that does not include the training data (for example, CompactClassificationTree). You can use a compact classification object to make predictions on new data, but fewer other methods are available for it.
3 In the Export Model dialog box, edit the name of your exported variable if you want, and then click OK. The default name for your exported model, trainedModel, increments every time you export to avoid overwriting your classifiers (for example, trainedModel1).

The new variable (for example, trainedModel) appears in your workspace. The app displays information about the exported model in the command window. Read the message to learn how to make predictions with new data.

Make Predictions for New Data

After you export a model to the workspace from Classification Learner, or run the code generated from the app, you get a trainedModel structure that you can use to make predictions using new data. The structure contains a classification object and a function for prediction. The structure allows you to make predictions for models that include principal component analysis (PCA).

1 To use the exported classifier to make predictions for new data, T, use the form:

   yfit = C.predictFcn(T)

   where C is the name of your variable (for example, trainedModel). Supply the data T with the same format and data type as the training data used in the app (table or matrix).
   • If you supply a table, ensure it contains the same predictor names as your training data. The predictFcn function ignores additional variables in tables. Variable formats and types must match the original training data.
   • If you supply a matrix, it must contain the same predictor columns or rows as your training data, in the same order and format. Do not include a response variable, any variables that you did not import in the app, or other unused variables.

   The output yfit contains a class prediction for each data point.

2 Examine the fields of the exported structure. For help making predictions, enter:

   C.HowToPredict

You can also extract the classification object from the exported structure for further analysis (for example, trainedModel.ClassificationSVM, trainedModel.ClassificationTree, and so on, depending on your model type). Be aware that if you used feature transformation such as PCA in the app, you will need to take account of this transformation by using the information in the PCA fields of the structure.
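As a minimal sketch of this prediction workflow, assuming you exported a model trained on the fishertable data under the default name trainedModel:

```matlab
% Sketch: predict with an exported Classification Learner model.
% Assumes trainedModel was exported after training on fishertable.
newData = readtable('fisheriris.csv');    % new observations, same variables
yfit = trainedModel.predictFcn(newData);  % one class label per row
trainedModel.HowToPredict                 % display the stored usage help
```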

Generate MATLAB Code to Train the Model with New Data

After you create classification models interactively in Classification Learner, you can generate MATLAB code for your best model. You can then use the code to train the model with new data.

Generate MATLAB code to:
• Train on huge data sets. Explore models in the app trained on a subset of your data, then generate code to train a selected model on a larger data set.
• Create scripts for training models without needing to learn the syntax of the different functions.
• Examine the code to learn how to train classifiers programmatically.
• Modify the code for further analysis, for example to set options that you cannot change in the app.
• Repeat your analysis on different data and automate training.

1 In Classification Learner, in the History list, select the model you want to generate code for.
2 On the Classification Learner tab, in the Export section, click Generate Function.
   The app generates code from your session and displays the file in the MATLAB Editor. The file includes the predictors and response, the classifier training methods, and the validation methods. Save the file.


3 To retrain your classifier model, call the function from the command line with your original data or new data as the input argument or arguments. New data must have the same shape as the original data.
   Copy the first line of the generated code, excluding the word function, and edit the trainingData input argument to reflect the variable name of your training data or new data. Similarly, edit the responseData input argument (if applicable). For example, to retrain a classifier trained with the fishertable data set, enter:

   [trainedModel, validationAccuracy] = trainClassifier(fishertable)

   The generated code returns a trainedModel structure that contains the same fields as the structure you create when you export a classifier from Classification Learner to the workspace.
4 If you want to automate training the same classifier with new data, or learn how to programmatically train classifiers, examine the generated code. The code shows you how to:
   • Process the data into the right shape
   • Train a classifier and specify all the classifier options
   • Perform cross-validation
   • Compute validation accuracy
   • Compute validation predictions and scores

   Note If you generate MATLAB code from a trained optimizable model, the generated code does not include the optimization process.
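As a sketch of automated retraining, assuming the generated function was saved as trainClassifier.m from a session on fishertable:

```matlab
% Retrain the app's classifier on a (possibly new) labeled table.
fishertable = readtable('fisheriris.csv');
[trainedModel, validationAccuracy] = trainClassifier(fishertable);
fprintf('Validation accuracy: %.1f%%\n', 100*validationAccuracy);
% The returned structure works like an exported model.
yfit = trainedModel.predictFcn(fishertable);
```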

Generate C Code for Prediction

If you train an SVM model using Classification Learner, you can generate C code for prediction. C code generation requires:
• A MATLAB Coder license
• An SVM model (binary or multiclass)
• No categorical predictors or response in your data set

1 After you train an SVM model in Classification Learner, export the model to the workspace. Find the name of the classification model object in the exported structure. Examine the fields of the structure to find the model name, for example, C.ClassificationSVM, where C is the name of your structure (for example, trainedModel). The model name depends on the type of SVM you trained (binary or multiclass) and whether you exported a compact model. Models can be ClassificationSVM, CompactClassificationSVM, ClassificationECOC, or CompactClassificationECOC.
2 Use the function saveLearnerForCoder to prepare the model for code generation: saveLearnerForCoder(Mdl,filename). For example:

   saveLearnerForCoder(C.ClassificationSVM, 'mySVM')

3 Create a function that loads the saved model and makes predictions on new data. For example:

   function label = classifyX(X) %#codegen
   %CLASSIFYX Classify using SVM Model
   %  CLASSIFYX classifies the measurements in X
   %  using the SVM model in the file mySVM.mat, and then
   %  returns class labels in label.
   CompactMdl = loadLearnerForCoder('mySVM');
   label = predict(CompactMdl,X);
   end

4 Generate a MEX function from your function. For example:

   codegen classifyX.m -args {data}

   The %#codegen compilation directive indicates that the MATLAB code is intended for code generation. To ensure that the MEX function can use the same input, specify the data in the workspace as arguments to the function using the -args option. data must be a matrix containing only the predictor columns used to train the model.

5 Use the MEX function to make predictions. For example:

   labels = classifyX_mex(data);
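Putting the steps above together, a sketch of the whole sequence (assuming an exported structure trainedModel containing a binary SVM and a numeric predictor matrix data in the workspace):

```matlab
% Extract the SVM from the exported structure and save it for code generation.
saveLearnerForCoder(trainedModel.ClassificationSVM, 'mySVM');

% classifyX.m is the function shown above, which loads mySVM.mat.
% Generate a MEX function; data fixes the input size and type.
codegen classifyX.m -args {data}

% Predict with the compiled MEX function.
labels = classifyX_mex(data);
```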

If you used feature selection or PCA feature transformation in the app, then you need additional steps. If you used manual feature selection, supply the same columns in X, the input to your function. If you used PCA in the app, use the information in the PCA fields of the exported structure to take account of this transformation. It does not matter whether you imported a table or a matrix into the app, as long as X contains the matrix columns in the same order. Before generating code, follow these steps:

1 Save the PCACenters and PCACoefficients fields of the trained classifier structure, C, to file using the following command:

   save('pcaInfo.mat', '-struct', 'C', 'PCACenters', 'PCACoefficients');

2 In your function file, include additional lines to perform the PCA transformation. Create a function that loads the saved model, performs PCA, and makes predictions on new data. For example:

   function label = classifyX(X) %#codegen
   %CLASSIFYX Classify using SVM Model
   %  CLASSIFYX classifies the measurements in X
   %  using the SVM model in the file mySVM.mat, and then
   %  returns class labels in label.
   % If you used manual feature selection in the app, ensure that X
   % contains only the columns you used in the app.
   CompactMdl = loadLearnerForCoder('mySVM');
   pcaInfo = load('pcaInfo.mat', 'PCACenters', 'PCACoefficients');
   % Perform the PCA transformation.
   pcaTransformedX = bsxfun(@minus, X, pcaInfo.PCACenters) * pcaInfo.PCACoefficients;
   label = predict(CompactMdl, pcaTransformedX);
   end

For more information on C code generation workflow and limitations, see “Code Generation”. For examples, see saveLearnerForCoder and loadLearnerForCoder.


Deploy Predictions Using MATLAB Compiler

After you export a model to the workspace from Classification Learner, you can deploy it using MATLAB Compiler. Suppose you export the trained model to the MATLAB workspace based on the instructions in “Export Model to Workspace” on page 24-50, with the name trainedModel. To deploy predictions, follow these steps.

• Save the trainedModel structure in a .mat file.

   save mymodel trainedModel

• Write the code to be compiled. This code must load the trained model and use it to make a prediction. It must also have a pragma, so the compiler recognizes that Statistics and Machine Learning Toolbox code is needed in the compiled application. This pragma can be any function in the toolbox.

   function ypred = mypredict(tbl)
   %#function fitctree
   load('mymodel.mat');
   ypred = trainedModel.predictFcn(tbl);
   end

• Compile as a standalone application.

   mcc -m mypredict.m

See Also

Functions
fitcdiscr | fitcecoc | fitcensemble | fitcknn | fitcsvm | fitctree | fitglm

Classes
ClassificationBaggedEnsemble | ClassificationDiscriminant | ClassificationECOC | ClassificationEnsemble | ClassificationKNN | ClassificationNaiveBayes | ClassificationSVM | ClassificationTree | CompactClassificationDiscriminant | CompactClassificationECOC | CompactClassificationEnsemble | CompactClassificationNaiveBayes | CompactClassificationSVM | CompactClassificationTree | GeneralizedLinearModel

Related Examples
• “Train Classification Models in Classification Learner App” on page 23-10


Train Decision Trees Using Classification Learner App

This example shows how to create and compare various classification trees using Classification Learner, and export trained models to the workspace to make predictions for new data.

You can train classification trees to predict responses to data. To predict a response, follow the decisions in the tree from the root (beginning) node down to a leaf node. The leaf node contains the response. Statistics and Machine Learning Toolbox trees are binary. Each step in a prediction involves checking the value of one predictor (variable). For example, here is a simple classification tree:

This tree predicts classifications based on two predictors, x1 and x2. To predict, start at the top node. At each decision, check the values of the predictors to decide which branch to follow. When the branches reach a leaf node, the data is classified either as type 0 or 1.

1 In MATLAB, load the fisheriris data set and create a table of measurement predictors (or features) using variables from the data set to use for a classification.

   fishertable = readtable('fisheriris.csv');

2 On the Apps tab, in the Machine Learning and Deep Learning group, click Classification Learner.
3 On the Classification Learner tab, in the File section, click New Session > From Workspace.
4 In the New Session dialog box, select the table fishertable from the Data Set Variable list (if necessary). Observe that the app has selected response and predictor variables based on their data type. Petal and sepal length and width are predictors, and species is the response that you want to classify. For this example, do not change the selections.


5 To accept the default validation scheme and continue, click Start Session. The default validation option is cross-validation, to protect against overfitting. Classification Learner creates a scatter plot of the data.
6 Use the scatter plot to investigate which variables are useful for predicting the response. Select different options on the X and Y lists under Predictors to visualize the distribution of species and measurements. Observe which variables separate the species colors most clearly.
   Observe that the setosa species (blue points) is easy to separate from the other two species with all four predictors. The versicolor and virginica species are much closer together in all predictor measurements, and overlap especially when you plot sepal length and width. setosa is easier to predict than the other two species.
7 To create a classification tree model, on the Classification Learner tab, in the Model Type section, click the down arrow to expand the gallery and click Coarse Tree. Then click Train.

   Tip If you have Parallel Computing Toolbox, then the first time you click Train you see a dialog box while the app opens a parallel pool of workers. After the pool opens, you can train multiple classifiers at once and continue working.

   The app creates a simple classification tree and plots the results. Observe the Coarse Tree model in the History list. Check the model validation score in the Accuracy box. The model has performed well.
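A comparable tree can be trained at the command line. This is a sketch, not the app's exact internals; it assumes the Coarse Tree preset corresponds to limiting the tree to a maximum of 4 splits:

```matlab
fishertable = readtable('fisheriris.csv');
% Limit tree depth, as the Coarse Tree preset does.
coarseTree = fitctree(fishertable, 'Species', 'MaxNumSplits', 4);
cvTree = crossval(coarseTree);     % 10-fold cross-validation by default
accuracy = 1 - kfoldLoss(cvTree);  % comparable to the app's Accuracy score
```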


Note With validation, there is some randomness in the results, so your model validation scores can vary from those shown.

8 Examine the scatter plot. An X indicates misclassified points. The blue points (setosa species) are all correctly classified, but some points of the other two species are misclassified. Under Plot, switch between the Data and Model Predictions options. Observe the color of the incorrect (X) points. Alternatively, while plotting model predictions, to view only the incorrect points, clear the Correct check box.
9 Train a different model to compare. Click Medium Tree, and then click Train. When you click Train, the app displays a new model in the History list.

10 Observe the Medium Tree model in the History list. The model validation score is no better than the coarse tree score. The app outlines in a box the Accuracy score of the best model. Click each model in the History list to view and compare the results.
11 Examine the scatter plot for the Medium Tree model. The medium tree classifies as many points correctly as the previous coarse tree. You want to avoid overfitting, and the coarse tree performs well, so base all further models on the coarse tree.
12 Select Coarse Tree in the History list. To try to improve the model, try including different features in the model. See if you can improve the model by removing features with low predictive power. On the Classification Learner tab, in the Features section, click Feature Selection. In the Feature Selection dialog box, clear the check boxes for PetalLength and PetalWidth to exclude them from the predictors. A new draft model appears in the model history list with your new settings (2/4 features), based on the coarse tree. Click Train to train a new tree model using the new predictor options.


13 Observe the third model in the History list. It is also a Coarse Tree model, trained using only 2 of 4 predictors. The History list displays how many predictors are excluded. To check which predictors are included, click a model in the History list and observe the check boxes in the Feature Selection dialog box. The model with only sepal measurements has a much lower accuracy score than the petals-only model.
14 Train another model including only the petal measurements. Change the selections in the Feature Selection dialog box and click Train. The model trained using only petal measurements performs comparably to the models containing all predictors. The models predict no better using all the measurements compared to only the petal measurements. If data collection is expensive or difficult, you might prefer a model that performs satisfactorily without some predictors.
15 Repeat to train two more models, one including only the width measurements and the other including only the length measurements. There is not much difference in score between several of the models.
16 Choose a best model among those of similar scores by examining the performance in each class. Select the coarse tree that includes all the predictors. To inspect the accuracy of the predictions in each class, on the Classification Learner tab, in the Plots section, click Confusion Matrix. Use this plot to understand how the currently selected classifier performed in each class. View the matrix of true class and predicted class results. Look for areas where the classifier performed poorly by examining off-diagonal cells that display high numbers and are red. In these red cells, the true class and the predicted class do not match; the data points are misclassified.

   Note With validation, there is some randomness in the results, so your confusion matrix results can vary from those shown.

   In this figure, examine the third cell in the middle row. In this cell, the true class is versicolor, but the model misclassified the points as virginica. For this model, the cell shows 4 misclassified points (your results can vary). To view percentages instead of numbers of observations, select the True Positive Rates option under Plot controls.
   You can use this information to help you choose the best model for your goal. If false positives in this class are very important to your classification problem, then choose the best model at predicting this class. If false positives in this class are not very important, and models with fewer predictors do better in other classes, then you can choose a model that trades off some overall accuracy to exclude some predictors and make future data collection easier.
17 Compare the confusion matrix for each model in the History list. Check the Feature Selection dialog box to see which predictors are included in each model.
18 To investigate features to include or exclude, use the scatter plot and the parallel coordinates plot. On the Classification Learner tab, in the Plots section, click Parallel Coordinates Plot. You can see that petal length and petal width are the features that separate the classes best.
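Outside the app, a similar confusion matrix can be computed from cross-validation predictions. This is a sketch assuming a tree model on fishertable (confusionchart requires R2018b or later):

```matlab
fishertable = readtable('fisheriris.csv');
mdl = fitctree(fishertable, 'Species');
cvmdl = crossval(mdl);                     % cross-validated model
yhat = kfoldPredict(cvmdl);                % out-of-fold predictions
confusionchart(fishertable.Species, yhat)  % true class vs. predicted class
```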


19 To learn about model settings, choose a model in the History list and view the advanced settings. The nonoptimizable model options in the Model Type gallery are preset starting points, and you can change further settings. On the Classification Learner tab, in the Model Type section, click Advanced. Compare the coarse and medium tree models in the history, and observe the differences in the Advanced Tree Options dialog box. The Maximum Number of Splits setting controls tree depth.
   To try to improve the coarse tree model further, try changing the Maximum Number of Splits setting, then train a new model by clicking Train. View the settings for the selected trained model in the Current model pane, or in the Advanced dialog box.
20 To export the best trained model to the workspace, on the Classification Learner tab, in the Export section, click Export Model. In the Export Model dialog box, click OK to accept the default variable name trainedModel. Look in the command window to see information about the results.
21 To visualize your decision tree model, enter:

   view(trainedModel.ClassificationTree,'Mode','graph')


22 You can use the exported classifier to make predictions on new data. For example, to make predictions for the fishertable data in your workspace, enter:

   yfit = trainedModel.predictFcn(fishertable)

   The output yfit contains a class prediction for each data point.
23 If you want to automate training the same classifier with new data, or learn how to programmatically train classifiers, you can generate code from the app. To generate code for the best trained model, on the Classification Learner tab, in the Export section, click Generate Function. The app generates code from your model and displays the file in the MATLAB Editor. To learn more, see “Generate MATLAB Code to Train the Model with New Data” on page 23-70.

This example uses Fisher's 1936 iris data. The iris data contains measurements of flowers: the petal length, petal width, sepal length, and sepal width for specimens from three species. Train a classifier to predict the species based on the predictor measurements.

Use the same workflow to evaluate and compare the other classifier types you can train in Classification Learner. To try all the nonoptimizable classifier model presets available for your data set:

1 Click the arrow on the far right of the Model Type section to expand the list of classifiers.
2 Click All, then click Train.

To learn about other classifier types, see “Train Classification Models in Classification Learner App” on page 23-10.

See Also

Related Examples
• “Train Classification Models in Classification Learner App” on page 23-10
• “Select Data and Validation for Classification Problem” on page 23-16
• “Choose Classifier Options” on page 23-20
• “Feature Selection and Feature Transformation Using Classification Learner App” on page 23-37
• “Assess Classifier Performance in Classification Learner” on page 23-59
• “Export Classification Model to Predict New Data” on page 23-69


Train Discriminant Analysis Classifiers Using Classification Learner App

This example shows how to construct discriminant analysis classifiers in the Classification Learner app, using the fisheriris data set. You can use discriminant analysis with two or more classes in Classification Learner.

1 In MATLAB, load the fisheriris data set.

   fishertable = readtable('fisheriris.csv');

2 On the Apps tab, in the Machine Learning and Deep Learning group, click Classification Learner.
3 On the Classification Learner tab, in the File section, click New Session > From Workspace.
   In the New Session dialog box, select the table fishertable from the Data Set Variable list (if necessary). Observe that the app has selected response and predictor variables based on their data type. Petal and sepal length and width are predictors, and species is the response that you want to classify. For this example, do not change the selections.
4 Click Start Session. Classification Learner creates a scatter plot of the data.
5 Use the scatter plot to visualize which variables are useful for predicting the response. Select different variables in the X- and Y-axis controls. Observe which variables separate the classes most clearly.
6 To train both nonoptimizable discriminant analysis classifiers, on the Classification Learner tab, in the Model Type section, click the down arrow to expand the list of classifiers, and under Discriminant Analysis, click All Discriminants. Then click Train.

   Tip If you have Parallel Computing Toolbox, then the first time you click Train you see a dialog box while the app opens a parallel pool of workers. After the pool opens, you can train multiple classifiers at once and continue working.

   Classification Learner trains one of each classification option in the gallery, linear and quadratic discriminants, and highlights the best score. The app outlines in a box the Accuracy score of the best model.
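The two discriminant presets correspond roughly to the following programmatic calls (a sketch using the same fishertable data):

```matlab
fishertable = readtable('fisheriris.csv');
linMdl  = fitcdiscr(fishertable, 'Species', 'DiscrimType', 'linear');
quadMdl = fitcdiscr(fishertable, 'Species', 'DiscrimType', 'quadratic');
% Compare cross-validated accuracy, as the app's Accuracy score does.
linAcc  = 1 - kfoldLoss(crossval(linMdl));
quadAcc = 1 - kfoldLoss(crossval(quadMdl));
```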


7 Select a model in the History list to view the results. Examine the scatter plot for the trained model and try plotting different predictors. Misclassified points are shown as an X.
8 To inspect the accuracy of the predictions in each class, on the Classification Learner tab, in the Plots section, click Confusion Matrix. View the matrix of true class and predicted class results.
9 Select the other model in the list to compare. For information on the strengths of different model types, see “Discriminant Analysis” on page 23-26.
10 Choose the best model in the History list (the best score is highlighted in a box). To improve the model, try including different features in the model. See if you can improve the model by removing features with low predictive power. On the Classification Learner tab, in the Features section, click Feature Selection. In the Feature Selection dialog box, specify predictors to remove from the model, and click Train to train a new model using the new options. Compare results among the classifiers in the History list.
11 To investigate features to include or exclude, use the parallel coordinates plot. On the Classification Learner tab, in the Plots section, select Parallel Coordinates Plot.
12 Choose the best model in the History list. To try to improve the model further, try changing classifier settings. On the Classification Learner tab, in the Model Type section, click Advanced. Try changing a setting, then train the new model by clicking Train. For information on settings, see “Discriminant Analysis” on page 23-26.


13 To export the trained model to the workspace, select the Classification Learner tab and click Export Model. See “Export Classification Model to Predict New Data” on page 23-69.
14 To examine the code for training this classifier, click Generate Function.

Use the same workflow to evaluate and compare the other classifier types you can train in Classification Learner. To try all the nonoptimizable classifier model presets available for your data set:

1 Click the arrow on the far right of the Model Type section to expand the list of classifiers.
2 Click All, then click Train.

To learn about other classifier types, see “Train Classification Models in Classification Learner App” on page 23-10.

See Also

Related Examples
• “Train Classification Models in Classification Learner App” on page 23-10
• “Select Data and Validation for Classification Problem” on page 23-16
• “Choose Classifier Options” on page 23-20
• “Feature Selection and Feature Transformation Using Classification Learner App” on page 23-37
• “Assess Classifier Performance in Classification Learner” on page 23-59
• “Export Classification Model to Predict New Data” on page 23-69
• “Train Decision Trees Using Classification Learner App” on page 23-74


Train Logistic Regression Classifiers Using Classification Learner App

This example shows how to construct logistic regression classifiers in the Classification Learner app, using the ionosphere data set that contains two classes. You can use logistic regression with two classes in Classification Learner. In the ionosphere data, the response variable is categorical with two levels: g represents good radar returns, and b represents bad radar returns.

1 In MATLAB, load the ionosphere data set and define some variables from the data set to use for a classification.

   load ionosphere
   ionosphere = array2table(X);
   ionosphere.Group = Y;

   Alternatively, you can load the ionosphere data set and keep the X and Y data as separate variables.
2 On the Apps tab, in the Machine Learning and Deep Learning group, click Classification Learner.
3 On the Classification Learner tab, in the File section, click New Session > From Workspace.
   In the New Session dialog box, select the table ionosphere from the Data Set Variable list. Observe that the app has selected Group for the response variable, and the rest as predictors. Group has two levels.
   Alternatively, if you kept your predictor data X and response variable Y as two separate variables, you can first select the matrix X from the Data Set Variable list. Then, under Response, click the From workspace option button and select Y from the list. The Y variable is the same as the Group variable.
4 Click Start Session. Classification Learner creates a scatter plot of the data.
5 Use the scatter plot to visualize which variables are useful for predicting the response. Select different variables in the X- and Y-axis controls. Observe which variables separate the class colors most clearly.
6 To train the logistic regression classifier, on the Classification Learner tab, in the Model Type section, click the down arrow to expand the list of classifiers, and under Logistic Regression Classifiers, click Logistic Regression. Then click Train.
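A comparable two-class logistic regression can be fit at the command line with fitglm (a sketch; the app's exact training options may differ):

```matlab
load ionosphere                  % predictors X, labels Y ('g' or 'b')
tbl = array2table(X);
tbl.Group = strcmp(Y, 'g');      % logical response: true for good returns
mdl = fitglm(tbl, 'linear', 'Distribution', 'binomial');
probGood = predict(mdl, tbl);    % predicted probability of class g
```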


   Tip If you have Parallel Computing Toolbox, then the first time you click Train you see a dialog box while the app opens a parallel pool of workers. After the pool opens, you can train multiple classifiers at once and continue working.

   Classification Learner trains the model. The app outlines in a box the Accuracy score of the best model (in this case, there is only one model).

7. Select the model in the History list to view the results. Examine the scatter plot for the trained model and try plotting different predictors. Misclassified points are shown as an X.

8. To inspect the accuracy of the predictions in each class, on the Classification Learner tab, in the Plots section, click Confusion Matrix. View the matrix of true class and predicted class results.

9. Choose the best model in the History list (the best score is highlighted in a box). To improve the model, try including different features in the model. See if you can improve the model by removing features with low predictive power. On the Classification Learner tab, in the Features section, click Feature Selection. In the Feature Selection dialog box, specify predictors to remove from the model, and click Train to train a new model using the new options. Compare results among the classifiers in the History list.

10. To investigate features to include or exclude, use the parallel coordinates plot. On the Classification Learner tab, in the Plots section, select Parallel Coordinates Plot.

11. To export the trained model to the workspace, select the Classification Learner tab and click Export model. See “Export Classification Model to Predict New Data” on page 23-69.

12. To examine the code for training this classifier, click Generate Function.
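Once exported (by default as a struct named trainedModel), the model can score new observations at the command line through its documented predictFcn field. This is a minimal sketch, assuming you accepted the default export name; the table of new data here simply reuses rows of the original predictors for illustration:

```matlab
% Assumes trainedModel was exported from Classification Learner with the
% default name. predictFcn expects a table with the same predictor
% variable names used during training (X1..X34 for this example).
load ionosphere                                  % reload the raw data
newData = array2table(X);                        % same variable names as in training
yfit = trainedModel.predictFcn(newData(1:5,:))   % predicted labels for five rows
```

The predictFcn wrapper applies any feature selection or transformation chosen in the app before calling the underlying classifier, so you do not need to repeat those steps yourself.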


Use the same workflow to evaluate and compare the other classifier types you can train in Classification Learner. To try all the nonoptimizable classifier model presets available for your data set:

1. Click the arrow on the far right of the Model Type section to expand the list of classifiers.

2. Click All, then click Train.

To learn about other classifier types, see “Train Classification Models in Classification Learner App” on page 23-10.

See Also

Related Examples
• “Train Classification Models in Classification Learner App” on page 23-10
• “Select Data and Validation for Classification Problem” on page 23-16
• “Choose Classifier Options” on page 23-20
• “Logistic Regression” on page 23-27
• “Feature Selection and Feature Transformation Using Classification Learner App” on page 23-37
• “Assess Classifier Performance in Classification Learner” on page 23-59
• “Export Classification Model to Predict New Data” on page 23-69
• “Train Decision Trees Using Classification Learner App” on page 23-74


Train Support Vector Machines Using Classification Learner App

This example shows how to construct support vector machine (SVM) classifiers in the Classification Learner app, using the ionosphere data set that contains two classes. You can use an SVM with two or more classes in Classification Learner. An SVM classifies data by finding the best hyperplane that separates the data points of one class from those of another class. In the ionosphere data, the response variable is categorical with two levels: g represents good radar returns, and b represents bad radar returns.
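For orientation, the hyperplane-based classifier described above can also be fit at the command line with fitcsvm. The following is a minimal sketch with default settings, not a reproduction of the app's exact gallery presets:

```matlab
% Fit a linear SVM to the ionosphere data at the command line.
load ionosphere                    % predictors X (351x34), labels Y ('g'/'b')
svmModel = fitcsvm(X, Y);          % linear kernel by default
cvModel = crossval(svmModel);      % 10-fold cross-validation by default
loss = kfoldLoss(cvModel)          % estimated misclassification rate
```

The app's SVM presets differ mainly in kernel choice and scale (for example, quadratic, cubic, and Gaussian kernels), which correspond to the KernelFunction and KernelScale arguments of fitcsvm.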

1. In MATLAB, load the ionosphere data set and define some variables from the data set to use for classification.

   load ionosphere
   ionosphere = array2table(X);
   ionosphere.Group = Y;

   Alternatively, you can load the ionosphere data set and keep the X and Y data as separate variables.

2. On the Apps tab, in the Machine Learning and Deep Learning group, click Classification Learner.

3. On the Classification Learner tab, in the File section, click New Session > From Workspace.

   In the New Session dialog box, select the table ionosphere from the Data Set Variable list. Observe that the app has selected response and predictor variables based on their data type. The response variable Group has two levels. All the other variables are predictors.

   Alternatively, if you kept your predictor data X and response variable Y as two separate variables, you can first select the matrix X from the Data Set Variable list. Then, under Response, click the From workspace option button and select Y from the list. The Y variable is the same as the Group variable.

4. Click Start Session. Classification Learner creates a scatter plot of the data.

5. Use the scatter plot to visualize which variables are useful for predicting the response. Select different variables in the X- and Y-axis controls. Observe which variables separate the class colors most clearly.

6. To create a selection of SVM models, on the Classification Learner tab, in the Model Type section, click the down arrow to expand the list of classifiers, and under Support Vector Machines, click All SVMs. Then click Train.


Tip: If you have Parallel Computing Toolbox, then the first time you click Train, a dialog box appears while the app opens a parallel pool of workers. After the pool opens, you can train multiple classifiers at once and continue working.

Classification Learner trains one of each nonoptimizable SVM classification option in the gallery and highlights the best score. The app outlines the Accuracy score of the best model in a box.

7. Select a model in the History list to view the results. Examine the scatter plot for the trained model and try plotting different predictors. Misclassified points are shown as an X.

8. To inspect the accuracy of the predictions in each class, on the Classification Learner tab, in the Plots section, click Confusion Matrix. View the matrix of true class and predicted class results.

9. Select the other models in the list to compare.

10. Choose the best model in the History list (the best score is highlighted in a box). To improve the model, try including different features in the model. See if you can improve the model by removing features with low predictive power. On the Classification Learner tab, in the Features section, click Feature Selection. In the Feature Selection dialog box, specify predictors to remove from the model, and click Train to train a new model using the new options. Compare results among the classifiers in the History list.

11. To investigate features to include or exclude, use the parallel coordinates plot. On the Classification Learner tab, in the Plots section, select Parallel Coordinates Plot.

12. Choose the best model in the History list. To try to improve the model further, try changing SVM settings. On the Classification Learner tab, in the Model Type section, click Advanced. Try changing a setting, then train the new model by clicking Train. For information on settings, see “Support Vector Machines” on page 23-27.


13. To export the trained model to the workspace, select the Classification Learner tab and click Export model. See “Export Classification Model to Predict New Data” on page 23-69.

14. To examine the code for training this classifier, click Generate Function. For SVM models, see also “Generate C Code for Prediction” on page 23-71.

Use the same workflow to evaluate and compare the other classifier types you can train in Classification Learner. To try all the nonoptimizable classifier model presets available for your data set:

1. Click the arrow on the far right of the Model Type section to expand the list of classifiers.

2. Click All, then click Train.

To learn about other classifier types, see “Train Classification Models in Classification Learner App” on page 23-10.

See Also

Related Examples
• “Train Classification Models in Classification Learner App” on page 23-10
• “Select Data and Validation for Classification Problem” on page 23-16
• “Choose Classifier Options” on page 23-20
• “Feature Selection and Feature Transformation Using Classification Learner App” on page 23-37
• “Assess Classifier Performance in Classification Learner” on page 23-59
• “Export Classification Model to Predict New Data” on page 23-69
• “Generate C Code for Prediction” on page 23-71
• “Train Decision Trees Using Classification Learner App” on page 23-74


Train Nearest Neighbor Classifiers Using Classification Learner App

This example shows how to construct nearest neighbor classifiers in the Classification Learner app.

1. In MATLAB, load the Fisher iris data set and create a table from it to use for classification.

   fishertable = readtable('fisheriris.csv');
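The nearest neighbor models the app trains correspond to the command-line fitcknn function. As a minimal sketch (one preset only, not the full All KNNs gallery):

```matlab
% Fit a 5-nearest-neighbor classifier to the Fisher iris table.
fishertable = readtable('fisheriris.csv');            % predictors + Species column
knnModel = fitcknn(fishertable, 'Species', ...
                   'NumNeighbors', 5);                 % roughly a "medium" KNN
label = predict(knnModel, fishertable(1,:))            % classify the first observation
```

The gallery presets vary NumNeighbors and the Distance metric; those are the same name-value arguments you would change in a fitcknn call.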

2. On the Apps tab, in the Machine Learning and Deep Learning group, click Classification Learner.

3. On the Classification Learner tab, in the File section, click New Session > From Workspace.

   In the New Session dialog box, select the table fishertable from the Data Set Variable list (if necessary). Observe that the app has selected response and predictor variables based on their data type. Petal and sepal length and width are predictors, and species is the response that you want to classify. For this example, do not change the selections.

4. Click Start Session. The app creates a scatter plot of the data.

5. Use the scatter plot to investigate which variables are useful for predicting the response. To visualize the distribution of species and measurements, select different options on the Variable on X axis and Variable on Y axis menus. Observe which variables separate the species colors most clearly.

6. To create a selection of nearest neighbor models, on the Classification Learner tab, on the far right of the Model Type section, click the arrow to expand the list of classifiers, and under Nearest Neighbor Classifiers, click All KNNs.

7. In the Training section, click Train.

   Tip: If you have Parallel Computing Toolbox, then the first time you click Train, a dialog box appears while the app opens a parallel pool of workers. After the pool opens, you can train multiple classifiers at once and continue working.

   Classification Learner trains one of each nonoptimizable nearest neighbor classification option in the gallery and highlights the best score. The app outlines the Accuracy score of the best model in a box.


8. Select a model in the History list to view the results. Examine the scatter plot for the trained model. An X indicates a misclassified point.

9. To inspect the accuracy of the predictions in each class, on the Classification Learner tab, in the Plots section, click Confusion Matrix. View the matrix of true class and predicted class results.

10. Select the other models in the list to compare.

11. Choose the best model in the History list (the best score is highlighted in a box). To improve the model, try including different features in the model. See if you can improve the model by removing features with low predictive power. On the Classification Learner tab, in the Features section, click Feature Selection. In the Feature Selection dialog box, select predictors to remove from the model, and click Train to train a new model using the new options. Compare results among the classifiers in the History list.

12. To investigate features to include or exclude, use the parallel coordinates plot. On the Classification Learner tab, in the Plots section, select Parallel Coordinates Plot.

13. Choose the best model in the History list. To try to improve the model further, try changing settings. On the Classification Learner tab, in the Model Type section, click Advanced. Try changing a setting, and then train the new model by clicking Train. For information on settings and the strengths of different nearest neighbor model types, see “Nearest Neighbor Classifiers” on page 23-30.

14. To export the trained model to the workspace, in the Export section of the toolstrip, click Export model. See “Export Classification Model to Predict New Data” on page 23-69.

15. To examine the code for training this classifier, click Generate Function.

Use the same workflow to evaluate and compare the other classifier types you can train in Classification Learner.


To try all the nonoptimizable classifier model presets available for your data set:

1. Click the arrow on the far right of the Model Type section to expand the list of classifiers.

2. Click All, then click Train.

To learn about other classifier types, see “Train Classification Models in Classification Learner App” on page 23-10.

See Also

Related Examples
• “Train Classification Models in Classification Learner App” on page 23-10
• “Select Data and Validation for Classification Problem” on page 23-16
• “Choose Classifier Options” on page 23-20
• “Feature Selection and Feature Transformation Using Classification Learner App” on page 23-37
• “Assess Classifier Performance in Classification Learner” on page 23-59
• “Export Classification Model to Predict New Data” on page 23-69
• “Train Decision Trees Using Classification Learner App” on page 23-74


Train Ensemble Classifiers Using Classification Learner App

This example shows how to construct ensembles of classifiers in the Classification Learner app. Ensemble classifiers meld the results of many weak learners into one high-quality ensemble predictor. Ensemble quality depends on the choice of algorithm, but ensemble classifiers tend to be slow to fit because they often need many learners.

1. In MATLAB, load the Fisher iris data set and create a table from it to use for classification.

   fishertable = readtable('fisheriris.csv');
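The ensembles the app trains correspond to the command-line fitcensemble function. As one hedged sketch (a bagged-tree ensemble only; the All Ensembles gallery trains several presets):

```matlab
% Fit a bagged ensemble of decision trees to the Fisher iris table.
fishertable = readtable('fisheriris.csv');
ensModel = fitcensemble(fishertable, 'Species', 'Method', 'Bag');
cvEns = crossval(ensModel);     % 10-fold cross-validation by default
loss = kfoldLoss(cvEns)         % cross-validated misclassification rate
```

Other gallery presets map to different Method values such as 'AdaBoostM2' or 'RUSBoost', which trade fitting speed against robustness to class imbalance.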

2. On the Apps tab, in the Machine Learning and Deep Learning group, click Classification Learner.

3. On the Classification Learner tab, in the File section, click New Session > From Workspace.

   In the New Session dialog box, select the table fishertable from the Data Set Variable list (if necessary). Observe that the app has selected response and predictor variables based on their data type. Petal and sepal length and width are predictors. Species is the response that you want to classify. For this example, do not change the selections.

4. Click Start Session. Classification Learner creates a scatter plot of the data.

5. Use the scatter plot to investigate which variables are useful for predicting the response. Select different variables in the X- and Y-axis controls to visualize the distribution of species and measurements. Observe which variables separate the species colors most clearly.

6. To create a selection of ensemble models, on the Classification Learner tab, in the Model Type section, click the down arrow to expand the list of classifiers, and then under Ensemble Classifiers, click All Ensembles.

7. In the Training section, click Train.

   Tip: If you have Parallel Computing Toolbox, then the first time you click Train, a dialog box appears while the app opens a parallel pool of workers. After the pool opens, you can train multiple classifiers at once and continue working.


   Classification Learner trains one of each nonoptimizable ensemble classification option in the gallery and highlights the best score. The app outlines the Accuracy score of the best model in a box.

8. Select a model in the History list to view the results. Examine the scatter plot for the trained model. Misclassified points are shown as an X.

9. To inspect the accuracy of the predictions in each class, on the Classification Learner tab, in the Plots section, click Confusion Matrix. View the matrix of true class and predicted class results.

10. Select the other models in the list to compare.

11. Choose the best model in the History list (the best score is highlighted in the Accuracy box). To improve the model, try including different features in the model. See if you can improve the model by removing features with low predictive power. On the Classification Learner tab, in the Features section, click Feature Selection. In the Feature Selection dialog box, specify predictors to remove from the model, and click Train to train a new model using the new options. Compare results among the classifiers in the History list.

12. To investigate features to include or exclude, use the scatter and parallel coordinates plots. On the Classification Learner tab, in the Plots section, select Parallel Coordinates Plot.

13. Choose the best model in the History list. To try to improve the model further, try changing settings. On the Classification Learner tab, in the Model Type section, click Advanced. Try changing a setting, then train the new model by clicking Train. For information on the settings to try and the strengths of different ensemble model types, see “Ensemble Classifiers” on page 23-33.

14. To export the trained model to the workspace, select the Classification Learner tab and click Export model. See “Export Classification Model to Predict New Data” on page 23-69.


15. To examine the code for training this classifier, click Generate Function.

Use the same workflow to evaluate and compare the other classifier types you can train in Classification Learner. To try all the nonoptimizable classifier model presets available for your data set:

1. Click the arrow on the far right of the Model Type section to expand the list of classifiers.

2. Click All, then click Train.

To learn about other classifier types, see “Train Classification Models in Classification Learner App” on page 23-10.

See Also

Related Examples
• “Train Classification Models in Classification Learner App” on page 23-10
• “Select Data and Validation for Classification Problem” on page 23-16
• “Choose Classifier Options” on page 23-20
• “Feature Selection and Feature Transformation Using Classification Learner App” on page 23-37
• “Assess Classifier Performance in Classification Learner” on page 23-59
• “Export Classification Model to Predict New Data” on page 23-69
• “Train Decision Trees Using Classification Learner App” on page 23-74


Train Naive Bayes Classifiers Using Classification Learner App

This example shows how to create and compare different naive Bayes classifiers using the Classification Learner app, and export trained models to the workspace to make predictions for new data. Naive Bayes classifiers leverage Bayes' theorem and make the assumption that predictors are independent of one another within each class. However, the classifiers appear to work well even when the independence assumption is not valid. You can use naive Bayes with two or more classes in Classification Learner. The app allows you to train a Gaussian naive Bayes model or a kernel naive Bayes model individually or simultaneously.

This table lists the available naive Bayes models in Classification Learner and the probability distributions used by each model to fit predictors.

Model                  Numerical Predictor                              Categorical Predictor
Gaussian naive Bayes   Gaussian (normal) distribution                   Multivariate multinomial distribution
Kernel naive Bayes     Kernel distribution. You can specify the         Multivariate multinomial distribution
                       kernel type and support; Classification
                       Learner automatically determines the kernel
                       width using the underlying fitcnb function.

This example uses Fisher's iris data set, which contains measurements of flowers (petal length, petal width, sepal length, and sepal width) for specimens from three species. Train naive Bayes classifiers to predict the species based on the predictor measurements.

1. In the MATLAB Command Window, load the Fisher iris data set and create a table of measurement predictors (or features) using variables from the data set.

   fishertable = readtable('fisheriris.csv');
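Both model types in the table above map to the command-line fitcnb function, which the app uses underneath. A minimal sketch comparing the two distributions (default settings, not the Triangle/Positive options used later in this example):

```matlab
% Gaussian vs. kernel naive Bayes at the command line.
fishertable = readtable('fisheriris.csv');
gaussNB  = fitcnb(fishertable, 'Species');          % Gaussian density per numeric predictor
kernelNB = fitcnb(fishertable, 'Species', ...
                  'DistributionNames', 'kernel');   % kernel-smoothed densities instead
gaussLoss  = kfoldLoss(crossval(gaussNB));          % cross-validated error, Gaussian
kernelLoss = kfoldLoss(crossval(kernelNB));         % cross-validated error, kernel
```

The app's Kernel Type and Support options correspond to the Kernel and Support name-value arguments of fitcnb.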

2. Click the Apps tab, and then click the arrow at the right of the Apps section to open the apps gallery. In the Machine Learning and Deep Learning group, click Classification Learner.

3. On the Classification Learner tab, in the File section, select New Session > From Workspace.

4. In the New Session dialog box, select the table fishertable from the Data Set Variable list (if necessary). As shown in the dialog box, the app selects the response and predictor variables based on their data type. Petal and sepal length and width are predictors, and species is the response that you want to classify. For this example, do not change the selections.


5. To accept the default validation scheme and continue, click Start Session. The default validation option is cross-validation, to protect against overfitting. Classification Learner creates a scatter plot of the data.


6. Use the scatter plot to investigate which variables are useful for predicting the response. Select different options on the X and Y lists under Predictors to visualize the distribution of species and measurements. Observe which variables separate the species colors most clearly.

   The setosa species (blue points) is easy to separate from the other two species with all four predictors. The versicolor and virginica species are much closer together in all predictor measurements and overlap, especially when you plot sepal length and width. setosa is easier to predict than the other two species.

7. Create a naive Bayes model. On the Classification Learner tab, in the Model Type section, click the arrow to open the gallery. In the Naive Bayes Classifiers group, click Gaussian Naive Bayes. Note that Classification Learner disables the Advanced button in the Model Type section, because this type of model has no advanced settings.

   In the Training section, click Train.

   Tip: If you have Parallel Computing Toolbox, the Opening Pool dialog box opens the first time you click Train (or when you click Train again after an extended period of time). The dialog box remains open while the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can train multiple classifiers at once and continue working.

   The app creates a Gaussian naive Bayes model and plots the results. The app displays the Gaussian Naive Bayes model in the History list. Check the model validation score in the Accuracy box. The score shows that the model performs well. For the Gaussian naive Bayes model, by default, the app models the distribution of numerical predictors using the Gaussian distribution, and models the distribution of categorical predictors using the multivariate multinomial distribution (MVMN).

   Note: Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example.

8. Examine the scatter plot. An X indicates misclassified points. The blue points (setosa species) are all correctly classified, but the other two species have misclassified points. Under Plot, switch between the Data and Model predictions options. Observe the color of the incorrect (X) points. Or, to view only the incorrect points, clear the Correct check box.

9. Train a kernel naive Bayes model for comparison. On the Classification Learner tab, in the Model Type section, click Kernel Naive Bayes. Note that Classification Learner enables the Advanced button, because this type of model has advanced settings.


   The app displays a draft kernel naive Bayes model in the History list. In the Model Type section, click Advanced to change settings in the Advanced Naive Bayes Options menu. Select Triangle from the Kernel Type list, and select Positive from the Support list.

   Note: The settings in the Advanced Naive Bayes Options menu are available for continuous data only. Pointing to Kernel Type displays the tooltip "Specify Kernel smoothing function for continuous variables," and pointing to Support displays the tooltip "Specify Kernel smoothing density support for continuous variables."

   In the Training section, click Train to train the new model.

   The History list now includes the new kernel naive Bayes model. Its model validation score is better than the score for the Gaussian naive Bayes model. The app highlights the Accuracy score of the best model by outlining it in a box.

10. In the History list, click each model to view and compare the results.


11. Train a Gaussian naive Bayes model and a kernel naive Bayes model simultaneously. On the Classification Learner tab, in the Model Type section, click All Naive Bayes. Classification Learner disables the Advanced button. In the Training section, click Train.

    The app trains one of each naive Bayes model type and highlights the Accuracy score of the best model or models.

12. In the History list, click a model to view the results. Examine the scatter plot for the trained model and try plotting different predictors. Misclassified points appear as an X.

13. To inspect the accuracy of the predictions in each class, on the Classification Learner tab, in the Plots section, click Confusion Matrix. The app displays a matrix of true class and predicted class results.


    Note: Validation introduces some randomness into the results. Your confusion matrix results can vary from the results shown in this example.

14. In the History list, click the other models and compare their results.

15. In the History list, click the model with the highest Accuracy score. To improve the model, try modifying its features. For example, see if you can improve the model by removing features with low predictive power. On the Classification Learner tab, in the Features section, click Feature Selection. In the Feature Selection menu, clear the check boxes for PetalLength and PetalWidth to exclude them from the predictors. A new draft model (model 4) appears in the History list with the new settings (2/4 features), based on the kernel naive Bayes model (model 3.2 in the History list).


    In the Training section, click Train to train a new kernel naive Bayes model using the new predictor options. The History list now includes model 4. It is also a kernel naive Bayes model, trained using only 2 of 4 predictors.

16. To determine which predictors are included, click a model in the History list, then click Feature Selection in the Features section and note which check boxes are selected. The model with only sepal measurements (model 4) has a much lower Accuracy score than the models containing all predictors.

17. Train another kernel naive Bayes model including only the petal measurements. Change the selections in the Feature Selection menu and click Train.


    The model trained using only petal measurements (model 5) performs comparably to the model containing all predictors. The models predict no better using all the measurements compared to only the petal measurements. If data collection is expensive or difficult, you might prefer a model that performs satisfactorily without some predictors.

18. To investigate features to include or exclude, use the parallel coordinates plot. On the Classification Learner tab, in the Plots section, click Parallel Coordinates Plot.

19. In the History list, click the model with the highest Accuracy score. To improve the model further, try changing naive Bayes settings (if available). On the Classification Learner tab, in the Model Type section, click Advanced. Recall that the Advanced button is enabled only for some models. Change a setting, then train the new model by clicking Train.

20. Export the trained model to the workspace. On the Classification Learner tab, in the Export section, select Export Model > Export Model. See “Export Classification Model to Predict New Data” on page 23-69.

21. Examine the code for training this classifier. In the Export section, click Generate Function.

Use the same workflow to evaluate and compare the other classifier types you can train in Classification Learner. To try all the nonoptimizable classifier model presets available for your data set:


1. Click the arrow on the Model Type section to open the gallery of classifiers.

2. In the Get Started group, click All, then click Train in the Training section.


For information about other classifier types, see “Train Classification Models in Classification Learner App” on page 23-10.

See Also

Related Examples
• “Train Classification Models in Classification Learner App” on page 23-10
• “Select Data and Validation for Classification Problem” on page 23-16
• “Choose Classifier Options” on page 23-20
• “Naive Bayes Classification” on page 21-2
• “Feature Selection and Feature Transformation Using Classification Learner App” on page 23-37
• “Assess Classifier Performance in Classification Learner” on page 23-59
• “Export Classification Model to Predict New Data” on page 23-69


Train and Compare Classifiers Using Misclassification Costs in Classification Learner App

This example shows how to create and compare classifiers that use specified misclassification costs in the Classification Learner app. Specify the misclassification costs before training, and use the accuracy and total misclassification cost results to compare the trained models.

1. In the MATLAB Command Window, read the sample file CreditRating_Historical.dat into a table. The predictor data consists of financial ratios and industry sector information for a list of corporate customers. The response variable consists of credit ratings assigned by a rating agency. Combine all the A ratings into one rating. Do the same for the B and C ratings, so that the response variable has three distinct ratings. Among the three ratings, A is considered the best and C the worst.

   creditrating = readtable('CreditRating_Historical.dat');
   Rating = categorical(creditrating.Rating);
   Rating = mergecats(Rating,{'AAA','AA','A'},'A');
   Rating = mergecats(Rating,{'BBB','BB','B'},'B');
   Rating = mergecats(Rating,{'CCC','CC','C'},'C');
   creditrating.Rating = Rating;

2. Assume these are the costs associated with misclassifying the credit rating of a customer.

                           Customer Predicted Rating
   Customer True Rating    A        B        C
   A                       $0       $100     $200
   B                       $500     $0       $100
   C                       $1000    $500     $0

   For example, the cost of misclassifying a C rating customer as an A rating customer is $1000. The costs indicate that classifying a customer with bad credit as a customer with good credit is more costly than classifying a customer with good credit as a customer with bad credit.

   Create a matrix variable that contains the misclassification costs. Create another variable that specifies the class names and their order in the matrix variable.

   ClassificationCosts = [0 100 200; 500 0 100; 1000 500 0];
   ClassNames = categorical({'A','B','C'});

Tip: Alternatively, you can specify misclassification costs directly inside the Classification Learner app. See “Specify Misclassification Costs” on page 23-42 for more information.
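The same cost matrix can also be supplied at the command line. This is a hedged sketch using fitctree with the Cost and ClassNames name-value arguments; it assumes the creditrating table, ClassificationCosts, and ClassNames variables from the steps above, and it assumes the first table variable is a customer ID that should be excluded from the predictors:

```matlab
% Train a cost-sensitive classification tree at the command line.
% Assumption: creditrating's first variable is an ID column, not a predictor.
predictors = creditrating(:, 2:end);            % drop the assumed ID column
treeModel = fitctree(predictors, 'Rating', ...
    'Cost', ClassificationCosts, ...            % penalize costly mistakes more
    'ClassNames', ClassNames);                  % row/column order of the cost matrix
```

Supplying Cost biases training toward avoiding the expensive errors (for example, predicting A for a true C), rather than minimizing raw misclassification count.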


3. Open Classification Learner. Click the Apps tab, and then click the arrow at the right of the Apps section to open the apps gallery. In the Machine Learning and Deep Learning group, click Classification Learner.

4. On the Classification Learner tab, in the File section, select New Session > From Workspace.


5

In the New Session dialog box, select the table creditrating from the Data Set Variable list. As shown in the dialog box, the app selects the response and predictor variables based on their data type. The default response variable is the Rating variable. The default validation option is cross-validation, to protect against overfitting. For this example, do not change the default settings.

6

To accept the default settings, click Start Session.

7  Specify the misclassification costs. On the Classification Learner tab, in the Options section, click Misclassification Costs. The app opens a dialog box showing the default misclassification costs. In the dialog box, click Import from Workspace.


In the import dialog box, select ClassificationCosts as the cost variable and ClassNames as the class order in the cost variable. Click Import.

The app updates the values in the misclassification costs dialog box. Click OK to save your changes.


8  Train fine, medium, and coarse trees simultaneously. On the Classification Learner tab, in the Model Type section, click the arrow to open the gallery. In the Decision Trees group, click All Trees. In the Training section, click Train. The app trains one of each tree model type and displays the models in the History list.

Tip If you have Parallel Computing Toolbox, the Opening Pool dialog box opens the first time you click Train (or when you click Train again after an extended period of time). The dialog box remains open while the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can train multiple classifiers simultaneously and continue working.


Note Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example.

9  In the History list, click a model to view the results, which are displayed in the Current Model pane. Each model has a validation Accuracy score that indicates the percentage of correctly predicted responses. In the History list, the app highlights the highest Accuracy score by outlining it in a box.

10  Inspect the accuracy of the predictions in each class. On the Classification Learner tab, in the Plots section, click Confusion Matrix. The app displays a matrix of true class and predicted class results for the selected model (in this case, for the medium tree).


11  You can also plot results per predicted class to investigate false discovery rates. Under Plot, select the Positive Predictive Values (PPV) False Discovery Rates (FDR) option. In the confusion matrix for the medium tree, the entries below the diagonal have small percentage values. These values indicate that the model tries to avoid assigning a credit rating that is higher than the true rating for a customer.


12  Compare the total misclassification costs of the tree models. To inspect the total misclassification cost of a model, select the model in the History list, and then view the Results section of the Current Model pane. For example, the medium tree has these results.


In general, choose a model that has high accuracy and low total misclassification cost. In this example, the medium tree has the highest validation accuracy value and the lowest total misclassification cost of the three models. You can perform feature selection and transformation or tune your model just as you do in the workflow without misclassification costs. However, always check the total misclassification cost of your model when assessing its performance. For differences in the exported model and exported code when you use misclassification costs, see “Misclassification Costs in Exported Model and Generated Code” on page 23-46.
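The app's exact bookkeeping is not shown here, but a total misclassification cost of this kind can be checked by hand by weighting each cell of a validation confusion matrix by the corresponding entry of the cost matrix. This sketch uses hypothetical confusion-matrix counts purely for illustration:

```matlab
% Sketch: total misclassification cost from a confusion matrix.
% confMat counts are hypothetical; rows are true classes and columns
% are predicted classes, in the same A, B, C order as the cost matrix.
ClassificationCosts = [0 100 200; 500 0 100; 1000 500 0];
confMat = [1200 80 5; 150 900 60; 10 90 400];   % illustrative counts
totalCost = sum(confMat .* ClassificationCosts, 'all');
```

Diagonal entries contribute nothing (correct predictions cost $0), so the sum runs only over the misclassified observations.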

See Also

Related Examples
•  "Misclassification Costs in Classification Learner App" on page 23-42
•  "Train Classification Models in Classification Learner App" on page 23-10


Train Classifier Using Hyperparameter Optimization in Classification Learner App

This example shows how to tune hyperparameters of a classification support vector machine (SVM) model by using hyperparameter optimization in the Classification Learner app. Compare the test set performance of the trained optimizable SVM to that of the best-performing preset SVM model.

1  In the MATLAB Command Window, load the ionosphere data set, and create a table containing the data. Separate the table into training and test sets.

load ionosphere
tbl = array2table(X);
tbl.Y = Y;
rng('default') % For reproducibility of the data split
partition = cvpartition(Y,'Holdout',0.15);
idxTrain = training(partition); % Indices for the training set
tblTrain = tbl(idxTrain,:);
tblTest = tbl(~idxTrain,:);

2  Open Classification Learner. Click the Apps tab, and then click the arrow at the right of the Apps section to open the apps gallery. In the Machine Learning and Deep Learning group, click Classification Learner.

3  On the Classification Learner tab, in the File section, select New Session > From Workspace.

4  In the New Session dialog box, select the tblTrain table from the Data Set Variable list. As shown in the dialog box, the app selects the response and predictor variables. The default response variable is Y. The default validation option is 5-fold cross-validation, to protect against overfitting. For this example, do not change the default settings.


5  To accept the default options and continue, click Start Session.

6  Train all preset SVM models. On the Classification Learner tab, in the Model Type section, click the arrow to open the gallery. In the Support Vector Machines group, click All SVMs. In the Training section, click Train. The app trains one of each SVM model type and displays the models in the History list.

Tip If you have Parallel Computing Toolbox, the Opening Pool dialog box opens the first time you click Train (or when you click Train again after an extended period of time). The dialog box remains open while the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can train multiple models simultaneously and continue working.


Note Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example.


7  Select an optimizable SVM model to train. On the Classification Learner tab, in the Model Type section, click the arrow to open the gallery. In the Support Vector Machines group, click Optimizable SVM. The app disables the Use Parallel button when you select an optimizable model.

8  Select the model hyperparameters to optimize. In the Model Type section, select Advanced > Advanced. The app opens a dialog box in which you can select Optimize check boxes for the hyperparameters that you want to optimize. By default, all the check boxes for the available hyperparameters are selected.

For this example, clear the Optimize check boxes for Kernel function and Standardize data. By default, the app disables the Optimize check box for Kernel scale whenever the kernel function has a fixed value other than Gaussian. Select a Gaussian kernel function, and select the Optimize check box for Kernel scale.


9  In the Training section, click Train.

10  The app displays a Minimum Classification Error Plot as it runs the optimization process. At each iteration, the app tries a different combination of hyperparameter values and updates the plot with the minimum validation classification error observed up to that iteration, indicated in dark blue. When the app completes the optimization process, it selects the set of optimized hyperparameters, indicated by a red square. For more information, see "Minimum Classification Error Plot" on page 23-55.

The app lists the optimized hyperparameters in both the upper right of the plot and the Optimized Hyperparameters section of the Current Model pane.


Note In general, the optimization results are not reproducible.

11  Compare the trained preset SVM models to the trained optimizable model. In the History list, the app highlights the highest validation Accuracy by outlining it in a box. In this example, the trained optimizable SVM model outperforms the six preset models.

A trained optimizable model does not always have a higher accuracy than the trained preset models. If a trained optimizable model does not perform well, you can try to get better results by running the optimization for longer. In the Model Type section, select Advanced > Optimizer Options. In the dialog box, increase the Iterations value. For example, you can double-click the default value of 30 and enter a value of 60.

12  Because hyperparameter tuning often leads to overfitted models, check the test set performance of the SVM model with the optimized hyperparameters, and compare it to the performance of the best preset SVM model. Begin by exporting the two models to the MATLAB workspace.

•  In the History list, select the Medium Gaussian SVM model. On the Classification Learner tab, in the Export section, select Export Model > Export Model. In the dialog box, name the model gaussianSVM.
•  In the History list, select the Optimizable SVM model. On the Classification Learner tab, in the Export section, select Export Model > Export Model. In the dialog box, name the model optimizableSVM.


13  Compute the accuracy of the two models on the tblTest data. In the MATLAB Command Window, use the predictFcn function in each exported model structure to predict the response values of the test set data. Then, use confusion matrices to visualize the results. Compute and compare the accuracy values for the models on the test set data.

testY = tblTest.Y;

labels = gaussianSVM.predictFcn(tblTest);
figure
cm = confusionchart(testY,labels);
title('Preset Model Results')

optLabels = optimizableSVM.predictFcn(tblTest);
figure
optcm = confusionchart(testY,optLabels);
title('Optimizable Model Results')

cmvalues = cm.NormalizedValues;
optcmvalues = optcm.NormalizedValues;
presetAccuracy = sum(diag(cmvalues))/sum(cmvalues,'all')*100
optAccuracy = sum(diag(optcmvalues))/sum(optcmvalues,'all')*100

presetAccuracy = 92.3077
optAccuracy = 88.4615



In this example, the trained optimizable SVM model does not perform as well as the trained preset model on the test set data.

See Also

Related Examples
•  "Hyperparameter Optimization in Classification Learner App" on page 23-48
•  "Train Classification Models in Classification Learner App" on page 23-10
•  "Select Data and Validation for Classification Problem" on page 23-16
•  "Choose Classifier Options" on page 23-20
•  "Assess Classifier Performance in Classification Learner" on page 23-59
•  "Export Classification Model to Predict New Data" on page 23-69
•  "Bayesian Optimization Workflow" on page 10-25


24  Regression Learner

•  "Train Regression Models in Regression Learner App" on page 24-2
•  "Select Data and Validation for Regression Problem" on page 24-8
•  "Choose Regression Model Options" on page 24-12
•  "Feature Selection and Feature Transformation Using Regression Learner App" on page 24-24
•  "Hyperparameter Optimization in Regression Learner App" on page 24-27
•  "Assess Model Performance in Regression Learner" on page 24-39
•  "Export Plots in Regression Learner App" on page 24-46
•  "Export Regression Model to Predict New Data" on page 24-50
•  "Train Regression Trees Using Regression Learner App" on page 24-54
•  "Train Regression Model Using Hyperparameter Optimization in Regression Learner App" on page 24-63


Train Regression Models in Regression Learner App

In this section...
"Automated Regression Model Training" on page 24-2
"Manual Regression Model Training" on page 24-4
"Parallel Regression Model Training" on page 24-4
"Compare and Improve Regression Models" on page 24-5

You can use Regression Learner to train regression models including linear regression models, regression trees, Gaussian process regression models, support vector machines, and ensembles of regression trees. In addition to training models, you can explore your data, select features, specify validation schemes, and evaluate results. You can export a model to the workspace to use the model with new data, or generate MATLAB code to learn about programmatic regression.

Training a model in Regression Learner consists of two parts:

•  Validated Model: Training a model with a validation scheme. By default, the app protects against overfitting by applying cross-validation. Alternatively, you can choose holdout validation. The validated model is visible in the app.
•  Full Model: Training a model on the full data without validation. The app trains this model simultaneously with the validated model. However, the model trained on the full data is not visible in the app. When you choose a regression model to export to the workspace, Regression Learner exports the full model.

The app displays the results of the validated model. Diagnostic measures, such as model accuracy, and plots, such as the response plot or residuals plot, reflect the validated model results. You can automatically train one or more regression models, compare validation results, and choose the best model that works for your regression problem. When you choose a model to export to the workspace, Regression Learner exports the full model. Because Regression Learner creates a model object of the full model during training, you experience no lag time when you export the model. You can use the exported model to make predictions on new data.
To get started by training a selection of model types, see “Automated Regression Model Training” on page 24-2. If you already know which regression model you want to train, see “Manual Regression Model Training” on page 24-4.
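At the command line, the validated/full pair described above corresponds roughly to a cross-validated model object and a model fit to all the data. This is a rough sketch (the app's internal implementation may differ), using the carsmall data that ships with MATLAB:

```matlab
% Sketch: the app's "validated model" vs. "full model" at the command line.
load carsmall
tbl = rmmissing(table(Horsepower,Weight,MPG));
fullMdl = fitrtree(tbl,'MPG');            % full model, trained on all data
cvMdl = crossval(fullMdl,'KFold',5);      % 5-fold cross-validated model
validationRMSE = sqrt(kfoldLoss(cvMdl));  % validation metric, like the app's RMSE
```

kfoldLoss returns the cross-validated mean squared error by default; taking the square root gives an RMSE comparable to the score shown in the History list.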

Automated Regression Model Training

You can use Regression Learner to automatically train a selection of different regression models on your data.

•  Get started by automatically training multiple models simultaneously. You can quickly try a selection of models, and then explore promising models interactively.
•  If you already know what model type you want, then you can train individual models instead. See "Manual Regression Model Training" on page 24-4.


1  On the Apps tab, in the Machine Learning group, click Regression Learner.

2  Click New Session and select data from the workspace or from file. Specify a response variable and variables to use as predictors. See "Select Data and Validation for Regression Problem" on page 24-8.


3  On the Regression Learner tab, in the Model Type section, click the arrow to expand the list of regression models. Select All Quick-To-Train. This option trains all the model presets that are fast to fit.

4  Click Train.

Note If you have Parallel Computing Toolbox, the app trains models in parallel. See "Parallel Regression Model Training" on page 24-4.

A selection of model types appears in the History list. When the models finish training, the best RMSE score is highlighted in a box.

5  Click models in the History list to explore results in the plots. For the next steps, see "Manual Regression Model Training" on page 24-4 or "Compare and Improve Regression Models" on page 24-5.

6  To try all the nonoptimizable model presets available, click All, and then click Train.


Manual Regression Model Training

To explore individual model types, you can train models one at a time or train a group of models of the same type.

1  Choose a model type. On the Regression Learner tab, in the Model Type section, click a model type. To see all available model options, click the arrow in the Model Type section to expand the list of regression models. The nonoptimizable model options in the gallery are preset starting points with different settings, suitable for a range of different regression problems. To read descriptions of the models, switch to the details view, or hover the mouse over a button to display its tooltip.

For more information on each option, see "Choose Regression Model Options" on page 24-12.

2  After selecting a model, click Train. Repeat to explore different models.

Tip Select regression trees first. If your trained models do not predict the response accurately enough, then try other models with higher flexibility. To avoid overfitting, look for a less flexible model that provides sufficient accuracy.

3  If you want to try all nonoptimizable models of the same or different types, then select one of the All options in the gallery. Alternatively, if you want to automatically tune hyperparameters of a specific model type, select the corresponding Optimizable model and perform hyperparameter optimization. For more information, see "Hyperparameter Optimization in Regression Learner App" on page 24-27.

For next steps, see “Compare and Improve Regression Models” on page 24-5.

Parallel Regression Model Training

You can train models in parallel using Regression Learner if you have Parallel Computing Toolbox. When you train models, the app automatically starts a parallel pool of workers, unless you turn off the default parallel preference Automatically create a parallel pool. If a pool is already open, the app uses it for training. Parallel training allows you to train multiple models simultaneously and continue working.

1  The first time you click Train, you see a dialog box while the app opens a parallel pool of workers. After the pool opens, you can train multiple models at once.

2  When models are training in parallel, you see progress indicators on each training and queued model in the History list. If you want, you can cancel individual models. During training, you can examine results and plots from models, and initiate training of more models.

To control parallel training, toggle the Use Parallel button on the app toolstrip. (The Use Parallel button is only available if you have Parallel Computing Toolbox.)

If you have Parallel Computing Toolbox, then parallel training is available in Regression Learner, and you do not need to set the UseParallel option of the statset function. If you turn off the Automatically create a parallel pool preference, then the app does not start a pool for you without asking first.

Note You cannot perform hyperparameter optimization in parallel. The app disables the Use Parallel button when you select an optimizable model. If you then select a nonoptimizable model, the button is off by default.
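Outside the app, parallel training is requested explicitly through a statset options structure. This sketch uses TreeBagger (a bagged-tree ensemble chosen here only for illustration, since it accepts a parallel Options structure), and assumes Parallel Computing Toolbox is installed:

```matlab
% Sketch: command-line parallel training via statset('UseParallel',true).
% Assumes Parallel Computing Toolbox; a pool is started on demand.
load carsmall
tbl = rmmissing(table(Horsepower,Weight,MPG));
opts = statset('UseParallel',true);
mdl = TreeBagger(50,tbl,'MPG','Method','regression','Options',opts);
```

In the app, by contrast, the Use Parallel toggle handles this for you.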

Compare and Improve Regression Models

1  Click models in the History list to explore the results in the plots. Compare model performance by inspecting results in the plots. Examine the RMSE score reported in the History list for each model. See "Assess Model Performance in Regression Learner" on page 24-39.

2  Select the best model in the History list, and then try including and excluding different features in the model. Click Feature Selection.

Try the response plot to help you identify features to remove. See if you can improve the model by removing features with low predictive power. Specify predictors to include in the model, and train new models using the new options. Compare results among the models in the History list.

You also can try transforming features with PCA to reduce dimensionality. See "Feature Selection and Feature Transformation Using Regression Learner App" on page 24-24.
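The PCA transformation the app offers can be sketched at the command line with the pca function. The variable choices and the 95% variance cutoff below are illustrative assumptions, not app defaults:

```matlab
% Sketch: PCA-based feature transformation outside the app.
load carsmall
X = rmmissing([Horsepower Weight Acceleration]);   % numeric predictors only
[coeff,score,~,~,explained] = pca(zscore(X));      % standardize, then PCA
numComp = find(cumsum(explained) >= 95,1);         % keep ~95% of variance
Xreduced = score(:,1:numComp);                     % transformed predictors
```

Standardizing with zscore before pca keeps predictors on different scales from dominating the components.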


3  Improve the model further by changing model parameter settings in the Advanced dialog box. Then, train using the new options. To learn how to control model flexibility, see "Choose Regression Model Options" on page 24-12. For information on how to tune model parameter settings automatically, see "Hyperparameter Optimization in Regression Learner App" on page 24-27.

If feature selection, PCA, or new parameter settings improve your model, try training All model types with the new settings. See if another model type does better with the new settings.

Tip To avoid overfitting, look for a less flexible model that provides sufficient accuracy. For example, look for simple models, such as regression trees, that are fast and easy to interpret. If your models are not accurate enough, then try other models with higher flexibility, such as ensembles. To learn about model flexibility, see "Choose Regression Model Options" on page 24-12.

This figure shows the app with a History list containing various regression model types.

Tip For a step-by-step example comparing different regression models, see “Train Regression Trees Using Regression Learner App” on page 24-54.


Next, you can generate code to train the model with different data or export trained models to the workspace to make predictions using new data. See “Export Regression Model to Predict New Data” on page 24-50.

See Also

Related Examples
•  "Select Data and Validation for Regression Problem" on page 24-8
•  "Choose Regression Model Options" on page 24-12
•  "Feature Selection and Feature Transformation Using Regression Learner App" on page 24-24
•  "Assess Model Performance in Regression Learner" on page 24-39
•  "Export Regression Model to Predict New Data" on page 24-50
•  "Train Regression Trees Using Regression Learner App" on page 24-54


Select Data and Validation for Regression Problem

In this section...
"Select Data from Workspace" on page 24-8
"Import Data from File" on page 24-9
"Example Data for Regression" on page 24-9
"Choose Validation Scheme" on page 24-10

Select Data from Workspace

Tip In Regression Learner, tables are the easiest way to work with your data, because they can contain numeric and label data. Use the Import Tool to bring your data into the MATLAB workspace as a table, or use the table functions to create a table from workspace variables. See "Tables" (MATLAB).

1  Load your data into the MATLAB workspace. Predictor variables can be numeric, categorical, string, or logical vectors, cell arrays of character vectors, or character arrays. The response variable must be a floating-point vector (single or double precision). Combine the predictor data into one variable, either a table or a matrix. You can additionally combine your predictor data and response variable, or you can keep them separate. For example data sets, see "Example Data for Regression" on page 24-9.

2  On the Apps tab, click Regression Learner to open the app.

3  On the Regression Learner tab, in the File section, click New Session > From Workspace.

4  In the New Session dialog box, under Data Set Variable, select a table or matrix from the workspace variables. If you select a matrix, choose whether to use rows or columns for observations by clicking the option buttons.

5  Under Response, observe the default response variable. The app tries to select a suitable response variable from the data set variable and treats all other variables as predictors. If you want to use a different response variable, you can:

•  Use the list to select another variable from the data set variable.
•  Select a separate workspace variable by clicking the From workspace option button and then selecting a variable from the list.


6  Under Predictors, add or remove predictors using the check boxes. Add or remove all predictors by clicking Add All or Remove All. You can also add or remove multiple predictors by selecting them in the table, and then clicking Add N or Remove N, where N is the number of selected predictors. The Add All and Remove All buttons change to Add N and Remove N when you select multiple predictors.

7  Click Start Session to accept the default validation scheme and continue. The default validation option is 5-fold cross-validation, which protects against overfitting.


Tip If you have a large data set, you might want to switch to holdout validation. To learn more, see "Choose Validation Scheme" on page 24-10.

For next steps, see "Train Regression Models in Regression Learner App" on page 24-2.

Import Data from File

1  On the Regression Learner tab, in the File section, select New Session > From File.
2  Select a file type in the list, such as spreadsheets, text files, or comma-separated values (.csv) files, or select All Files to browse for other file types such as .dat.

Example Data for Regression

To get started using Regression Learner, try these example data sets.

Cars
Number of predictors: 7. Number of observations: 406. Response: MPG (miles per gallon).
Data on different car models, 1970–1982. Predict the fuel economy (in miles per gallon), or one of the other characteristics. For a step-by-step example, see "Train Regression Trees Using Regression Learner App" on page 24-54.
Create a table from variables in the carbig.mat file:

load carbig
cartable = table(Acceleration, Cylinders, Displacement,...
    Horsepower, Model_Year, Weight, Origin, MPG);

Abalone
Number of predictors: 8. Number of observations: 4177. Response: Rings.
Measurements of abalone (a group of sea snails). Predict the age of abalones, which is closely related to the number of rings in their shells.
Download the data from the UCI Machine Learning Repository and save it in your current folder. Read the data into a table and specify the variable names.

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.d
websave('abalone.csv',url);
varnames = {'Sex'; 'Length'; 'Diameter'; 'Height'; 'Whole_weight';...
    'Shucked_weight'; 'Viscera_weight'; 'Shell_weight'; 'Rings'};
abalonetable = readtable('abalone.csv');
abalonetable.Properties.VariableNames = varnames;

Hospital
Number of predictors: 5. Number of observations: 100. Response: BloodPressure_2.
Simulated hospital data. Predict the blood pressure of patients.
Create a table from the hospital variable in the hospital.mat file:

load hospital.mat
hospitaltable = dataset2table(hospital(:,2:end-1));

Choose Validation Scheme

Choose a validation method to examine the predictive accuracy of the fitted models. Validation estimates model performance on new data, and helps you choose the best model. Validation protects against overfitting. A model that is too flexible and suffers from overfitting has a worse validation accuracy. Choose a validation scheme before training any models, so that you can compare all the models in your session using the same validation scheme.

Tip Try the default validation scheme and click Start Session to continue. The default option is 5-fold cross-validation, which protects against overfitting. If you have a large data set and training the models takes too long using cross-validation, reimport your data and try the faster holdout validation instead.

•  Cross-Validation: Select the number of folds (or divisions) to partition the data set using the slider control. If you choose k folds, then the app:
   1  Partitions the data into k disjoint sets or folds
   2  For each fold:
      a  Trains a model using the out-of-fold observations
      b  Assesses model performance using the in-fold data
   3  Calculates the average test error over all folds
   This method gives a good estimate of the predictive accuracy of the final model trained using the full data set. The method requires multiple fits, but makes efficient use of all the data, so it works well for small data sets.
•  Holdout Validation: Select a percentage of the data to use as a validation set using the slider control. The app trains a model on the training set and assesses its performance with the validation set. The model used for validation is based on only a portion of the data, so holdout validation is appropriate only for large data sets. The final model is trained using the full data set.
•  No Validation: No protection against overfitting. The app uses all the data for training and computes the error rate on the same data. Without any test data, you get an unrealistic estimate of the model's performance on new data. That is, the training sample accuracy is likely to be unrealistically high, and the predictive accuracy is likely to be lower. To help you avoid overfitting to the training data, choose a validation scheme instead.

Note The validation scheme only affects the way that Regression Learner computes validation metrics. The final model is always trained using the full data set.

All the models you train after selecting data use the same validation scheme that you select in this dialog box. You can compare all the models in your session using the same validation scheme. To change the validation selection and train new models, you can select data again, but you lose any trained models. The app warns you that importing data starts a new session. Save any trained models you want to keep to the workspace, and then import the data.
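The k-fold procedure above can be sketched at the command line with cvpartition. The tree model and the carsmall data here are illustrative choices; the app's internal implementation may differ:

```matlab
% Sketch of k-fold cross-validation for regression.
load carsmall
tbl = rmmissing(table(Horsepower,Weight,MPG));   % drop rows with missing values
k = 5;
cvp = cvpartition(height(tbl),'KFold',k);        % partition into k folds
mse = zeros(k,1);
for i = 1:k
    mdl = fitrtree(tbl(training(cvp,i),:),'MPG'); % train on out-of-fold data
    mse(i) = loss(mdl,tbl(test(cvp,i),:),'MPG');  % assess on in-fold data
end
avgRMSE = sqrt(mean(mse));   % average test error over all folds
```

Every observation is held out exactly once, which is why the method makes efficient use of small data sets.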


For next steps training models, see “Train Regression Models in Regression Learner App” on page 24-2.

See Also

Related Examples
•  "Train Regression Models in Regression Learner App" on page 24-2
•  "Choose Regression Model Options" on page 24-12
•  "Feature Selection and Feature Transformation Using Regression Learner App" on page 24-24
•  "Assess Model Performance in Regression Learner" on page 24-39
•  "Export Regression Model to Predict New Data" on page 24-50
•  "Train Regression Trees Using Regression Learner App" on page 24-54


Choose Regression Model Options

In this section...
"Choose Regression Model Type" on page 24-12
"Linear Regression Models" on page 24-14
"Regression Trees" on page 24-15
"Support Vector Machines" on page 24-17
"Gaussian Process Regression Models" on page 24-20
"Ensembles of Trees" on page 24-22

Choose Regression Model Type

You can use the Regression Learner app to automatically train a selection of different models on your data. Use automated training to quickly try a selection of model types, and then explore promising models interactively. To get started, try these options first.

Get Started Regression Model Buttons
All Quick-To-Train — Try the All Quick-To-Train button first. The app trains all model types that are typically quick to train.
All — Use the All button to train all available nonoptimizable model types. Trains every type, regardless of any prior trained models. Can be time-consuming.

To learn more about automated model training, see "Automated Regression Model Training" on page 24-2.

If you want to explore models one at a time, or if you already know what model type you want, you can select individual models or train a group of the same type. To see all available regression model options, on the Regression Learner tab, click the arrow in the Model Type section to expand the list of regression models. The nonoptimizable model options in the gallery are preset starting points with different settings, suitable for a range of different regression problems. To use optimizable model options and tune model hyperparameters automatically, see "Hyperparameter Optimization in Regression Learner App" on page 24-27.

For help choosing the best model type for your problem, see the tables showing typical characteristics of different regression model types. Decide on the tradeoff you want in speed, flexibility, and interpretability. The best model type depends on your data.


Tip To avoid overfitting, look for a less flexible model that provides sufficient accuracy. For example, look for simple models, such as regression trees, that are fast and easy to interpret. If the models are not accurate enough in predicting the response, choose other models with higher flexibility, such as ensembles. To control flexibility, see the details for each model type.

Characteristics of Regression Model Types

Regression Model Type                                Interpretability
"Linear Regression Models" on page 24-14             Easy
"Regression Trees" on page 24-15                     Easy
"Support Vector Machines" on page 24-17              Easy for linear SVMs. Hard for other kernels.
"Gaussian Process Regression Models" on page 24-20   Hard
"Ensembles of Trees" on page 24-22                   Hard

To read a description of each model in Regression Learner, switch to the details view in the list of all model presets.

Tip The nonoptimizable models in the Model Type gallery are preset starting points with different settings. After you choose a model type, such as regression trees, try training all the nonoptimizable presets to see which one produces the best model with your data.

For workflow instructions, see "Train Regression Models in Regression Learner App" on page 24-2.


24

Regression Learner

Categorical Predictor Support

In Regression Learner, all model types support categorical predictors.

Tip If you have categorical predictors with many unique values, training linear models with interaction or quadratic terms and stepwise linear models can use a lot of memory. If the model fails to train, try removing these categorical predictors.

Linear Regression Models

Linear regression models have predictors that are linear in the model parameters, are easy to interpret, and are fast for making predictions. These characteristics make linear regression models popular models to try first. However, the highly constrained form of these models means that they often have low predictive accuracy. After fitting a linear regression model, try creating more flexible models, such as regression trees, and compare the results.

Tip In the Model Type gallery, click All Linear to try each of the linear regression options and see which settings produce the best model with your data. Select the best model in the History list and try to improve that model by using feature selection and changing some advanced options.

Regression Model Type | Interpretability | Model Flexibility
Linear | Easy | Very low
Interactions Linear | Easy | Medium
Robust Linear | Easy | Very low. Less sensitive to outliers, but can be slow to train.
Stepwise Linear | Easy | Medium

Tip For a workflow example, see “Train Regression Trees Using Regression Learner App” on page 24-54.

Advanced Linear Regression Options

Regression Learner uses the fitlm function to train Linear, Interactions Linear, and Robust Linear models. The app uses the stepwiselm function to train Stepwise Linear models. For Linear, Interactions Linear, and Robust Linear models, you can set these options:
• Terms

Specify which terms to use in the linear model. You can choose from:


• Linear. A constant term and linear terms in the predictors
• Interactions. A constant term, linear terms, and interaction terms between the predictors
• Pure Quadratic. A constant term, linear terms, and terms that are purely quadratic in each of the predictors
• Quadratic. A constant term, linear terms, and quadratic terms (including interactions)
• Robust option

Specify whether to use a robust objective function and make your model less sensitive to outliers. With this option, the fitting method automatically assigns lower weights to data points that are more likely to be outliers.

Stepwise linear regression starts with an initial model and systematically adds and removes terms based on the explanatory power of these incrementally larger and smaller models. For Stepwise Linear models, you can set these options:
• Initial terms

Specify the terms that are included in the initial model of the stepwise procedure. You can choose from Constant, Linear, Interactions, Pure Quadratic, and Quadratic.
• Upper bound on terms

Specify the highest order of the terms that the stepwise procedure can add to the model. You can choose from Linear, Interactions, Pure Quadratic, and Quadratic.
• Maximum number of steps

Specify the maximum number of different linear models that can be tried in the stepwise procedure. To speed up training, try reducing the maximum number of steps. Selecting a small maximum number of steps decreases your chances of finding a good model.

Tip If you have categorical predictors with many unique values, training linear models with interaction or quadratic terms and stepwise linear models can use a lot of memory. If the model fails to train, try removing these categorical predictors.
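The app calls fitlm and stepwiselm with settings corresponding to these options. As a rough sketch at the command line (not the app's generated code; the carbig variables chosen here are for illustration only):

```matlab
% Load sample data shipped with Statistics and Machine Learning Toolbox.
load carbig
tbl = table(Horsepower,Weight,MPG);

% Robust Linear: linear terms with a robust objective function.
mdlRobust = fitlm(tbl,'linear','ResponseVar','MPG','RobustOpts','on');

% Interactions Linear: constant, linear, and interaction terms.
mdlInter = fitlm(tbl,'interactions','ResponseVar','MPG');

% Stepwise Linear: start from a constant model, allow terms up to interactions.
mdlStep = stepwiselm(tbl,'constant','Upper','interactions','ResponseVar','MPG');
```

The model specification strings ('linear', 'interactions', and so on) correspond to the Terms choices listed above.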

Regression Trees

Regression trees are easy to interpret, fast for fitting and prediction, and low on memory usage. Try to grow smaller trees with fewer, larger leaves to prevent overfitting. Control the leaf size with the Minimum leaf size setting.

Tip In the Model Type gallery, click All Trees to try each of the nonoptimizable regression tree options and see which settings produce the best model with your data. Select the best model in the History list, and try to improve that model by using feature selection and changing some advanced options.


Regression Model Type | Interpretability | Model Flexibility
Fine Tree | Easy | High. Many small leaves for a highly flexible response function (Minimum leaf size is 4).
Medium Tree | Easy | Medium. Medium-sized leaves for a less flexible response function (Minimum leaf size is 12).
Coarse Tree | Easy | Low. Few large leaves for a coarse response function (Minimum leaf size is 36).

To predict a response of a regression tree, follow the tree from the root (beginning) node down to a leaf node. The leaf node contains the value of the response. Statistics and Machine Learning Toolbox trees are binary. Each step in a prediction involves checking the value of one predictor variable. For example, consider a simple regression tree that predicts the response based on two predictors, x1 and x2. To make a prediction, start at the top node. At each node, check the values of the predictors to decide which branch to follow. When the branches reach a leaf node, the response is set to the value corresponding to that node.

You can visualize your regression tree model by exporting the model from the app, and then entering:

view(trainedModel.RegressionTree,'Mode','graph')

Tip For a workflow example, see “Train Regression Trees Using Regression Learner App” on page 24-54.


Advanced Regression Tree Options

The Regression Learner app uses the fitrtree function to train regression trees. You can set these options:
• Minimum leaf size

Specify the minimum number of training samples used to calculate the response of each leaf node. When you grow a regression tree, consider its simplicity and predictive power. To change the minimum leaf size, click the buttons or enter a positive integer value in the Minimum leaf size box.
• A fine tree with many small leaves is usually highly accurate on the training data. However, the tree might not show comparable accuracy on an independent test set. A very leafy tree tends to overfit, and its validation accuracy is often far lower than its training (or resubstitution) accuracy.
• In contrast, a coarse tree with fewer large leaves does not attain high training accuracy. But a coarse tree can be more robust in that its training accuracy can be near that of a representative test set.

Tip Decrease the Minimum leaf size to create a more flexible model.
• Surrogate decision splits — For missing data only. Specify surrogate use for decision splits. If you have data with missing values, use surrogate splits to improve the accuracy of predictions.

When you set Surrogate decision splits to On, the regression tree finds at most 10 surrogate splits at each branch node. To change the number of surrogate splits, click the buttons or enter a positive integer value in the Maximum surrogates per node box. When you set Surrogate decision splits to Find All, the regression tree finds all surrogate splits at each branch node. The Find All setting can use considerable time and memory.

Alternatively, you can let the app choose some of these model options automatically by using hyperparameter optimization. See “Hyperparameter Optimization in Regression Learner App” on page 24-27.
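These settings map onto fitrtree name-value arguments. A minimal sketch, again using the carbig sample data (the particular predictors and leaf size are illustrative choices, not app defaults):

```matlab
load carbig
X = [Horsepower Weight];   % predictor matrix
y = MPG;                   % response

% Medium-sized leaves, with surrogate splits to handle rows containing NaNs.
tree = fitrtree(X,y,'MinLeafSize',12,'Surrogate','on');

% Visualize the fitted tree, as the app does for an exported model.
view(tree,'Mode','graph')
```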

Support Vector Machines

You can train regression support vector machines (SVMs) in Regression Learner. Linear SVMs are easy to interpret, but can have low predictive accuracy. Nonlinear SVMs are more difficult to interpret, but can be more accurate.

Tip In the Model Type gallery, click All SVMs to try each of the nonoptimizable SVM options and see which settings produce the best model with your data. Select the best model in the History list, and try to improve that model by using feature selection and changing some advanced options.


Regression Model Type | Interpretability | Model Flexibility
Linear SVM | Easy | Low
Quadratic SVM | Hard | Medium
Cubic SVM | Hard | Medium
Fine Gaussian SVM | Hard | High. Allows rapid variations in the response function. Kernel scale is set to sqrt(P)/4, where P is the number of predictors.
Medium Gaussian SVM | Hard | Medium. Gives a less flexible response function. Kernel scale is set to sqrt(P).
Coarse Gaussian SVM | Hard | Low. Gives a rigid response function. Kernel scale is set to sqrt(P)*4.

Statistics and Machine Learning Toolbox implements linear epsilon-insensitive SVM regression. This SVM ignores prediction errors that are less than some fixed number ε. The support vectors are the data points that have errors larger than ε. The function the SVM uses to predict new values depends only on the support vectors. To learn more about SVM regression, see “Understanding Support Vector Machine Regression” on page 25-2.

Tip For a workflow example, see “Train Regression Trees Using Regression Learner App” on page 24-54.

Advanced SVM Options

Regression Learner uses the fitrsvm function to train SVM regression models. You can set these options in the app:
• Kernel function

The kernel function determines the nonlinear transformation applied to the data before the SVM is trained. You can choose from:
• Gaussian or Radial Basis Function (RBF) kernel
• Linear kernel, easiest to interpret
• Quadratic kernel
• Cubic kernel


• Box constraint mode

The box constraint controls the penalty imposed on observations with large residuals. A larger box constraint gives a more flexible model. A smaller value gives a more rigid model that is less sensitive to overfitting. When Box constraint mode is set to Auto, the app uses a heuristic procedure to select the box constraint.

Try to fine-tune your model by specifying the box constraint manually. Set Box constraint mode to Manual and specify a value. Change the value by clicking the buttons or entering a positive scalar value in the Manual box constraint box. The app automatically preselects a reasonable value for you. Try to increase or decrease this value slightly and see if this improves your model.

Tip Increase the box constraint value to create a more flexible model.
• Epsilon mode

Prediction errors that are smaller than the epsilon (ε) value are ignored and treated as equal to zero. A smaller epsilon value gives a more flexible model. When Epsilon mode is set to Auto, the app uses a heuristic procedure to select the epsilon value.

Try to fine-tune your model by specifying the epsilon value manually. Set Epsilon mode to Manual and specify a value. Change the value by clicking the buttons or entering a positive scalar value in the Manual epsilon box. The app automatically preselects a reasonable value for you. Try to increase or decrease this value slightly and see if this improves your model.

Tip Decrease the epsilon value to create a more flexible model.
• Kernel scale mode

The kernel scale controls the scale of the predictors on which the kernel varies significantly. A smaller kernel scale gives a more flexible model. When Kernel scale mode is set to Auto, the app uses a heuristic procedure to select the kernel scale.

Try to fine-tune your model by specifying the kernel scale manually. Set Kernel scale mode to Manual and specify a value. Change the value by clicking the buttons or entering a positive scalar value in the Manual kernel scale box. The app automatically preselects a reasonable value for you. Try to increase or decrease this value slightly and see if this improves your model.

Tip Decrease the kernel scale value to create a more flexible model.
• Standardize

Standardizing the predictors transforms them so that they have mean 0 and standard deviation 1. Standardizing removes the dependence on arbitrary scales in the predictors and generally improves performance.

Alternatively, you can let the app choose some of these model options automatically by using hyperparameter optimization. See “Hyperparameter Optimization in Regression Learner App” on page 24-27.
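These options correspond to fitrsvm name-value arguments. A hedged sketch using the carbig sample data (the box constraint and epsilon values below are illustrative starting points, not the app's Auto choices):

```matlab
load carbig
X = [Horsepower Weight];
y = MPG;

% Medium Gaussian SVM preset: kernel scale sqrt(P), where P = number of predictors.
mdl = fitrsvm(X,y, ...
    'KernelFunction','gaussian', ...
    'KernelScale',sqrt(size(X,2)), ...
    'BoxConstraint',1, ...    % larger value -> more flexible model (illustrative)
    'Epsilon',0.5, ...        % smaller value -> more flexible model (illustrative)
    'Standardize',true);
```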


Gaussian Process Regression Models

You can train Gaussian process regression (GPR) models in Regression Learner. GPR models are often highly accurate, but can be difficult to interpret.

Tip In the Model Type gallery, click All GPR Models to try each of the nonoptimizable GPR model options and see which settings produce the best model with your data. Select the best model in the History list, and try to improve that model by using feature selection and changing some advanced options.

Regression Model Type | Interpretability | Model Flexibility
Rational Quadratic | Hard | Automatic
Squared Exponential | Hard | Automatic
Matern 5/2 | Hard | Automatic
Exponential | Hard | Automatic

In Gaussian process regression, the response is modeled using a probability distribution over a space of functions. The flexibility of the presets in the Model Type gallery is automatically chosen to give a small training error and, simultaneously, protection against overfitting. To learn more about Gaussian process regression, see “Gaussian Process Regression Models” on page 6-2.

Tip For a workflow example, see “Train Regression Trees Using Regression Learner App” on page 24-54.

Advanced Gaussian Process Regression Options

Regression Learner uses the fitrgp function to train GPR models. You can set these options in the app:
• Basis function

The basis function specifies the form of the prior mean function of the Gaussian process regression model. You can choose from Zero, Constant, and Linear. Try to choose a different basis function and see if this improves your model.
• Kernel function



The kernel function determines the correlation in the response as a function of the distance between the predictor values. You can choose from Rational Quadratic, Squared Exponential, Matern 5/2, Matern 3/2, and Exponential. To learn more about kernel functions, see “Kernel (Covariance) Function Options” on page 6-5.
• Use isotropic kernel

If you use an isotropic kernel, the correlation length scales are the same for all the predictors. With a nonisotropic kernel, each predictor variable has its own separate correlation length scale. Using a nonisotropic kernel can improve the accuracy of your model, but can make the model slow to fit. To learn more about nonisotropic kernels, see “Kernel (Covariance) Function Options” on page 6-5.
• Kernel mode

You can manually specify initial values of the kernel parameters Kernel scale and Signal standard deviation. The signal standard deviation is the prior standard deviation of the response values. By default, the app locally optimizes the kernel parameters starting from the initial values. To use fixed kernel parameters, clear the Optimize numeric parameters check box in the advanced options.

When Kernel scale mode is set to Auto, the app uses a heuristic procedure to select the initial kernel parameters. If you set Kernel scale mode to Manual, you can specify the initial values. Click the buttons or enter a positive scalar value in the Kernel scale box and the Signal standard deviation box. If you clear the Use isotropic kernel check box, you cannot set initial kernel parameters manually.
• Sigma mode

You can manually specify the initial value of the observation noise standard deviation Sigma. By default, the app optimizes the observation noise standard deviation, starting from the initial value. To use fixed kernel parameters, clear the Optimize numeric parameters check box in the advanced options.

When Sigma mode is set to Auto, the app uses a heuristic procedure to select the initial observation noise standard deviation. If you set Sigma mode to Manual, you can specify the initial value. Click the buttons or enter a positive scalar value in the Sigma box.
• Standardize

Standardizing the predictors transforms them so that they have mean 0 and standard deviation 1. Standardizing removes the dependence on arbitrary scales in the predictors and generally improves performance.
• Optimize numeric parameters

With this option, the app automatically optimizes numeric parameters of the GPR model. The optimized parameters are the coefficients of the Basis function, the kernel parameters Kernel scale and Signal standard deviation, and the observation noise standard deviation Sigma.


Alternatively, you can let the app choose some of these model options automatically by using hyperparameter optimization. See “Hyperparameter Optimization in Regression Learner App” on page 24-27.
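These settings correspond to fitrgp name-value arguments. A minimal sketch on the carbig sample data (the kernel and basis function choices are illustrative; 'ardmatern52' would select the nonisotropic variant with a separate length scale per predictor):

```matlab
load carbig
X = [Horsepower Weight];
y = MPG;
ok = all(~isnan([X y]),2);          % keep complete rows only
X = X(ok,:); y = y(ok);

% Matern 5/2 kernel with a constant prior mean, standardized predictors.
mdl = fitrgp(X,y, ...
    'BasisFunction','constant', ...
    'KernelFunction','matern52', ...
    'Standardize',true);
```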

Ensembles of Trees

You can train ensembles of regression trees in Regression Learner. Ensemble models combine results from many weak learners into one high-quality ensemble model.

Tip In the Model Type gallery, click All Ensembles to try each of the nonoptimizable ensemble options and see which settings produce the best model with your data. Select the best model in the History list, and try to improve that model by using feature selection and changing some advanced options.

Regression Model Type | Interpretability | Ensemble Method | Model Flexibility
Boosted Trees | Hard | Least-squares boosting (LSBoost) with regression tree learners | Medium to high
Bagged Trees | Hard | Bootstrap aggregating, or bagging, with regression tree learners | High

Tip For a workflow example, see “Train Regression Trees Using Regression Learner App” on page 24-54.

Advanced Ensemble Options

Regression Learner uses the fitrensemble function to train ensemble models. You can set these options:
• Minimum leaf size

Specify the minimum number of training samples used to calculate the response of each leaf node. When you grow a regression tree, consider its simplicity and predictive power. To change the minimum leaf size, click the buttons or enter a positive integer value in the Minimum leaf size box.
• A fine tree with many small leaves is usually highly accurate on the training data. However, the tree might not show comparable accuracy on an independent test set. A very leafy tree tends to overfit, and its validation accuracy is often far lower than its training (or resubstitution) accuracy.
• In contrast, a coarse tree with fewer large leaves does not attain high training accuracy. But a coarse tree can be more robust in that its training accuracy can be near that of a representative test set.


Tip Decrease the Minimum leaf size to create a more flexible model.
• Number of learners

Try changing the number of learners to see if you can improve the model. Many learners can produce high accuracy, but can be time-consuming to fit.

Tip Increase the Number of learners to create a more flexible model.
• Learning rate

For boosted trees, specify the learning rate for shrinkage. If you set the learning rate to less than 1, the ensemble requires more learning iterations but often achieves better accuracy. 0.1 is a popular initial choice.

Alternatively, you can let the app choose some of these model options automatically by using hyperparameter optimization. See “Hyperparameter Optimization in Regression Learner App” on page 24-27.
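The ensemble presets correspond to fitrensemble calls along these lines (a sketch on the carbig sample data; the learner counts and leaf size here are illustrative, not the app's preset values):

```matlab
load carbig
X = [Horsepower Weight];
y = MPG;

% Boosted Trees: least-squares boosting with shrinkage.
boosted = fitrensemble(X,y,'Method','LSBoost', ...
    'NumLearningCycles',30,'LearnRate',0.1, ...
    'Learners',templateTree('MinLeafSize',8));

% Bagged Trees: bootstrap aggregation (no learning rate applies).
bagged = fitrensemble(X,y,'Method','Bag', ...
    'NumLearningCycles',30, ...
    'Learners',templateTree('MinLeafSize',8));
```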

See Also

Related Examples
• “Train Regression Models in Regression Learner App” on page 24-2
• “Select Data and Validation for Regression Problem” on page 24-8
• “Feature Selection and Feature Transformation Using Regression Learner App” on page 24-24
• “Assess Model Performance in Regression Learner” on page 24-39
• “Export Regression Model to Predict New Data” on page 24-50
• “Train Regression Trees Using Regression Learner App” on page 24-54


Feature Selection and Feature Transformation Using Regression Learner App

In this section...
“Investigate Features in the Response Plot” on page 24-24
“Select Features to Include” on page 24-25
“Transform Features with PCA in Regression Learner” on page 24-26

Investigate Features in the Response Plot

In Regression Learner, use the response plot to try to identify predictors that are useful for predicting the response. To visualize the relation between different predictors and the response, under X-axis, select different variables in the X list. Before you train a regression model, the response plot shows the training data. If you have trained a regression model, then the response plot also shows the model predictions. Observe which variables are associated most clearly with the response. When you plot the carbig data set, the predictor Horsepower shows a clear negative association with the response. Look for features that do not seem to have any association with the response, and use Feature Selection to remove those features from the set of used predictors.


You can export the response plots you create in the app to figures. See “Export Plots in Regression Learner App” on page 24-46.

Select Features to Include

In Regression Learner, you can specify different features (or predictors) to include in the model. See if you can improve models by removing features with low predictive power. If data collection is expensive or difficult, you might prefer a model that performs satisfactorily with fewer predictors.

1. On the Regression Learner tab, in the Features section, click Feature Selection.
2. In the Feature Selection window, clear the check boxes for the predictors you want to exclude.

Tip You can close the Feature Selection window, or move it. The choices you make in the window remain.

3. Click Train to train a new model using the new predictor options.
4. Observe the new model in the History list. The Current Model window displays how many predictors are excluded.
5. Check which predictors are included in a trained model. Click the model in the History list and look at the check boxes in the Feature Selection window.
6. Try to improve the model by including different features.


For an example using feature selection, see “Train Regression Trees Using Regression Learner App” on page 24-54.

Transform Features with PCA in Regression Learner

Use principal component analysis (PCA) to reduce the dimensionality of the predictor space. Reducing the dimensionality can create regression models in Regression Learner that help prevent overfitting. PCA linearly transforms predictors to remove redundant dimensions, and generates a new set of variables called principal components.

1. On the Regression Learner tab, in the Features section, select PCA.
2. In the Advanced PCA Options window, select the Enable PCA check box. You can close the PCA window, or move it. The choices you make in the window remain.
3. Click Train again. The pca function transforms your selected features before training the model. By default, PCA keeps only the components that explain 95% of the variance. In the PCA window, you can change the percentage of variance to explain in the Explained variance box. A higher value risks overfitting, while a lower value risks removing useful dimensions.
4. Manually limit the number of PCA components. In the Component reduction criterion list, select Specify number of components. Edit the number in the Number of numeric components box. The number of components cannot be larger than the number of numeric predictors. PCA is not applied to categorical predictors.

You can check PCA Options for trained models in the Current Model window. For example:

PCA is keeping enough components to explain 95% variance.
After training, 2 components were kept.
Explained variance per component (in order): 92.5%, 5.3%, 1.7%, 0.5%

Check the explained variance percentages to decide whether to change the number of components. To learn more about how Regression Learner applies PCA to your data, generate code for your trained regression model. For more information on PCA, see the pca function.
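Outside the app, the same 95%-explained-variance rule can be sketched with the pca function directly (this is not the app's generated code; the predictor selection is illustrative):

```matlab
load carbig
X = [Horsepower Weight Displacement Acceleration];
X = X(all(~isnan(X),2),:);           % keep complete rows for this sketch

% Transform the predictors and get the variance explained per component.
[coeff,score,~,~,explained] = pca(X);

% Keep the smallest number of components explaining at least 95% of the variance.
numComp = find(cumsum(explained) >= 95,1);
Xreduced = score(:,1:numComp);
```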

See Also

Related Examples
• “Train Regression Models in Regression Learner App” on page 24-2
• “Select Data and Validation for Regression Problem” on page 24-8
• “Choose Regression Model Options” on page 24-12
• “Assess Model Performance in Regression Learner” on page 24-39
• “Export Plots in Regression Learner App” on page 24-46
• “Export Regression Model to Predict New Data” on page 24-50
• “Train Regression Trees Using Regression Learner App” on page 24-54


Hyperparameter Optimization in Regression Learner App

In this section...
“Select Hyperparameters to Optimize” on page 24-27
“Optimization Options” on page 24-33
“Minimum MSE Plot” on page 24-34
“Optimization Results” on page 24-36

After you choose a particular type of model to train, for example a decision tree or a support vector machine (SVM), you can tune your model by selecting different advanced options. For example, you can change the minimum leaf size of a decision tree or the box constraint of an SVM. Some of these options are internal parameters of the model, or hyperparameters, that can strongly affect its performance. Instead of manually selecting these options, you can use hyperparameter optimization within the Regression Learner app to automate the selection of hyperparameter values. For a given model type, the app tries different combinations of hyperparameter values by using an optimization scheme that seeks to minimize the model mean squared error (MSE), and returns a model with the optimized hyperparameters. You can use the resulting model as you would any other trained model.

Note Because hyperparameter optimization can lead to an overfitted model, the recommended approach is to create a separate test set before importing your data into the Regression Learner app. After you train your optimizable model, you can export it from the app and see how it performs on your test set. For an example, see “Train Regression Model Using Hyperparameter Optimization in Regression Learner App” on page 24-63.

To perform hyperparameter optimization in Regression Learner, follow these steps:

1. Choose a model type and decide which hyperparameters to optimize. See “Select Hyperparameters to Optimize” on page 24-27.

Note Hyperparameter optimization is not supported for linear regression models.

2. (Optional) Specify how the optimization is performed. For more information, see “Optimization Options” on page 24-33.
3. Train your model. Use the “Minimum MSE Plot” on page 24-34 to track the optimization results.
4. Inspect your trained model. See “Optimization Results” on page 24-36.

Select Hyperparameters to Optimize

In the Regression Learner app, in the Model Type section of the Regression Learner tab, click the arrow to open the gallery. The gallery includes optimizable models that you can train using hyperparameter optimization. After you select an optimizable model, you can choose which of its hyperparameters you want to optimize. In the Model Type section, select Advanced > Advanced. The app opens a dialog box in which you can select Optimize check boxes for the hyperparameters that you want to optimize. Under Values, specify the fixed values for the hyperparameters that you do not want to optimize or that are not optimizable.


This table describes the hyperparameters that you can optimize for each type of model and the search range of each hyperparameter. It also includes the additional hyperparameters for which you can specify fixed values.


Model: Optimizable Tree

Optimizable hyperparameters:
• Minimum leaf size – The software searches among integers log-scaled in the range [1,max(2,floor(n/2))], where n is the number of observations.

Additional hyperparameters:
• Surrogate decision splits
• Maximum surrogates per node

Notes: For more information, see “Advanced Regression Tree Options” on page 24-17.


Model: Optimizable SVM

Optimizable hyperparameters:
• Kernel function – The software searches among Gaussian, Linear, Quadratic, and Cubic.
• Box constraint – The software searches among positive values log-scaled in the range [0.001,1000].
• Kernel scale – The software searches among positive values log-scaled in the range [0.001,1000].
• Epsilon – The software searches among positive values log-scaled in the range [0.001,100]*iqr(Y)/1.349, where Y is the response variable.
• Standardize data – The software searches between true and false.

Notes:
• The Box constraint optimizable hyperparameter combines the Box constraint mode and Manual box constraint advanced options of the preset SVM models.
• The Kernel scale optimizable hyperparameter combines the Kernel scale mode and Manual kernel scale advanced options of the preset SVM models.
• You can optimize the Kernel scale optimizable hyperparameter only when the Kernel function value is Gaussian. Unless you specify a value for Kernel scale by clearing the Optimize check box, the app uses the Manual value of 1 by default when the Kernel function has a value other than Gaussian.
• The Epsilon optimizable hyperparameter combines the Epsilon mode and Manual epsilon advanced options of the preset SVM models.


For more information, see “Advanced SVM Options” on page 24-18.


Model: Optimizable GPR

Optimizable hyperparameters:
• Basis function – The software searches among Zero, Constant, and Linear.
• Kernel function – The software searches among:
  • Nonisotropic Rational Quadratic
  • Isotropic Rational Quadratic
  • Nonisotropic Squared Exponential
  • Isotropic Squared Exponential
  • Nonisotropic Matern 5/2
  • Isotropic Matern 5/2
  • Nonisotropic Matern 3/2
  • Isotropic Matern 3/2
  • Nonisotropic Exponential
  • Isotropic Exponential
• Kernel scale – The software searches among real values in the range [0.001,1]*XMaxRange, where XMaxRange = max(max(X) – min(X)) and X is the predictor data.
• Sigma – The software searches among real values in the range [0.0001,max(0.001,10*std(Y))], where Y is the response variable.
• Standardize – The software searches between true and false.

Additional hyperparameters:
• Signal standard deviation
• Optimize numeric parameters

Notes:
• The Kernel function optimizable hyperparameter combines the Kernel function and Use isotropic kernel advanced options of the preset Gaussian process models.
• The Kernel scale optimizable hyperparameter combines the Kernel mode and Kernel scale advanced options of the preset Gaussian process models.
• The Sigma optimizable hyperparameter combines the Sigma mode and Sigma advanced options of the preset Gaussian process models.
• When you optimize the Kernel scale of isotropic kernel functions, only the kernel scale is optimized, not the signal standard deviation. You can either specify a Signal standard deviation value or use its default value. You cannot optimize the Kernel scale of nonisotropic kernel functions.
• For more information, see “Advanced Gaussian Process Regression Options” on page 24-20.

Model: Optimizable Ensemble

Optimizable hyperparameters:
• Ensemble method – The software searches among Bag and LSBoost.
• Minimum leaf size – The software searches among integers log-scaled in the range [1,max(2,floor(n/2))], where n is the number of observations.
• Number of learners – The software searches among integers log-scaled in the range [10,500].
• Learning rate – The software searches among real values log-scaled in the range [0.001,1].
• Number of predictors to sample – The software searches among integers in the range [1,max(2,p)], where p is the number of predictor variables.

Notes:
• The Bag value of the Ensemble method optimizable hyperparameter specifies a Bagged Trees model. Similarly, the LSBoost Ensemble method value specifies a Boosted Trees model.
• The Number of predictors to sample optimizable hyperparameter is not available in the advanced options of the preset ensemble models.
• For more information, see “Advanced Ensemble Options” on page 24-22.
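These app settings correspond to hyperparameters of the fitrensemble function. As a rough command-line counterpart, the following sketch (not part of the app workflow; the cartable and MPG names come from the carbig example later in this chapter) asks fitrensemble to optimize the analogous parameters:

```matlab
% Load the carbig example data and build a predictor/response table
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);

% Optimize roughly the same ensemble hyperparameters the app exposes:
% Method (Bag or LSBoost), NumLearningCycles (number of learners),
% LearnRate (learning rate), and MinLeafSize (minimum leaf size)
mdl = fitrensemble(cartable,'MPG','OptimizeHyperparameters', ...
    {'Method','NumLearningCycles','LearnRate','MinLeafSize'});
```

The app additionally tunes the number of predictors to sample, which fitrensemble exposes through the underlying tree learner rather than as a direct name-value pair.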

Hyperparameter Optimization in Regression Learner App

Optimization Options

By default, the Regression Learner app performs hyperparameter tuning by using Bayesian optimization. The goal of Bayesian optimization, and optimization in general, is to find a point that minimizes an objective function. In the context of hyperparameter tuning in the app, a point is a set of hyperparameter values, and the objective function is the loss function, or the mean squared error (MSE). For more information on the basics of Bayesian optimization, see “Bayesian Optimization Workflow” on page 10-25.

You can specify how the hyperparameter tuning is performed. For example, you can change the optimization method to grid search or limit the training time. On the Regression Learner tab, in the Model Type section, select Advanced > Optimizer Options. The app opens a dialog box in which you can select optimization options.

This table describes the available optimization options and their default values.

Optimizer
The optimizer values are:
• Bayesopt (default) – Use Bayesian optimization. Internally, the app calls the bayesopt function.
• Grid search – Use grid search with the number of values per dimension determined by the Number of grid divisions value. The app searches in a random order, using uniform sampling without replacement from the grid.
• Random search – Search at random among points, where the number of points corresponds to the Iterations value.

Acquisition function
When the app performs Bayesian optimization for hyperparameter tuning, it uses the acquisition function to determine the next set of hyperparameter values to try. The acquisition function values are:
• Expected improvement per second plus (default)
• Expected improvement
• Expected improvement plus
• Expected improvement per second
• Lower confidence bound
• Probability of improvement
For details on how these acquisition functions work in the context of Bayesian optimization, see “Acquisition Function Types” on page 10-3.

Iterations
Each iteration corresponds to a combination of hyperparameter values that the app tries. When you use Bayesian optimization or random search, specify a positive integer that sets the number of iterations. The default value is 30. When you use grid search, the app ignores the Iterations value and evaluates the loss at every point in the entire grid.

Training time limit
You can set a training time limit to stop the optimization process prematurely. To set a training time limit, select this option and set the Maximum training time in seconds option. By default, the app does not have a training time limit.

Maximum training time in seconds
Set the training time limit in seconds as a positive real number. The default value is 300. The run time can exceed the training time limit because this limit does not interrupt an iteration evaluation.

Number of grid divisions
When you use grid search, set a positive integer as the number of values the app tries for each numeric hyperparameter. The app ignores this value for categorical hyperparameters. The default value is 10.
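Similar choices are available programmatically through the HyperparameterOptimizationOptions argument of the fit functions. This sketch assumes a cartable table (as in the carbig example later in this chapter); the specific field values shown are an assumed mapping to the app defaults described above:

```matlab
% Options roughly matching the app defaults: Bayesian optimization,
% expected-improvement-per-second-plus acquisition, 30 iterations,
% and a 300-second time limit
opts = struct('Optimizer','bayesopt', ...
    'AcquisitionFunctionName','expected-improvement-per-second-plus', ...
    'MaxObjectiveEvaluations',30, ...
    'MaxTime',300);

% Tune a Gaussian process regression model with these options
mdl = fitrgp(cartable,'MPG','OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions',opts);
```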

Minimum MSE Plot

After specifying which model hyperparameters to optimize and setting any additional optimization options (optional), train your optimizable model. On the Regression Learner tab, in the Training section, click Train. The app creates a Minimum MSE Plot that it updates as the optimization runs.


Note When you train an optimizable model, the app disables the Use Parallel button. After the training is complete, the app makes the button available again when you select a nonoptimizable model. The button is off by default.

The minimum mean squared error (MSE) plot displays the following information:

• Estimated minimum MSE – Each light blue point corresponds to an estimate of the minimum MSE computed by the optimization process when considering all the sets of hyperparameter values tried so far, including the current iteration.


The estimate is based on an upper confidence interval of the current MSE objective model, as mentioned in the Bestpoint hyperparameters description. If you use grid search or random search to perform hyperparameter optimization, the app does not display these light blue points.

• Observed minimum MSE – Each dark blue point corresponds to the observed minimum MSE computed so far by the optimization process. For example, at the third iteration, the dark blue point corresponds to the minimum of the MSE observed in the first, second, and third iterations.
• Bestpoint hyperparameters – The red square indicates the iteration that corresponds to the optimized hyperparameters. You can find the values of the optimized hyperparameters listed in the upper right of the plot under Optimization Results. The optimized hyperparameters do not always provide the observed minimum MSE. When the app performs hyperparameter tuning by using Bayesian optimization (see “Optimization Options” on page 24-33 for a brief introduction), it chooses the set of hyperparameter values that minimizes an upper confidence interval of the MSE objective model, rather than the set that minimizes the MSE. For more information, see the 'Criterion','min-visited-upper-confidence-interval' name-value pair argument of bestPoint.
• Minimum error hyperparameters – The yellow point indicates the iteration that corresponds to the hyperparameters that yield the observed minimum MSE. For more information, see the 'Criterion','min-observed' name-value pair argument of bestPoint.

If you use grid search to perform hyperparameter optimization, the Bestpoint hyperparameters and the Minimum error hyperparameters are the same. Missing points in the plot correspond to NaN minimum MSE values.

Optimization Results

When the app finishes tuning model hyperparameters, it returns a model trained with the optimized hyperparameter values (Bestpoint hyperparameters). The model metrics, displayed plots, and exported model correspond to this trained model with fixed hyperparameter values. To inspect the optimization results of a trained optimizable model, select the model in the History list and look at the Current Model pane.


The Current Model pane includes these sections:

• Results – Shows the performance of the optimizable model. See “View Model Statistics in Current Model Window” on page 24-40.
• Model Type – Displays the type of optimizable model and lists any fixed hyperparameter values.
• Optimized Hyperparameters – Lists the values of the optimized hyperparameters.
• Hyperparameter Search Range – Displays the search ranges for the optimized hyperparameters.
• Optimizer Options – Shows the selected optimizer options.

When you perform hyperparameter tuning using Bayesian optimization and you export a trained optimizable model to the workspace as a structure, the structure includes a BayesianOptimization object in the HyperParameterOptimizationResult field. The object contains the results of the optimization performed in the app.

When you generate MATLAB code from a trained optimizable model, the generated code uses the fixed and optimized hyperparameter values of the model to train on new data. The generated code does not include the optimization process. For information on how to perform Bayesian optimization when you use a fit function, see “Bayesian Optimization Using a Fit Function” on page 10-26.
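You can query that object with bestPoint to recover the two sets of hyperparameters the minimum MSE plot highlights. This is a sketch, assuming trainedModel is an optimizable model exported from the app as a structure:

```matlab
% BayesianOptimization results from an exported optimizable model
results = trainedModel.HyperParameterOptimizationResult;

% "Bestpoint hyperparameters": minimize an upper confidence interval
% of the MSE objective model
best = bestPoint(results,'Criterion','min-visited-upper-confidence-interval');

% "Minimum error hyperparameters": minimum observed MSE
bestObserved = bestPoint(results,'Criterion','min-observed');
```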

See Also

Related Examples
• “Train Regression Model Using Hyperparameter Optimization in Regression Learner App” on page 24-63
• “Bayesian Optimization Workflow” on page 10-25
• “Train Regression Models in Regression Learner App” on page 24-2
• “Select Data and Validation for Regression Problem” on page 24-8
• “Choose Regression Model Options” on page 24-12
• “Assess Model Performance in Regression Learner” on page 24-39
• “Export Regression Model to Predict New Data” on page 24-50


Assess Model Performance in Regression Learner

In this section...
“Check Performance in History List” on page 24-39
“View Model Statistics in Current Model Window” on page 24-40
“Explore Data and Results in Response Plot” on page 24-40
“Plot Predicted vs. Actual Response” on page 24-42
“Evaluate Model Using Residuals Plot” on page 24-43

After training regression models in Regression Learner, you can compare models based on model statistics, visualize results in the response plot or by plotting actual versus predicted response, and evaluate models using the residuals plot.

• If you use k-fold cross-validation, then the app computes the model statistics using the observations in the k validation folds and reports the average values. It makes predictions on the observations in the validation folds, and the plots show these predictions. It also computes the residuals on the observations in the validation folds.

Note When you import data into the app, if you accept the defaults, the app automatically uses cross-validation. To learn more, see “Choose Validation Scheme” on page 23-18.

• If you use holdout validation, the app computes the accuracy scores using the observations in the validation fold and makes predictions on these observations. The app uses these predictions in the plots and also computes the residuals based on the predictions.
• If you choose not to use a validation scheme, the score is the resubstitution accuracy based on all the training data, and the predictions are resubstitution predictions.

Check Performance in History List

After training a model in Regression Learner, check the History list to see which model has the best overall score. The best score is highlighted in a box. This score is the root mean square error (RMSE) on the validation set. The score estimates the performance of the trained model on new data. Use the score to help you choose the best model.

• For cross-validation, the score is the RMSE on all observations, counting each observation when it was in a held-out fold.
• For holdout validation, the score is the RMSE on the held-out observations.
• For no validation, the score is the resubstitution RMSE on all the training data.


The model with the best overall score might not be the best model for your goal. Sometimes a model with a slightly worse overall score is the better model for your goal. For example, you want to avoid overfitting, and you might want to exclude some predictors where data collection is expensive or difficult.

View Model Statistics in Current Model Window

You can view model statistics in the Current Model window and use these statistics to assess and compare models. The statistics are calculated on the validation set.

Model Statistics

RMSE – Root mean square error. The RMSE is always positive and its units match the units of your response. Tip: Look for smaller values of the RMSE.

R-Squared – Coefficient of determination. R-squared is always smaller than 1 and usually larger than 0. It compares the trained model with the model where the response is constant and equals the mean of the training response. If your model is worse than this constant model, then R-squared is negative. Tip: Look for an R-squared value close to 1.

MSE – Mean squared error. The MSE is the square of the RMSE. Tip: Look for smaller values of the MSE.

MAE – Mean absolute error. The MAE is always positive and similar to the RMSE, but less sensitive to outliers. Tip: Look for smaller values of the MAE.
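Each of these statistics can also be computed directly from a vector of true responses and a vector of predictions, which can help when comparing an exported model against the values the app reports. A minimal sketch, assuming y and yfit are column vectors you already have:

```matlab
% y - true responses, yfit - predicted responses (column vectors)
resid = y - yfit;

mseVal  = mean(resid.^2);      % mean squared error
rmseVal = sqrt(mseVal);        % root mean square error (units of y)
maeVal  = mean(abs(resid));    % mean absolute error
rsq     = 1 - sum(resid.^2)/sum((y - mean(y)).^2);  % R-squared
```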

Explore Data and Results in Response Plot

In the response plot, view the regression model results. After you train a regression model, the response plot displays the predicted response versus record number. If you are using holdout or cross-validation, then these predictions are the predictions on the held-out observations. In other words, each prediction is obtained using a model that was trained without using the corresponding observation. To investigate your results, use the controls on the right. You can:

• Plot predicted and/or true responses. Use the check boxes under Plot to make your selection.
• Show prediction errors, drawn as vertical lines between the predicted and true responses, by selecting the Errors check box.
• Choose the variable to plot on the x-axis under X-axis. You can choose either the record number or one of your predictor variables.


• Plot the response as markers, or as a box plot under Style. You can only select Box plot when the variable on the x-axis has few unique values. A box plot displays the typical values of the response and any possible outliers. The central mark indicates the median, and the bottom and top edges of the box are the 25th and 75th percentiles, respectively. Vertical lines, called whiskers, extend from the boxes to the most extreme data points that are not considered outliers. The outliers are plotted individually using the '+' symbol. For more information about box plots, see boxplot.


To export the response plots you create in the app to figures, see “Export Plots in Regression Learner App” on page 24-46.

Plot Predicted vs. Actual Response

Use the Predicted vs. Actual plot to check model performance. Use this plot to understand how well the regression model makes predictions for different response values. To view the Predicted vs. Actual plot after training a model, on the Regression Learner tab, in the Plots section, click Predicted vs. Actual Plot.

When you open the plot, the predicted response of your model is plotted against the actual, true response. A perfect regression model has a predicted response equal to the true response, so all the points lie on a diagonal line. The vertical distance from the line to any point is the error of the prediction for that point. A good model has small errors, and so the predictions are scattered near the line.


Usually a good model has points scattered roughly symmetrically around the diagonal line. If you can see any clear patterns in the plot, it is likely that you can improve your model. Try training a different model type or making your current model type more flexible using the Advanced options in the Model Type section. If you are unable to improve your model, it is possible that you need more data, or that you are missing an important predictor. To export the Predicted vs. Actual plots you create in the app to figures, see “Export Plots in Regression Learner App” on page 24-46.
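Outside the app, a comparable figure can be drawn in a few lines. This is a sketch, assuming you already have true responses y and predictions yfit from an exported model:

```matlab
% Predicted vs. actual response, with the perfect-prediction diagonal
scatter(y,yfit,'filled')
hold on
refline(1,0)        % slope 1, intercept 0: points on this line are exact
hold off
xlabel('True response')
ylabel('Predicted response')
```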

Evaluate Model Using Residuals Plot

Use the residuals plot to check model performance. To view the residuals plot after training a model, on the Regression Learner tab, in the Plots section, click Residuals Plot. The residuals plot displays the difference between the predicted and true responses. Choose the variable to plot on the x-axis under X-axis. Choose either the true response, predicted response, record number, or one of your predictors.


Usually a good model has residuals scattered roughly symmetrically around 0. If you can see any clear patterns in the residuals, it is likely that you can improve your model. Look for these patterns:

• Residuals are not symmetrically distributed around 0.
• Residuals change significantly in size from left to right in the plot.
• Outliers occur, that is, residuals that are much larger than the rest of the residuals.
• A clear, nonlinear pattern appears in the residuals.

Try training a different model type, or making your current model type more flexible using the Advanced options in the Model Type section. If you are unable to improve your model, it is possible that you need more data, or that you are missing an important predictor.

To export the residuals plots you create in the app to figures, see “Export Plots in Regression Learner App” on page 24-46.
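A comparable residuals plot can be produced at the command line. This sketch assumes vectors y (true responses) and yfit (predictions) from an exported model:

```matlab
% Residuals vs. predicted response; a good model scatters around zero
resid = y - yfit;
scatter(yfit,resid,'filled')
yline(0)            % zero-residual reference line (R2018b or later)
xlabel('Predicted response')
ylabel('Residual (true minus predicted)')
```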

See Also

Related Examples
• “Train Regression Models in Regression Learner App” on page 24-2
• “Select Data and Validation for Regression Problem” on page 24-8
• “Choose Regression Model Options” on page 24-12
• “Feature Selection and Feature Transformation Using Regression Learner App” on page 24-24
• “Export Plots in Regression Learner App” on page 24-46
• “Export Regression Model to Predict New Data” on page 24-50
• “Train Regression Trees Using Regression Learner App” on page 24-54


Export Plots in Regression Learner App

After you create plots interactively in the Regression Learner app, you can export your app plots to MATLAB figures. You can then copy, save, or customize the new figures. Choose among the available plots: response plot on page 24-24, Predicted vs. Actual plot on page 24-42, residuals plot on page 24-43, and minimum MSE plot on page 24-34.

• Before exporting a plot, make sure the plot in the app displays the same data that you want in the new figure.
• On the Regression Learner tab, in the Export section, click Export Plot to Figure. The app creates a figure from the selected plot.
  • The new figure might not have the same interactivity options as the plot in the Regression Learner app.
  • Additionally, the figure might have a different axes toolbar than the one in the app plot. For plots in Regression Learner, an axes toolbar appears above the top right of the plot. The buttons available on the toolbar depend on the contents of the plot. The toolbar can include buttons to export the plot as an image, add data tips, pan or zoom the data, and restore the view.

• Copy, save, or customize the new figure, which is displayed in the figure window.
  • To copy the figure, select Edit > Copy Figure. For more information, see “Copy Figure to Clipboard from Edit Menu” (MATLAB).
  • To save the figure, select File > Save As. Alternatively, you can follow the workflow described in “Customize Figure Before Saving” (MATLAB).
  • To customize the figure, click the Edit Plot button on the figure toolbar. Right-click the section of the plot that you want to edit. You can change the listed properties, which might include Color, Font, Line Style, and other properties. Or, you can use the Property Inspector to change the figure properties.

As an example, export a response plot in the app to a figure, customize the figure, and save the modified figure.

1  In the MATLAB Command Window, load the carbig data set.

   load carbig
   cartable = table(Acceleration,Cylinders,Displacement, ...
       Horsepower,Model_Year,Weight,Origin,MPG);

2  Click the Apps tab.
3  In the Apps section, click the arrow to open the gallery. Under Machine Learning and Deep Learning, click Regression Learner.
4  On the Regression Learner tab, in the File section, click New Session.
5  In the New Session dialog box, select the table cartable from the Workspace Variable list.
6  Click Start Session. Regression Learner creates a response plot of the data by default.
7  Change the x-axis data in the response plot to Weight.
8  On the Regression Learner tab, in the Export section, click Export Plot to Figure.
9  In the new figure, click the Edit Plot button on the figure toolbar. Right-click the points in the plot. In the context menu, select Marker > square.
10 To save the figure, select File > Save As. Specify the saved file location, name, and type.

See Also

Related Examples
• “Feature Selection and Feature Transformation Using Regression Learner App” on page 24-24
• “Assess Model Performance in Regression Learner” on page 24-39
• “Export Regression Model to Predict New Data” on page 24-50


Export Regression Model to Predict New Data

In this section...
“Export Model to Workspace” on page 24-50
“Make Predictions for New Data” on page 24-50
“Generate MATLAB Code to Train Model with New Data” on page 24-51
“Deploy Predictions Using MATLAB Compiler” on page 24-52

Export Model to Workspace

After you create regression models interactively in the Regression Learner app, you can export your best model to the workspace. Then you can use that trained model to make predictions using new data.

Note The final model Regression Learner exports is always trained using the full data set. The validation scheme that you use only affects the way that the app computes validation metrics. You can use the validation metrics and various plots that visualize results to pick the best model for your regression problem.

Here are the steps for exporting a model to the MATLAB workspace:

1  In the app, select the model you want to export in the History list.
2  On the Regression Learner tab, in the Export section, click one of the export options:
   • To include the data used for training the model, select Export Model. You export the trained model to the workspace as a structure containing a regression model object.
   • To exclude the training data, select Export Compact Model. This option exports the model with unnecessary data removed where possible. For some models this is a compact object that does not include the training data, but you can still use it for making predictions on new data.
3  In the Export Model dialog box, check the name of your exported variable, and edit it if you want. Then, click OK. The default name for your exported model, trainedModel, increments every time you export to avoid overwriting your models (for example, trainedModel1). The new variable (for example, trainedModel) appears in your workspace. The app displays information about the exported model in the command window. Read the message to learn how to make predictions with new data.

Make Predictions for New Data

After you export a model to the workspace from Regression Learner, or run the code generated from the app, you get a trainedModel structure that you can use to make predictions using new data. The structure contains a model object and a function for prediction. The structure enables you to make predictions for models that include principal component analysis (PCA).

1  Use the exported model to make predictions for new data, T:

   yfit = trainedModel.predictFcn(T)

   where trainedModel is the name of your exported variable. Supply the data T with the same format and data type as the training data used in the app (table or matrix).
   • If you supply a table, then ensure that it contains the same predictor names as your training data. The predictFcn ignores additional variables in tables. Variable formats and types must match the original training data.
   • If you supply a matrix, it must contain the same predictor columns or rows as your training data, in the same order and format. Do not include a response variable, any variables that you did not import in the app, or other unused variables.
   The output yfit contains a prediction for each data point.

2  Examine the fields of the exported structure. For help making predictions, enter:

   trainedModel.HowToPredict

You also can extract the model object from the exported structure for further analysis. If you use a feature transformation such as PCA in the app, you must take this transformation into account by using the information in the PCA fields of the structure.
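As a concrete sketch, suppose the model was trained in the app on the cartable data from the carbig example in this chapter. The code below rebuilds a predictor table with the same variable names (the response column is not required) and passes it to the exported prediction function:

```matlab
% Rebuild a predictor table matching the training data used in the app
load carbig
newData = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin);

% trainedModel is the structure exported from Regression Learner
yfit = trainedModel.predictFcn(newData);

% The structure also documents its own usage
trainedModel.HowToPredict
```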

Generate MATLAB Code to Train Model with New Data

After you create regression models interactively in the Regression Learner app, you can generate MATLAB code for your best model. Then you can use the code to train the model with new data.

Generate MATLAB code to:
• Train on huge data sets. Explore models in the app trained on a subset of your data, and then generate code to train a selected model on a larger data set.
• Create scripts for training models without needing to learn the syntax of the different functions.
• Examine the code to learn how to train models programmatically.
• Modify the code for further analysis, for example, to set options that you cannot change in the app.
• Repeat your analysis on different data and automate training.

To generate code and use it to train a model with new data:

1  In the app, from the History list, select the model you want to generate code for.
2  On the Regression Learner tab, in the Export section, click Generate Function. The app generates code from your session and displays the file in the MATLAB Editor. The file includes the predictors and response, the model training methods, and the validation methods. Save the file.
3  To retrain your model, call the function from the command line with your original data or new data as the input argument or arguments. New data must have the same shape as the original data. Copy the first line of the generated code, excluding the word function, and edit the trainingData input argument to reflect the variable name of your training data or new data. Similarly, edit the responseData input argument (if applicable).

   For example, to retrain a regression model trained with the cartable data set, enter:

   [trainedModel,validationRMSE] = trainRegressionModel(cartable)

   The generated code returns a trainedModel structure that contains the same fields as the structure you create when you export a model from Regression Learner to the workspace.

If you want to automate training the same model with new data, or learn how to programmatically train models, examine the generated code. The code shows you how to:
• Process the data into the right shape.
• Train a model and specify all the model options.
• Perform cross-validation.
• Compute statistics.
• Compute validation predictions and scores.

Note If you generate MATLAB code from a trained optimizable model, the generated code does not include the optimization process.

Deploy Predictions Using MATLAB Compiler

After you export a model to the workspace from Regression Learner, you can deploy it using MATLAB Compiler. Suppose you export the trained model to the MATLAB workspace based on the instructions in “Export Model to Workspace” on page 24-50, with the name trainedModel. To deploy predictions, follow these steps.

• Save the trainedModel structure in a .mat file.

  save mymodel trainedModel

• Write the code to be compiled. This code must load the trained model and use it to make a prediction. It must also have a pragma, so the compiler recognizes that Statistics and Machine Learning Toolbox code is needed in the compiled application. This pragma can be any function in the toolbox.

  function ypred = mypredict(tbl)
  %#function fitrtree
  load('mymodel.mat');   % loads trainedModel
  ypred = trainedModel.predictFcn(tbl);
  end

• Compile as a standalone application.

  mcc -m mypredict.m

See Also

Functions
fitlm | fitrensemble | fitrgp | fitrsvm | fitrtree | stepwiselm

Classes
CompactLinearModel | CompactRegressionEnsemble | CompactRegressionGP | CompactRegressionSVM | CompactRegressionTree | LinearModel | RegressionEnsemble | RegressionGP | RegressionSVM | RegressionTree

Related Examples
• “Train Regression Models in Regression Learner App” on page 24-2
• “Select Data and Validation for Regression Problem” on page 24-8
• “Choose Regression Model Options” on page 24-12
• “Feature Selection and Feature Transformation Using Regression Learner App” on page 24-24
• “Assess Model Performance in Regression Learner” on page 24-39
• “Train Regression Trees Using Regression Learner App” on page 24-54


Train Regression Trees Using Regression Learner App

This example shows how to create and compare various regression trees using the Regression Learner app, and export trained models to the workspace to make predictions for new data. You can train regression trees to predict responses to given input data. To predict the response of a regression tree, follow the tree from the root (beginning) node down to a leaf node. At each node, decide which branch to follow using the rule associated with that node. Continue until you arrive at a leaf node. The predicted response is the value associated with that leaf node. Statistics and Machine Learning Toolbox trees are binary. Each step in a prediction involves checking the value of one predictor variable. For example, here is a simple regression tree:

This tree predicts the response based on two predictors, x1 and x2. To predict, start at the top node. At each node, check the values of the predictors to decide which branch to follow. When the branches reach a leaf node, the response is set to the value corresponding to that node.

This example uses the carbig data set. This data set contains characteristics of different car models produced from 1970 through 1982, including:

• Acceleration
• Number of cylinders
• Engine displacement
• Engine power (Horsepower)
• Model year
• Country of origin
• Miles per gallon (MPG)

Train regression trees to predict the fuel economy in miles per gallon of a car model, given the other variables as inputs.
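The same kind of tree can be grown and inspected at the command line. This sketch uses fitrtree on the carbig data prepared as in step 1 of this example:

```matlab
% Build the table of predictors and response, then grow a regression tree
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
tree = fitrtree(cartable,'MPG');

% Display the tree structure in a graphical viewer
view(tree,'Mode','graph')
```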


1  In MATLAB, load the carbig data set and create a table containing the different variables:

   load carbig
   cartable = table(Acceleration,Cylinders,Displacement, ...
       Horsepower,Model_Year,Weight,Origin,MPG);

2  On the Apps tab, in the Machine Learning and Deep Learning group, click Regression Learner.
3  On the Regression Learner tab, in the File section, select New Session > From Workspace.
4  Under Data Set Variable in the New Session dialog box, select cartable from the list of tables and matrices in your workspace. Observe that the app has preselected response and predictor variables. MPG is chosen as the response, and all the other variables as predictors. For this example, do not change the selections.
5  To accept the default validation scheme and continue, click Start Session. The default validation option is cross-validation, to protect against overfitting. Regression Learner creates a plot of the response with the record number on the x-axis.


6

Use the response plot to investigate which variables are useful for predicting the response. To visualize the relation between different predictors and the response, select different variables in the X list under X-axis. Observe which variables are correlated most clearly with the response. Displacement, Horsepower, and Weight all have a clearly visible impact on the response and all show a negative association with the response.

7

24-56

Select the variable Origin under X-axis. A box plot is automatically displayed. A box plot shows the typical values of the response and any possible outliers. The box plot is useful when plotting markers results in many points overlapping. To show a box plot when the variable on the x-axis has few unique values, under Style, select Box plot.


8. Create a selection of regression trees. On the Regression Learner tab, in the Model Type section, click All Trees. Then click Train.

Tip If you have Parallel Computing Toolbox, then the first time you click Train you see a dialog box while the app opens a parallel pool of workers. After the pool opens, you can train multiple regression models simultaneously and continue working.

Regression Learner creates and trains three regression trees: a Fine Tree, a Medium Tree, and a Coarse Tree. The three models appear in the History list. Check the validation RMSE (root mean square error) of the models. The best score is highlighted in a box. The Fine Tree and the Medium Tree have similar RMSEs, while the Coarse Tree is less accurate. Regression Learner plots both the true training response and the predicted response of the currently selected model.


Note If you are using validation, there is some randomness in the results, so your model validation score can differ from the results shown.

9. Choose a model in the History list to view the results of that model. Under X-axis, select Horsepower and examine the response plot. Both the true and predicted responses are now plotted. Show the prediction errors, drawn as vertical lines between the predicted and true responses, by selecting the Errors check box.

10. See more details on the currently selected model in the Current Model window. Check and compare additional model characteristics, such as R-squared (coefficient of determination), MAE (mean absolute error), and prediction speed. To learn more, see “View Model Statistics in Current Model Window” on page 24-40. In the Current Model window you also can find details on the currently selected model type, such as options used for training the model.

11. Plot the predicted response versus the true response. On the Regression Learner tab, in the Plots section, click Predicted vs. Actual Plot. Use this plot to understand how well the regression model makes predictions for different response values.


A perfect regression model has predicted response equal to true response, so all the points lie on a diagonal line. The vertical distance from the line to any point is the error of the prediction for that point. A good model has small errors, so the predictions are scattered near the line. Usually a good model has points scattered roughly symmetrically around the diagonal line. If you can see any clear patterns in the plot, it is likely that you can improve your model.

12. Select the other models in the History list and compare the predicted versus actual plots.

13. In the Model Type gallery, select All Trees again. To try to improve the model, try including different features in the model. See if you can improve the model by removing features with low predictive power. On the Regression Learner tab, in the Features section, click Feature Selection. In the Feature Selection window, clear the check boxes for Acceleration and Cylinders to exclude them from the predictors. Click Train to train new regression trees using the new predictor settings.


14. Observe the new models in the History list. These models are the same regression trees as before, but trained using only five of the seven predictors. The History list displays how many predictors are used. To check which predictors are used, click a model in the History list and observe the check boxes in the Feature Selection window. The models with the two features removed perform comparably to the models using all predictors. The models predict no better using all the predictors compared to using only a subset of the predictors. If data collection is expensive or difficult, you might prefer a model that performs satisfactorily without some predictors.

15. Train the three regression tree presets using only Horsepower as a predictor. Change the selections in the Feature Selection window and click Train. Using only the engine power as a predictor results in models with lower accuracy. However, the models perform well given that they are using only a single predictor. With this simple one-dimensional predictor space, the coarse tree now performs as well as the medium and fine trees.

16. Select the best model in the History list and view the residuals plot. On the Regression Learner tab, in the Plots section, click Residuals Plot. The residuals plot displays the difference between the predicted and true responses. To display the residuals as a line graph, in the Style section, choose Lines. Under X-axis, select the variable to plot on the x-axis. Choose either the true response, predicted response, record number, or one of your predictors.


Usually a good model has residuals scattered roughly symmetrically around 0. If you can see any clear patterns in the residuals, it is likely that you can improve your model.

17. To learn about model settings, choose the best model in the History list and view the advanced settings. The nonoptimizable model options in the Model Type gallery are preset starting points, and you can change additional settings. On the Regression Learner tab, in the Model Type section, click Advanced. Compare the different regression tree models in the History list, and observe the differences in the Advanced Regression Tree Options dialog box. The Minimum leaf size setting controls the size of the tree leaves, and through that the size and depth of the regression tree. To try to improve the model further, change the Minimum leaf size setting to 8, and then train a new model by clicking Train. View the settings for the selected trained model in the Current Model window or in the Advanced Regression Tree Options dialog box. To learn more about regression tree settings, see “Regression Trees” on page 24-15.

18. Export the selected model to the workspace. On the Regression Learner tab, in the Export section, click Export Model. In the Export Model dialog box, click OK to accept the default variable name trainedModel. To see information about the results, look in the command window.

19. Use the exported model to make predictions on new data. For example, to make predictions for the cartable data in your workspace, enter:

yfit = trainedModel.predictFcn(cartable)

The output yfit contains the predicted response for each data point.

20. If you want to automate training the same model with new data, or learn how to programmatically train regression models, you can generate code from the app. To generate code for the best trained model, on the Regression Learner tab, in the Export section, click Generate Function. The app generates code from your model and displays the file in the MATLAB Editor. To learn more, see “Generate MATLAB Code to Train Model with New Data” on page 24-51.

Tip Use the same workflow as in this example to evaluate and compare the other regression model types you can train in Regression Learner. Train all the nonoptimizable regression model presets available:
1. On the far right of the Model Type section, click the arrow to expand the list of regression models.
2. Click All, and then click Train.
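The Minimum leaf size setting explored in step 17 corresponds, at the command line, to the MinLeafSize name-value argument of fitrtree. A minimal sketch, assuming the carbig variables are available; it is an illustration rather than the app's exact procedure:

```matlab
% Sketch only: train a single regression tree with a minimum leaf size of 8,
% mirroring the Advanced Regression Tree Options setting from step 17.
load carbig
cartable = table(Acceleration, Cylinders, Displacement, ...
    Horsepower, Model_Year, Weight, Origin, MPG);
tree8 = fitrtree(cartable, 'MPG', 'MinLeafSize', 8);
yfit = predict(tree8, cartable);   % predictions on the training data
```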


To learn about other regression model types, see “Train Regression Models in Regression Learner App” on page 24-2.

See Also

Related Examples
• “Train Regression Models in Regression Learner App” on page 24-2
• “Select Data and Validation for Regression Problem” on page 24-8
• “Choose Regression Model Options” on page 24-12
• “Feature Selection and Feature Transformation Using Regression Learner App” on page 24-24
• “Assess Model Performance in Regression Learner” on page 24-39
• “Export Regression Model to Predict New Data” on page 24-50

Train Regression Model Using Hyperparameter Optimization in Regression Learner App

This example shows how to tune hyperparameters of a regression ensemble by using hyperparameter optimization in the Regression Learner app. Compare the test set performance of the trained optimizable ensemble to that of the best-performing preset ensemble model.

1. In the MATLAB Command Window, load the carbig data set, and create a table containing most of the variables. Separate the table into training and test sets.

load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
rng('default') % For reproducibility of the data split
n = length(MPG);
partition = cvpartition(n,'HoldOut',0.15);
idxTrain = training(partition); % Indices for the training set
cartableTrain = cartable(idxTrain,:);
cartableTest = cartable(~idxTrain,:);

2. Open Regression Learner. Click the Apps tab, and then click the arrow at the right of the Apps section to open the apps gallery. In the Machine Learning and Deep Learning group, click Regression Learner.

3. On the Regression Learner tab, in the File section, select New Session > From Workspace.

4. In the New Session dialog box, select the cartableTrain table from the Data Set Variable list. As shown in the dialog box, the app selects the response and predictor variables. The default response variable is MPG. The default validation option is 5-fold cross-validation, to protect against overfitting. For this example, do not change the default settings.


5. To accept the default options and continue, click Start Session.

6. Train all preset ensemble models. On the Regression Learner tab, in the Model Type section, click the arrow to open the gallery. In the Ensembles of Trees group, click All Ensembles. In the Training section, click Train. The app trains one of each ensemble model type and displays the models in the History list.

Tip If you have Parallel Computing Toolbox, the Opening Pool dialog box opens the first time you click Train (or when you click Train again after an extended period of time). The dialog box remains open while the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can train multiple models simultaneously and continue working.


Note Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example.

7. Select an optimizable ensemble model to train. On the Regression Learner tab, in the Model Type section, click the arrow to open the gallery. In the Ensembles of Trees group, click Optimizable Ensemble. The app disables the Use Parallel button when you select an optimizable model.

8. Select the model hyperparameters to optimize. In the Model Type section, select Advanced > Advanced. The app opens a dialog box in which you can select Optimize check boxes for the hyperparameters that you want to optimize. By default, all the check boxes are selected. For this example, accept the default selections, and close the dialog box.


9. In the Training section, click Train.

10. The app displays a Minimum MSE Plot as it runs the optimization process. At each iteration, the app tries a different combination of hyperparameter values and updates the plot with the minimum validation mean squared error (MSE) observed up to that iteration, indicated in dark blue. When the app completes the optimization process, it selects the set of optimized hyperparameters, indicated by a red square. For more information, see “Minimum MSE Plot” on page 24-34. The app lists the optimized hyperparameters in both the upper right of the plot and the Optimized Hyperparameters section of the Current Model pane.
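At the command line, a similar Bayesian optimization can be sketched with fitrensemble and its OptimizeHyperparameters argument. This is an approximation of what the app does, not its exact procedure:

```matlab
% Sketch only: programmatic hyperparameter optimization for a regression
% ensemble, roughly analogous to the app's Optimizable Ensemble.
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
rng('default')   % reproducible data handling; the optimization itself may vary
mdl = fitrensemble(cartable, 'MPG', ...
    'OptimizeHyperparameters', 'auto', ...
    'HyperparameterOptimizationOptions', ...
    struct('MaxObjectiveEvaluations', 30, 'ShowPlots', false));
```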


Note In general, the optimization results are not reproducible.

11. Compare the trained preset ensemble models to the trained optimizable model. In the History list, the app highlights the lowest validation RMSE (root mean square error) by outlining it in a box. In this example, the trained optimizable ensemble outperforms the two preset models. A trained optimizable model does not always have a lower RMSE than the trained preset models. If a trained optimizable model does not perform well, you can try to get better results by running the optimization for longer. In the Model Type section, select Advanced > Optimizer Options. In the dialog box, increase the Iterations value. For example, you can double-click the default value of 30 and enter a value of 60.

12. Because hyperparameter tuning often leads to overfitted models, check the test set performance of the ensemble model with the optimized hyperparameters and compare it to the performance of the best preset ensemble model. Begin by exporting the two models to the MATLAB workspace.

• In the History list, select the Boosted Trees model. On the Regression Learner tab, in the Export section, select Export Model > Export Model. In the dialog box, name the model treeEnsemble.


• In the History list, select the Optimizable Ensemble model. On the Regression Learner tab, in the Export section, select Export Model > Export Model. In the dialog box, name the model optimizableEnsemble.

13. Compute the RMSE of the two models on the cartableTest data. In the MATLAB Command Window, use the predictFcn function in each exported model structure to predict the response values of the test set data. Then, compute the RMSE for each model on the test set data, omitting any NaN values. Compare the two RMSE values.

y = treeEnsemble.predictFcn(cartableTest);
presetRMSE = sqrt((1/length(y))*sum((cartableTest.MPG - y).^2,'omitnan'))

z = optimizableEnsemble.predictFcn(cartableTest);
optRMSE = sqrt((1/length(z))*sum((cartableTest.MPG - z).^2,'omitnan'))

presetRMSE =
    3.4591

optRMSE =
    3.1884

In this example, the trained optimizable ensemble still outperforms the trained preset model on the test set data.

See Also

Related Examples
• “Hyperparameter Optimization in Regression Learner App” on page 24-27
• “Train Regression Models in Regression Learner App” on page 24-2
• “Select Data and Validation for Regression Problem” on page 24-8
• “Choose Regression Model Options” on page 24-12
• “Assess Model Performance in Regression Learner” on page 24-39
• “Export Regression Model to Predict New Data” on page 24-50
• “Bayesian Optimization Workflow” on page 10-25

25 Support Vector Machines


Understanding Support Vector Machine Regression

In this section...
“Mathematical Formulation of SVM Regression” on page 25-2
“Solving the SVM Regression Optimization Problem” on page 25-5

Mathematical Formulation of SVM Regression

Overview

Support vector machine (SVM) analysis is a popular machine learning tool for classification and regression, first identified by Vladimir Vapnik and his colleagues in 1992 [5]. SVM regression is considered a nonparametric technique because it relies on kernel functions.

Statistics and Machine Learning Toolbox implements linear epsilon-insensitive SVM (ε-SVM) regression, which is also known as L1 loss. In ε-SVM regression, the set of training data includes predictor variables and observed response values. The goal is to find a function f(x) that deviates from yn by a value no greater than ε for each training point x, and at the same time is as flat as possible.

Linear SVM Regression: Primal Formula

Suppose we have a set of training data where xn is a multivariate set of N observations with observed response values yn.

To find the linear function

f(x) = xʹβ + b

and ensure that it is as flat as possible, find f(x) with the minimal norm value (βʹβ). This is formulated as a convex optimization problem to minimize

J(β) = (1/2) βʹβ

subject to all residuals having a value less than ε; or, in equation form:

∀n: |yn − (xnʹβ + b)| ≤ ε .

It is possible that no such function f(x) exists to satisfy these constraints for all points. To deal with otherwise infeasible constraints, introduce slack variables ξn and ξn* for each point. This approach is similar to the “soft margin” concept in SVM classification, because the slack variables allow regression errors to exist up to the value of ξn and ξn*, yet still satisfy the required conditions.

Including slack variables leads to the objective function, also known as the primal formula [5]:

J(β) = (1/2) βʹβ + C Σn=1..N (ξn + ξn*) ,

subject to:

∀n: yn − (xnʹβ + b) ≤ ε + ξn
∀n: (xnʹβ + b) − yn ≤ ε + ξn*
∀n: ξn* ≥ 0
∀n: ξn ≥ 0 .

The constant C is the box constraint, a positive numeric value that controls the penalty imposed on observations that lie outside the epsilon margin (ε) and helps to prevent overfitting (regularization). This value determines the trade-off between the flatness of f(x) and the amount up to which deviations larger than ε are tolerated.

The linear ε-insensitive loss function ignores errors that are within ε distance of the observed value by treating them as equal to zero. The loss is measured based on the distance between the observed value y and the ε boundary. This is formally described by

Lε = 0                  if |y − f(x)| ≤ ε
Lε = |y − f(x)| − ε     otherwise
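The ε-insensitive loss is simple to evaluate directly. As a quick illustration (the function and variable names here are for this sketch only):

```matlab
% Epsilon-insensitive loss: zero inside the epsilon tube, linear outside.
epsLoss = @(y, fx, eps) max(0, abs(y - fx) - eps);
epsLoss(3.0, 3.2, 0.5)   % inside the tube: loss is 0
epsLoss(3.0, 4.0, 0.5)   % outside the tube: loss is |3.0 - 4.0| - 0.5 = 0.5
```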

Linear SVM Regression: Dual Formula

The optimization problem previously described is computationally simpler to solve in its Lagrange dual formulation. The solution to the dual problem provides a lower bound to the solution of the primal (minimization) problem. The optimal values of the primal and dual problems need not be equal, and the difference is called the “duality gap.” But when the problem is convex and satisfies a constraint qualification condition, the value of the optimal solution to the primal problem is given by the solution of the dual problem.

To obtain the dual formula, construct a Lagrangian function from the primal function by introducing nonnegative multipliers αn and αn* for each observation xn. This leads to the dual formula, where we minimize

L(α) = (1/2) Σi=1..N Σj=1..N (αi − αi*)(αj − αj*) xiʹxj + ε Σi=1..N (αi + αi*) + Σi=1..N yi(αi* − αi)

subject to the constraints

Σn=1..N (αn − αn*) = 0
∀n: 0 ≤ αn ≤ C
∀n: 0 ≤ αn* ≤ C .

The β parameter can be completely described as a linear combination of the training observations using the equation

β = Σn=1..N (αn − αn*) xn .

The function used to predict new values depends only on the support vectors:

f(x) = Σn=1..N (αn − αn*) xnʹx + b .    (25-1)


The Karush-Kuhn-Tucker (KKT) complementarity conditions are optimization constraints required to obtain optimal solutions. For linear SVM regression, these conditions are

∀n: αn (ε + ξn − yn + xnʹβ + b) = 0
∀n: αn* (ε + ξn* + yn − xnʹβ − b) = 0
∀n: ξn (C − αn) = 0
∀n: ξn* (C − αn*) = 0 .

These conditions indicate that all observations strictly inside the epsilon tube have Lagrange multipliers αn = 0 and αn* = 0. If either αn or αn* is not zero, then the corresponding observation is called a support vector.

The property Alpha of a trained SVM model stores the difference between the two Lagrange multipliers of support vectors, αn − αn*. The properties SupportVectors and Bias store xn and b, respectively.

Nonlinear SVM Regression: Primal Formula

Some regression problems cannot adequately be described using a linear model. In such a case, the Lagrange dual formulation allows the previously described technique to be extended to nonlinear functions.

Obtain a nonlinear SVM regression model by replacing the dot product x1ʹx2 with a nonlinear kernel function G(x1,x2) = ⟨φ(x1),φ(x2)⟩, where φ(x) is a transformation that maps x to a high-dimensional space. Statistics and Machine Learning Toolbox provides the following built-in semidefinite kernel functions.

Kernel Name            Kernel Function
Linear (dot product)   G(xj,xk) = xjʹxk
Gaussian               G(xj,xk) = exp(−‖xj − xk‖²)
Polynomial             G(xj,xk) = (1 + xjʹxk)^q, where q is in the set {2,3,...}

The Gram matrix is an n-by-n matrix that contains elements gi,j = G(xi,xj). Each element gi,j is equal to the inner product of the predictors as transformed by φ. However, we do not need to know φ, because we can use the kernel function to generate the Gram matrix directly. Using this method, nonlinear SVM finds the optimal function f(x) in the transformed predictor space.

Nonlinear SVM Regression: Dual Formula

The dual formula for nonlinear SVM regression replaces the inner product of the predictors (xiʹxj) with the corresponding element of the Gram matrix (gi,j). Nonlinear SVM regression finds the coefficients that minimize

L(α) = (1/2) Σi=1..N Σj=1..N (αi − αi*)(αj − αj*) G(xi,xj) + ε Σi=1..N (αi + αi*) − Σi=1..N yi(αi − αi*)

subject to

Σn=1..N (αn − αn*) = 0
∀n: 0 ≤ αn ≤ C
∀n: 0 ≤ αn* ≤ C .

The function used to predict new values is equal to

f(x) = Σn=1..N (αn − αn*) G(xn,x) + b .    (25-2)

The KKT complementarity conditions are

∀n: αn (ε + ξn − yn + f(xn)) = 0
∀n: αn* (ε + ξn* + yn − f(xn)) = 0
∀n: ξn (C − αn) = 0
∀n: ξn* (C − αn*) = 0 .
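To make the Gram matrix concrete, here is a small sketch that builds it explicitly for the Gaussian kernel. The toy data is illustrative only; in practice the toolbox computes the kernel for you.

```matlab
% Build an n-by-n Gaussian Gram matrix G with g(i,j) = exp(-||xi - xj||^2).
X = [1 2; 2 0; 0 1];        % three observations, two predictors
n = size(X, 1);
G = zeros(n);
for i = 1:n
    for j = 1:n
        G(i,j) = exp(-norm(X(i,:) - X(j,:))^2);
    end
end
% G is symmetric with ones on the diagonal.
```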

Solving the SVM Regression Optimization Problem

Solver Algorithms

The minimization problem can be expressed in standard quadratic programming form and solved using common quadratic programming techniques. However, it can be computationally expensive to use quadratic programming algorithms, especially since the Gram matrix may be too large to be stored in memory. Using a decomposition method instead can speed up the computation and avoid running out of memory.

Decomposition methods (also called chunking and working set methods) separate all observations into two disjoint sets: the working set and the remaining set. A decomposition method modifies only the elements in the working set in each iteration. Therefore, only some columns of the Gram matrix are needed in each iteration, which reduces the amount of storage needed for each iteration.

Sequential minimal optimization (SMO) is the most popular approach for solving SVM problems [4]. SMO performs a series of two-point optimizations. In each iteration, a working set of two points is chosen based on a selection rule that uses second-order information. Then the Lagrange multipliers for this working set are solved analytically using the approach described in [2] and [1].

In SVM regression, the gradient vector ∇L for the active set is updated after each iteration. The decomposed equation for the gradient vector is

(∇L)n = Σi=1..N (αi − αi*) G(xi,xn) + ε − yn ,   n ≤ N
(∇L)n = Σi=1..N (αi − αi*) G(xi,xn) + ε + yn ,   n > N .

Iterative single data algorithm (ISDA) updates one Lagrange multiplier with each iteration [3]. ISDA is often conducted without the bias term b by adding a small positive constant a to the kernel function. Dropping b drops the sum constraint

Σn=1..N (αn − αn*) = 0

in the dual equation. This allows us to update one Lagrange multiplier in each iteration, which makes it easier than SMO to remove outliers. ISDA selects the worst KKT violator among all the αn and αn* values as the working set to be updated.

Convergence Criteria

Each of these solver algorithms iteratively computes until the specified convergence criterion is met. There are several options for convergence criteria:

• Feasibility gap — The feasibility gap is expressed as

Δ = (J(β) + L(α)) / (J(β) + 1) ,

where J(β) is the primal objective and L(α) is the dual objective. After each iteration, the software evaluates the feasibility gap. If the feasibility gap is less than the value specified by GapTolerance, then the algorithm met the convergence criterion and the software returns a solution.

• Gradient difference — After each iteration, the software evaluates the gradient vector, ∇L. If the difference in gradient vector values for the current iteration and the previous iteration is less than the value specified by DeltaGradientTolerance, then the algorithm met the convergence criterion and the software returns a solution.

• Largest KKT violation — After each iteration, the software evaluates the KKT violation for all the αn and αn* values. If the largest violation is less than the value specified by KKTTolerance, then the algorithm met the convergence criterion and the software returns a solution.
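These tolerances are exposed as name-value arguments of fitrsvm. A hedged sketch of selecting the solver and a convergence criterion, using two carbig predictors as example inputs:

```matlab
% Sketch only: choose the SMO solver and a feasibility-gap tolerance when
% training an SVM regression model.
load carbig
X = [Horsepower Weight];
mdl = fitrsvm(X, MPG, 'Solver', 'SMO', 'GapTolerance', 1e-3);
```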

References

[1] Fan, R.E., P.H. Chen, and C.J. Lin. A Study on SMO-Type Decomposition Methods for Support Vector Machines. IEEE Transactions on Neural Networks, 17:893–908, 2006.

[2] Fan, R.E., P.H. Chen, and C.J. Lin. Working Set Selection Using Second Order Information for Training Support Vector Machines. The Journal of Machine Learning Research, 6:1871–1918, 2005.

[3] Huang, T.M., V. Kecman, and I. Kopriva. Kernel Based Algorithms for Mining Huge Data Sets: Supervised, Semi-Supervised, and Unsupervised Learning. Springer, New York, 2006.

[4] Platt, J. Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines. Technical Report MSR-TR-98-14, 1999.

[5] Vapnik, V. The Nature of Statistical Learning Theory. Springer, New York, 1995.

See Also

RegressionSVM | fitrsvm | predict | resubPredict


Related Examples
• “Train Linear Support Vector Machine Regression Model” on page 32-2105
• “Train Support Vector Machine Regression Model” on page 32-2107
• “Cross-Validate SVM Regression Model” on page 32-2108
• “Optimize SVM Regression” on page 32-2110

26 Markov Models

• “Markov Chains” on page 26-2
• “Hidden Markov Models (HMM)” on page 26-4


Markov Chains

Markov processes are examples of stochastic processes, which generate random sequences of outcomes or states according to certain probabilities. Markov processes are distinguished by being memoryless: their next state depends only on their current state, not on the history that led them there. Models of Markov processes are used in a wide variety of applications, from daily stock prices to the positions of genes in a chromosome. A Markov model is given a visual representation with a state diagram, such as the one below.

State Diagram for a Markov Model

The rectangles in the diagram represent the possible states of the process you are trying to model, and the arrows represent transitions between states. The label on each arrow represents the probability of that transition. At each step of the process, the model may generate an output, or emission, depending on which state it is in, and then make a transition to another state.

An important characteristic of Markov models is that the next state depends only on the current state, and not on the history of transitions that led to the current state.

For example, for a sequence of coin tosses the two states are heads and tails. The most recent coin toss determines the current state of the model, and each subsequent toss determines the transition to the next state. If the coin is fair, the transition probabilities are all 1/2. The emission might simply be the current state. In more complicated models, random processes at each state generate emissions. You could, for example, roll a die to determine the emission at any step.

Markov chains are mathematical descriptions of Markov models with a discrete set of states. Markov chains are characterized by:

• A set of states {1, 2, ..., M}
• An M-by-M transition matrix T whose i,j entry is the probability of a transition from state i to state j. The sum of the entries in each row of T must be 1, because this is the sum of the probabilities of making a transition from a given state to each of the other states.
• A set of possible outputs, or emissions, {s1, s2, ..., sN}. By default, the set of emissions is {1, 2, ..., N}, where N is the number of possible emissions, but you can choose a different set of numbers or symbols.


• An M-by-N emission matrix E whose i,k entry gives the probability of emitting symbol sk given that the model is in state i.

Markov chains begin in an initial state i0 at step 0. The chain then transitions to state i1 with probability T1i1, and emits an output sk1 with probability Ei1k1. Consequently, the probability of observing the sequence of states i1 i2 ... ir and the sequence of emissions sk1 sk2 ... skr in the first r steps is

T1i1 Ei1k1 Ti1i2 Ei2k2 ... Tir−1ir Eirkr .
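The transition mechanics described above can be sketched in a few lines of MATLAB. This simulation code is illustrative only, not a toolbox function:

```matlab
% Simulate a two-state Markov chain from a transition matrix T.
% Each step samples the next state from the current state's row of T.
T = [0.9 0.1; 0.05 0.95];
numSteps = 10;
states = zeros(1, numSteps);
current = 1;                                   % begin in state 1 at step 0
for step = 1:numSteps
    current = find(rand <= cumsum(T(current,:)), 1);
    states(step) = current;
end
% states now holds the visited state at each of the 10 steps.
```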

See Also

Related Examples
• “Hidden Markov Models (HMM)” on page 26-4


Hidden Markov Models (HMM)

In this section...
“Introduction to Hidden Markov Models (HMM)” on page 26-4
“Analyzing Hidden Markov Models” on page 26-5

Introduction to Hidden Markov Models (HMM)

A hidden Markov model (HMM) is one in which you observe a sequence of emissions, but do not know the sequence of states the model went through to generate the emissions. Analyses of hidden Markov models seek to recover the sequence of states from the observed data.

As an example, consider a Markov model with two states and six possible emissions. The model uses:

• A red die, having six sides, labeled 1 through 6.
• A green die, having twelve sides, five of which are labeled 2 through 6, while the remaining seven sides are labeled 1.
• A weighted red coin, for which the probability of heads is 0.9 and the probability of tails is 0.1.
• A weighted green coin, for which the probability of heads is 0.95 and the probability of tails is 0.05.

The model creates a sequence of numbers from the set {1, 2, 3, 4, 5, 6} with the following rules:

• Begin by rolling the red die and writing down the number that comes up, which is the emission.
• Toss the red coin and do one of the following:
  • If the result is heads, roll the red die and write down the result.
  • If the result is tails, roll the green die and write down the result.
• At each subsequent step, you flip the coin that has the same color as the die you rolled in the previous step. If the coin comes up heads, roll the same die as in the previous step. If the coin comes up tails, switch to the other die.

The state diagram for this model has two states, red and green, as shown in the following figure.


You determine the emission from a state by rolling the die with the same color as the state. You determine the transition to the next state by flipping the coin with the same color as the state.

The transition matrix is:

T = [ 0.9    0.1
      0.05   0.95 ]

The emissions matrix is:

E = [ 1/6    1/6    1/6    1/6    1/6    1/6
      7/12   1/12   1/12   1/12   1/12   1/12 ]

The model is not hidden because you know the sequence of states from the colors of the coins and dice. Suppose, however, that someone else is generating the emissions without showing you the dice or the coins. All you see is the sequence of emissions. If you start seeing more 1s than other numbers, you might suspect that the model is in the green state, but you cannot be sure because you cannot see the color of the die being rolled.

Hidden Markov models raise the following questions:

• Given a sequence of emissions, what is the most likely state path?
• Given a sequence of emissions, how can you estimate the transition and emission probabilities of the model?
• What is the forward probability that the model generates a given sequence?
• What is the posterior probability that the model is in a particular state at any point in the sequence?

Analyzing Hidden Markov Models

• “Generating a Test Sequence” on page 26-6
• “Estimating the State Sequence” on page 26-6
• “Estimating Transition and Emission Matrices” on page 26-6
• “Estimating Posterior State Probabilities” on page 26-8
• “Changing the Initial State Distribution” on page 26-8

Statistics and Machine Learning Toolbox functions related to hidden Markov models are:

• hmmgenerate — Generates a sequence of states and emissions from a Markov model
• hmmestimate — Calculates maximum likelihood estimates of transition and emission probabilities from a sequence of emissions and a known sequence of states
• hmmtrain — Calculates maximum likelihood estimates of transition and emission probabilities from a sequence of emissions
• hmmviterbi — Calculates the most probable state path for a hidden Markov model
• hmmdecode — Calculates the posterior state probabilities of a sequence of emissions

This section shows how to use these functions to analyze hidden Markov models.


Generating a Test Sequence

The following commands create the transition and emission matrices for the model described in “Introduction to Hidden Markov Models (HMM)” on page 26-4:

TRANS = [.9 .1; .05 .95];
EMIS = [1/6, 1/6, 1/6, 1/6, 1/6, 1/6;...
   7/12, 1/12, 1/12, 1/12, 1/12, 1/12];

To generate a random sequence of states and emissions from the model, use hmmgenerate: [seq,states] = hmmgenerate(1000,TRANS,EMIS);

The output seq is the sequence of emissions, and the output states is the sequence of states. hmmgenerate begins in state 1 at step 0, makes the transition to state i1 at step 1, and returns i1 as the first entry in states. To change the initial state, see “Changing the Initial State Distribution” on page 26-8.

Estimating the State Sequence

Given the transition and emission matrices TRANS and EMIS, the function hmmviterbi uses the Viterbi algorithm to compute the most likely sequence of states the model would go through to generate a given sequence seq of emissions:

likelystates = hmmviterbi(seq, TRANS, EMIS);

likelystates is a sequence of the same length as seq. To test the accuracy of hmmviterbi, compute the percentage of the actual sequence states that agrees with likelystates.

sum(states==likelystates)/1000
ans =
    0.8200

In this case, the most likely sequence of states agrees with the random sequence 82% of the time.

Estimating Transition and Emission Matrices

• “Using hmmestimate” on page 26-6
• “Using hmmtrain” on page 26-7

The functions hmmestimate and hmmtrain estimate the transition and emission matrices TRANS and EMIS given a sequence seq of emissions.

Using hmmestimate

The function hmmestimate requires that you know the sequence of states states that the model went through to generate seq. The following takes the emission and state sequences and returns estimates of the transition and emission matrices: [TRANS_EST, EMIS_EST] = hmmestimate(seq, states)


TRANS_EST =
    0.8989    0.1011
    0.0585    0.9415

EMIS_EST =
    0.1721    0.1721    0.1749    0.1612    0.1803    0.1393
    0.5836    0.0741    0.0804    0.0789    0.0726    0.1104

You can compare the outputs with the original transition and emission matrices, TRANS and EMIS:

TRANS
TRANS =
    0.9000    0.1000
    0.0500    0.9500

EMIS
EMIS =
    0.1667    0.1667    0.1667    0.1667    0.1667    0.1667
    0.5833    0.0833    0.0833    0.0833    0.0833    0.0833

Using hmmtrain

If you do not know the sequence of states states, but you have initial guesses for TRANS and EMIS, you can still estimate TRANS and EMIS using hmmtrain. Suppose you have the following initial guesses for TRANS and EMIS.

TRANS_GUESS = [.85 .15; .1 .9];
EMIS_GUESS = [.17 .16 .17 .16 .17 .17; .6 .08 .08 .08 .08 .08];

You estimate TRANS and EMIS as follows:

[TRANS_EST2, EMIS_EST2] = hmmtrain(seq, TRANS_GUESS, EMIS_GUESS)
TRANS_EST2 =
    0.2286    0.7714
    0.0032    0.9968

EMIS_EST2 =
    0.1436    0.2348    0.1837    0.1963    0.2350    0.0066
    0.4355    0.1089    0.1144    0.1082    0.1109    0.1220

hmmtrain uses an iterative algorithm that alters the matrices TRANS_GUESS and EMIS_GUESS so that at each step the adjusted matrices are more likely to generate the observed sequence, seq. The algorithm halts when the matrices in two successive iterations are within a small tolerance of each other. If the algorithm fails to reach this tolerance within a maximum number of iterations, whose default value is 100, the algorithm halts. In this case, hmmtrain returns the last values of TRANS_EST and EMIS_EST and issues a warning that the tolerance was not reached.

If the algorithm fails to reach the desired tolerance, increase the default value of the maximum number of iterations with the command:

hmmtrain(seq,TRANS_GUESS,EMIS_GUESS,'maxiterations',maxiter)

where maxiter is the maximum number of steps the algorithm executes.
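The stopping rule just described, iterate until two successive results agree to within a tolerance or a maximum iteration count is reached, is generic. The following Python fragment is a hedged sketch of that stopping logic only; it does not implement the Baum-Welch update that hmmtrain actually performs, and the toy update function is purely illustrative:

```python
def iterate_until_converged(update, x0, tol=1e-6, maxiter=100):
    """Stop when two successive iterates agree to within tol, or give up
    after maxiter steps (the text above gives 100 as the default maximum)."""
    x = x0
    for it in range(1, maxiter + 1):
        x_new = update(x)
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new, it, True          # converged within tolerance
        x = x_new
    return x, maxiter, False                # tolerance not reached: warn caller

# Toy update with fixed point (0, 0): each step halves the iterate.
x, steps, ok = iterate_until_converged(lambda v: [0.5 * a for a in v],
                                       [1.0, 1.0])
```

The False branch corresponds to the case where hmmtrain returns its last estimates and issues a warning that the tolerance was not reached.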


Change the default value of the tolerance with the command:

hmmtrain(seq, TRANS_GUESS, EMIS_GUESS, 'tolerance', tol)

where tol is the desired value of the tolerance. Increasing the value of tol makes the algorithm halt sooner, but the results are less accurate.

Two factors reduce the reliability of the output matrices of hmmtrain:

• The algorithm can converge to a local maximum that does not represent the true transition and emission matrices. If you suspect this, use different initial guesses for the matrices TRANS_GUESS and EMIS_GUESS.
• The sequence seq may be too short to properly train the matrices. If you suspect this, use a longer sequence for seq.

Estimating Posterior State Probabilities

The posterior state probabilities of an emission sequence seq are the conditional probabilities that the model is in a particular state when it generates a symbol in seq, given that seq is emitted. You compute the posterior state probabilities with hmmdecode:

PSTATES = hmmdecode(seq,TRANS,EMIS)

The output PSTATES is an M-by-L matrix, where M is the number of states and L is the length of seq. PSTATES(i,j) is the conditional probability that the model is in state i when it generates the jth symbol of seq, given that seq is emitted. hmmdecode begins with the model in state 1 at step 0, prior to the first emission. PSTATES(i,1) is the probability that the model is in state i at step 1, when it generates the first symbol of seq. To change the initial state, see “Changing the Initial State Distribution” on page 26-8.

To return the logarithm of the probability of the sequence seq, use the second output argument of hmmdecode:

[PSTATES,logpseq] = hmmdecode(seq,TRANS,EMIS)

The probability of a sequence tends to 0 as the length of the sequence increases, and the probability of a sufficiently long sequence becomes less than the smallest positive number your computer can represent. hmmdecode returns the logarithm of the probability to avoid this problem.

Changing the Initial State Distribution

By default, Statistics and Machine Learning Toolbox hidden Markov model functions begin in state 1. In other words, the distribution of initial states has all of its probability mass concentrated at state 1. To assign a different distribution of probabilities, p = [p1, p2, ..., pM], to the M initial states, do the following:

1  Create an (M+1)-by-(M+1) augmented transition matrix, T_hat, of the following form:

       T_hat = [ 0  p
                 0  T ]

   where T is the true transition matrix. The first column of T_hat contains M+1 zeros. p must sum to 1.

2  Create an (M+1)-by-N augmented emission matrix, E_hat, that has the following form:

       E_hat = [ 0
                 E ]

   where 0 is a 1-by-N row of zeros.

If the transition and emission matrices are TRANS and EMIS, respectively, you create the augmented matrices with the following commands:

TRANS_HAT = [0 p; zeros(size(TRANS,1),1) TRANS];
EMIS_HAT = [zeros(1,size(EMIS,2)); EMIS];
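The same block construction is easy to express in any language. A minimal Python sketch of the augmented matrices, using the dishonest-casino TRANS and EMIS from earlier and an assumed example distribution p (any nonnegative vector summing to 1 works):

```python
# Augment TRANS and EMIS so that the model starts from an initial state
# distribution p instead of always starting in state 1.
TRANS = [[0.90, 0.10],
         [0.05, 0.95]]
EMIS = [[1 / 6] * 6,
        [7 / 12] + [1 / 12] * 5]
p = [0.5, 0.5]                 # assumed initial distribution; must sum to 1

# MATLAB: TRANS_HAT = [0 p; zeros(size(TRANS,1),1) TRANS];
TRANS_HAT = [[0.0] + p] + [[0.0] + row for row in TRANS]
# MATLAB: EMIS_HAT = [zeros(1,size(EMIS,2)); EMIS];
EMIS_HAT = [[0.0] * len(EMIS[0])] + EMIS
```

The padded first row and column make the artificial starting state 0 transition into the real states according to p while emitting nothing itself.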

See Also hmmdecode | hmmestimate | hmmgenerate | hmmtrain | hmmviterbi

More About

• “Markov Chains” on page 26-2


27 Design of Experiments

• “Design of Experiments” on page 27-2
• “Full Factorial Designs” on page 27-3
• “Fractional Factorial Designs” on page 27-5
• “Response Surface Designs” on page 27-8
• “D-Optimal Designs” on page 27-12
• “Improve an Engine Cooling Fan Using Design for Six Sigma Techniques” on page 27-19


Design of Experiments

Passive data collection leads to a number of problems in statistical modeling. Observed changes in a response variable may be correlated with, but not caused by, observed changes in individual factors (process variables). Simultaneous changes in multiple factors may produce interactions that are difficult to separate into individual effects. Observations may be dependent, while a model of the data considers them to be independent.

Designed experiments address these problems. In a designed experiment, the data-producing process is actively manipulated to improve the quality of information and to eliminate redundant data. A common goal of all experimental designs is to collect data as parsimoniously as possible while providing sufficient information to accurately estimate model parameters.

For example, a simple model of a response y in an experiment with two controlled factors x1 and x2 might look like this:

y = β0 + β1x1 + β2x2 + β3x1x2 + ε

Here ε includes both experimental error and the effects of any uncontrolled factors in the experiment. The terms β1x1 and β2x2 are main effects and the term β3x1x2 is a two-way interaction effect. A designed experiment would systematically manipulate x1 and x2 while measuring y, with the objective of accurately estimating β0, β1, β2, and β3.


Full Factorial Designs

In this section...
“Multilevel Designs” on page 27-3
“Two-Level Designs” on page 27-3

Multilevel Designs

To systematically vary experimental factors, assign each factor a discrete set of levels. Full factorial designs measure response variables using every treatment (combination of the factor levels). A full factorial design for n factors with N1, ..., Nn levels requires N1 × ... × Nn experimental runs, one for each treatment. While advantageous for separating individual effects, full factorial designs can make large demands on data collection.

As an example, suppose a machine shop has three machines and four operators. If the same operator always uses the same machine, it is impossible to determine if a machine or an operator is the cause of variation in production. By allowing every operator to use every machine, effects are separated. A full factorial list of treatments is generated by the Statistics and Machine Learning Toolbox function fullfact:

dFF = fullfact([3,4])
dFF =
     1     1
     2     1
     3     1
     1     2
     2     2
     3     2
     1     3
     2     3
     3     3
     1     4
     2     4
     3     4

Each of the 3×4 = 12 rows of dFF represents one machine/operator combination.
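The treatment list that fullfact produces is simply a Cartesian product of the level sets. A minimal Python sketch of the same enumeration (the helper name fullfact here is an illustration, not the MATLAB function):

```python
from itertools import product

def fullfact(levels):
    """All treatments for factors with the given level counts, with the
    first factor varying fastest, matching the dFF run order above."""
    grids = [range(1, n + 1) for n in reversed(levels)]
    return [tuple(reversed(row)) for row in product(*grids)]

dFF = fullfact([3, 4])  # 12 machine/operator treatments
```

Reversing the factor order before taking the product makes the first factor cycle fastest, which reproduces the run order shown above.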

Two-Level Designs

Many experiments can be conducted with two-level factors, using two-level designs. For example, suppose the machine shop in the previous example always keeps the same operator on the same machine, but wants to measure production effects that depend on the composition of the day and night shifts. The Statistics and Machine Learning Toolbox function ff2n generates a full factorial list of treatments:

dFF2 = ff2n(4)
dFF2 =
     0     0     0     0
     0     0     0     1
     0     0     1     0
     0     0     1     1
     0     1     0     0
     0     1     0     1
     0     1     1     0
     0     1     1     1
     1     0     0     0
     1     0     0     1
     1     0     1     0
     1     0     1     1
     1     1     0     0
     1     1     0     1
     1     1     1     0
     1     1     1     1

Each of the 2^4 = 16 rows of dFF2 represents one schedule of operators for the day (0) and night (1) shifts.


Fractional Factorial Designs

In this section...
“Introduction to Fractional Factorial Designs” on page 27-5
“Plackett-Burman Designs” on page 27-5
“General Fractional Designs” on page 27-5

Introduction to Fractional Factorial Designs

Two-level designs are sufficient for evaluating many production processes. Factor levels of ±1 can indicate categorical factors, normalized factor extremes, or simply “up” and “down” from current factor settings. Experimenters evaluating process changes are interested primarily in the factor directions that lead to process improvement.

For experiments with many factors, two-level full factorial designs can lead to large amounts of data. For example, a two-level full factorial design with 10 factors requires 2^10 = 1024 runs. Often, however, individual factors or their interactions have no distinguishable effects on a response. This is especially true of higher order interactions. As a result, a well-designed experiment can use fewer runs for estimating model parameters.

Fractional factorial designs use a fraction of the runs required by full factorial designs. A subset of experimental treatments is selected based on an evaluation (or assumption) of which factors and interactions have the most significant effects. Once this selection is made, the experimental design must separate these effects. In particular, significant effects should not be confounded, that is, the measurement of one should not depend on the measurement of another.

Plackett-Burman Designs

Plackett-Burman designs are used when only main effects are considered significant. Two-level Plackett-Burman designs require a number of experimental runs that is a multiple of 4 rather than a power of 2. The MATLAB function hadamard generates these designs:

dPB = hadamard(8)
dPB =
     1     1     1     1     1     1     1     1
     1    -1     1    -1     1    -1     1    -1
     1     1    -1    -1     1     1    -1    -1
     1    -1    -1     1     1    -1    -1     1
     1     1     1     1    -1    -1    -1    -1
     1    -1     1    -1    -1     1    -1     1
     1     1    -1    -1    -1    -1     1     1
     1    -1    -1     1    -1     1     1    -1

Binary factor levels are indicated by ±1. The design is for eight runs (the rows of dPB) manipulating seven two-level factors (the last seven columns of dPB). The number of runs is a fraction 8/2^7 = 0.0625 of the runs required by a full factorial design. Economy is achieved at the expense of confounding main effects with any two-way interactions.
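The property that makes these designs economical for main effects is the orthogonality of the Hadamard columns. The following Python sketch builds the same Sylvester-type matrix and checks that distinct columns are orthogonal (illustrative code, not the MATLAB implementation):

```python
def hadamard(n):
    """Sylvester construction of an n-by-n Hadamard matrix (n a power of 2),
    the same family of matrices the MATLAB hadamard function returns."""
    H = [[1]]
    while len(H) < n:
        H = ([row + row for row in H] +                # [H  H]
             [row + [-x for x in row] for row in H])   # [H -H]
    return H

dPB = hadamard(8)
# Distinct columns are orthogonal (dot product 0), so the seven factor
# columns estimate main effects independently of one another.
dots = [sum(dPB[r][i] * dPB[r][j] for r in range(8))
        for i in range(8) for j in range(8) if i != j]
```

Because every pair of distinct columns has dot product zero, the main-effect estimates do not interfere with each other; only interactions are sacrificed.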

General Fractional Designs

At the cost of a larger fractional design, you can specify which interactions you wish to consider significant. A design of resolution R is one in which no n-factor interaction is confounded with any other effect containing less than R – n factors. Thus, a resolution III design does not confound main effects with one another but may confound them with two-way interactions (as in “Plackett-Burman Designs” on page 27-5), while a resolution IV design does not confound either main effects or two-way interactions but may confound two-way interactions with each other.

Specify general fractional factorial designs using a full factorial design for a selected subset of basic factors and generators for the remaining factors. Generators are products of the basic factors, giving the levels for the remaining factors. Use the Statistics and Machine Learning Toolbox function fracfact to generate these designs:

dfF = fracfact('a b c d bcd acd')
dfF =
    -1    -1    -1    -1    -1    -1
    -1    -1    -1     1     1     1
    -1    -1     1    -1     1     1
    -1    -1     1     1    -1    -1
    -1     1    -1    -1     1    -1
    -1     1    -1     1    -1     1
    -1     1     1    -1    -1     1
    -1     1     1     1     1    -1
     1    -1    -1    -1    -1     1
     1    -1    -1     1     1    -1
     1    -1     1    -1     1    -1
     1    -1     1     1    -1     1
     1     1    -1    -1     1     1
     1     1    -1     1    -1    -1
     1     1     1    -1    -1    -1
     1     1     1     1     1     1

This is a six-factor design in which four two-level basic factors (a, b, c, and d, in the first four columns of dfF) are measured in every combination of levels, while the two remaining factors (in the last two columns of dfF) are measured only at levels defined by the generators bcd and acd, respectively. Levels in the generated columns are products of corresponding levels in the columns that make up the generator.

The challenge of creating a fractional factorial design is to choose basic factors and generators so that the design achieves a specified resolution in a specified number of runs. Use the Statistics and Machine Learning Toolbox function fracfactgen to find appropriate generators:

generators = fracfactgen('a b c d e f',4,4)
generators = 
    'a'
    'b'
    'c'
    'd'
    'bcd'
    'acd'

These are generators for a six-factor design with factors a through f, using 2^4 = 16 runs to achieve resolution IV. The fracfactgen function uses an efficient search algorithm to find generators that meet the requirements.

An optional output from fracfact displays the confounding pattern of the design:

[dfF,confounding] = fracfact(generators);
confounding
confounding = 
    'Term'       'Generator'    'Confounding'
    'X1'         'a'            'X1'
    'X2'         'b'            'X2'
    'X3'         'c'            'X3'
    'X4'         'd'            'X4'
    'X5'         'bcd'          'X5'
    'X6'         'acd'          'X6'
    'X1*X2'      'ab'           'X1*X2 + X5*X6'
    'X1*X3'      'ac'           'X1*X3 + X4*X6'
    'X1*X4'      'ad'           'X1*X4 + X3*X6'
    'X1*X5'      'abcd'         'X1*X5 + X2*X6'
    'X1*X6'      'cd'           'X1*X6 + X2*X5 + X3*X4'
    'X2*X3'      'bc'           'X2*X3 + X4*X5'
    'X2*X4'      'bd'           'X2*X4 + X3*X5'
    'X2*X5'      'cd'           'X1*X6 + X2*X5 + X3*X4'
    'X2*X6'      'abcd'         'X1*X5 + X2*X6'
    'X3*X4'      'cd'           'X1*X6 + X2*X5 + X3*X4'
    'X3*X5'      'bd'           'X2*X4 + X3*X5'
    'X3*X6'      'ad'           'X1*X4 + X3*X6'
    'X4*X5'      'bc'           'X2*X3 + X4*X5'
    'X4*X6'      'ac'           'X1*X3 + X4*X6'
    'X5*X6'      'ab'           'X1*X2 + X5*X6'

The confounding pattern shows that main effects are effectively separated by the design, but two-way interactions are confounded with various other two-way interactions.
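Where the confounding pattern comes from can be seen directly: an interaction column is the elementwise product of its factor columns, and two effects are confounded when their columns coincide. A Python sketch for the 2^(6-2) design above:

```python
from itertools import product
from math import prod

# Full factorial in the basic factors a, b, c, d, plus the generated
# factors e = bcd and f = acd, as in the fracfact example above.
basic = list(product([-1, 1], repeat=4))
a, b, c, d = (tuple(row[i] for row in basic) for i in range(4))

def col_product(*cols):
    """Elementwise product of design columns."""
    return tuple(prod(vals) for vals in zip(*cols))

e = col_product(b, c, d)   # generator e = bcd
f = col_product(a, c, d)   # generator f = acd

# ab and ef give identical columns because ef = (bcd)(acd) = ab, so the
# X1*X2 and X5*X6 interactions are confounded, as the table reports.
same = col_product(a, b) == col_product(e, f)
```

Working through the other generator products in the same way reproduces every alias chain in the confounding table.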


Response Surface Designs

In this section...
“Introduction to Response Surface Designs” on page 27-8
“Central Composite Designs” on page 27-8
“Box-Behnken Designs” on page 27-10

Introduction to Response Surface Designs

Quadratic response surfaces are simple models that provide a maximum or minimum without making additional assumptions about the form of the response. Quadratic models can be calibrated using full factorial designs with three or more levels for each factor, but these designs generally require more runs than necessary to accurately estimate model parameters. This section discusses designs for calibrating quadratic models that are much more efficient, using three or five levels for each factor, but not using all combinations of levels.

Central Composite Designs

Central composite designs (CCDs), also known as Box-Wilson designs, are appropriate for calibrating full quadratic models. There are three types of CCDs—circumscribed, inscribed, and faced—pictured below:

27-8

Response Surface Designs

Each design consists of a factorial design (the corners of a cube) together with center and star points that allow for estimation of second-order effects. For a full quadratic model with n factors, CCDs have enough design points to estimate all (n+2)(n+1)/2 coefficients.

The type of CCD used (the position of the factorial and star points) is determined by the number of factors and by the desired properties of the design. The following table summarizes some important properties. A design is rotatable if the prediction variance depends only on the distance of the design point from the center of the design.


Design               | Rotatable | Factor Levels | Uses Points Outside ±1 | Accuracy of Estimates
---------------------|-----------|---------------|------------------------|------------------------------------------
Circumscribed (CCC)  | Yes       | 5             | Yes                    | Good over entire design space
Inscribed (CCI)      | Yes       | 5             | No                     | Good over central subset of design space
Faced (CCF)          | No        | 3             | No                     | Fair over entire design space; poor for pure quadratic coefficients

Generate CCDs with the Statistics and Machine Learning Toolbox function ccdesign:

dCC = ccdesign(3,'type','circumscribed')
dCC =
   -1.0000   -1.0000   -1.0000
   -1.0000   -1.0000    1.0000
   -1.0000    1.0000   -1.0000
   -1.0000    1.0000    1.0000
    1.0000   -1.0000   -1.0000
    1.0000   -1.0000    1.0000
    1.0000    1.0000   -1.0000
    1.0000    1.0000    1.0000
   -1.6818         0         0
    1.6818         0         0
         0   -1.6818         0
         0    1.6818         0
         0         0   -1.6818
         0         0    1.6818
         0         0         0
         0         0         0
         0         0         0
         0         0         0
         0         0         0
         0         0         0

The repeated center point runs allow for a more uniform estimate of the prediction variance over the entire design space.
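The structure of the circumscribed design above, cube points, star points at distance alpha, and repeated center points, can be sketched directly. In this Python illustration, alpha is set to the rotatable value (number of factorial points)^(1/4), which gives the 1.6818 seen in dCC; the choice of six center points here mirrors the output shown above and is an assumption of this sketch:

```python
from itertools import product

n = 3
alpha = (2 ** n) ** 0.25                      # rotatable star distance, ~1.6818
cube = list(product([-1.0, 1.0], repeat=n))   # 2^n factorial corners
star = []
for i in range(n):
    for s in (-alpha, alpha):
        pt = [0.0] * n
        pt[i] = s                             # one factor at +/-alpha, rest 0
        star.append(tuple(pt))
center = [(0.0,) * n] * 6                     # repeated center runs
dCC = cube + star + center                    # 8 + 6 + 6 = 20 runs
```

The cube points support the interaction terms, the star points the pure quadratic terms, and the replicated center points the uniform variance estimate described above.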

Box-Behnken Designs

Like the designs described in “Central Composite Designs” on page 27-8, Box-Behnken designs are used to calibrate full quadratic models. Box-Behnken designs are rotatable and, for a small number of factors (four or fewer), require fewer runs than CCDs. By avoiding the corners of the design space, they allow experimenters to work around extreme factor combinations. Like an inscribed CCD, however, extremes are then poorly estimated. The geometry of a Box-Behnken design is pictured in the following figure.


Design points are at the midpoints of edges of the design space and at the center, and do not contain an embedded factorial design. Generate Box-Behnken designs with the Statistics and Machine Learning Toolbox function bbdesign:

dBB = bbdesign(3)
dBB =
    -1    -1     0
    -1     1     0
     1    -1     0
     1     1     0
    -1     0    -1
    -1     0     1
     1     0    -1
     1     0     1
     0    -1    -1
     0    -1     1
     0     1    -1
     0     1     1
     0     0     0
     0     0     0
     0     0     0

Again, the repeated center point runs allow for a more uniform estimate of the prediction variance over the entire design space.


D-Optimal Designs

In this section...
“Introduction to D-Optimal Designs” on page 27-12
“Generate D-Optimal Designs” on page 27-13
“Augment D-Optimal Designs” on page 27-14
“Specify Fixed Covariate Factors” on page 27-15
“Specify Categorical Factors” on page 27-16
“Specify Candidate Sets” on page 27-16

Introduction to D-Optimal Designs

Traditional experimental designs (“Full Factorial Designs” on page 27-3, “Fractional Factorial Designs” on page 27-5, and “Response Surface Designs” on page 27-8) are appropriate for calibrating linear models in experimental settings where factors are relatively unconstrained in the region of interest. In some cases, however, models are necessarily nonlinear. In other cases, certain treatments (combinations of factor levels) may be expensive or infeasible to measure. D-optimal designs are model-specific designs that address these limitations of traditional designs.

A D-optimal design is generated by an iterative search algorithm and seeks to minimize the covariance of the parameter estimates for a specified model. This is equivalent to maximizing the determinant D = |X^T X|, where X is the design matrix of model terms (the columns) evaluated at specific treatments in the design space (the rows). Unlike traditional designs, D-optimal designs do not require orthogonal design matrices, and as a result, parameter estimates may be correlated. Parameter estimates may also be locally, but not globally, D-optimal.

There are several Statistics and Machine Learning Toolbox functions for generating D-optimal designs:

Function  | Description
----------|------------------------------------------------------------------
candexch  | Uses a row-exchange algorithm to generate a D-optimal design with a specified number of runs for a specified model and a specified candidate set. This is the second component of the algorithm used by rowexch.
candgen   | Generates a candidate set for a specified model. This is the first component of the algorithm used by rowexch.
cordexch  | Uses a coordinate-exchange algorithm to generate a D-optimal design with a specified number of runs for a specified model.
daugment  | Uses a coordinate-exchange algorithm to augment an existing D-optimal design with additional runs to estimate additional model terms.
dcovary   | Uses a coordinate-exchange algorithm to generate a D-optimal design with fixed covariate factors.
rowexch   | Uses a row-exchange algorithm to generate a D-optimal design with a specified number of runs for a specified model. The algorithm calls candgen and then candexch. (Call candexch separately to specify a candidate set.)

The following sections explain how to use these functions to generate D-optimal designs.


Note The Statistics and Machine Learning Toolbox function rsmdemo generates simulated data for experimental settings specified by either the user or by a D-optimal design generated by cordexch. It uses the rstool interface to visualize response surface models fit to the data, and it uses the nlintool interface to visualize a nonlinear model fit to the data.

Generate D-Optimal Designs

Two Statistics and Machine Learning Toolbox algorithms generate D-optimal designs:

• The cordexch function uses a coordinate-exchange algorithm
• The rowexch function uses a row-exchange algorithm

Both cordexch and rowexch use iterative search algorithms. They operate by incrementally changing an initial design matrix X to increase D = |X^T X| at each step. In both algorithms, there is randomness built into the selection of the initial design and into the choice of the incremental changes. As a result, both algorithms may return locally, but not globally, D-optimal designs. Run each algorithm multiple times and select the best result for your final design. Both functions have a 'tries' parameter that automates this repetition and comparison.

At each step, the row-exchange algorithm exchanges an entire row of X with a row from a design matrix C evaluated at a candidate set of feasible treatments. The rowexch function automatically generates a C appropriate for a specified model, operating in two steps by calling the candgen and candexch functions in sequence. Provide your own C by calling candexch directly. In either case, if C is large, its static presence in memory can affect computation.

The coordinate-exchange algorithm, by contrast, does not use a candidate set. (Or rather, the candidate set is the entire design space.) At each step, the coordinate-exchange algorithm exchanges a single element of X with a new element evaluated at a neighboring point in design space. The absence of a candidate set reduces demands on memory, but the smaller scale of the search means that the coordinate-exchange algorithm is more likely to become trapped in a local optimum than the row-exchange algorithm.
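The exchange idea itself is compact. The following Python sketch runs a simplified row-exchange search for a two-factor linear model over a 3-by-3 candidate grid, accepting any swap that increases det(X'X). It illustrates the principle only; the toolbox algorithms are more sophisticated, and like them, this greedy search can stop at a locally optimal design:

```python
import random
from itertools import product

def model_row(x):
    """Model terms for a two-factor linear model: constant, x1, x2."""
    return [1.0, x[0], x[1]]

def det3(M):
    """Determinant of a 3-by-3 matrix, expanded along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def d_criterion(design):
    """det(X'X) for the design: the quantity a D-optimal search maximizes."""
    X = [model_row(x) for x in design]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    return det3(XtX)

candidates = list(product([-1, 0, 1], repeat=2))  # 3x3 grid of treatments
rng = random.Random(1)
design = rng.sample(candidates, 4)                # random starting design

improved = True
while improved:                                   # greedy row exchanges
    improved = False
    for i in range(len(design)):
        for cand in candidates:
            trial = design[:i] + [cand] + design[i + 1:]
            if d_criterion(trial) > d_criterion(design) + 1e-12:
                design, improved = trial, True
```

Because each accepted swap strictly increases the criterion over a finite set of designs, the loop always terminates; restarting from several random initial designs, as the 'tries' parameter does, guards against poor local optima.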
For example, suppose you want a design to estimate the parameters in the following three-factor, seven-term interaction model:

y = β0 + β1x1 + β2x2 + β3x3 + β12x1x2 + β13x1x3 + β23x2x3 + ε

Use cordexch to generate a D-optimal design with seven runs:

nfactors = 3;
nruns = 7;
[dCE,X] = cordexch(nfactors,nruns,'interaction','tries',10)
dCE =
    -1     1     1
    -1    -1    -1
     1     1     1
    -1     1    -1
     1    -1     1
     1    -1    -1
    -1    -1     1
X =
     1    -1     1     1    -1    -1     1
     1    -1    -1    -1     1     1     1
     1     1     1     1     1     1     1
     1    -1     1    -1    -1     1    -1
     1     1    -1     1    -1     1    -1
     1     1    -1    -1    -1    -1     1
     1    -1    -1     1     1    -1    -1
Columns of the design matrix X are the model terms evaluated at each row of the design dCE. The terms appear in order from left to right:

1  Constant term
2  Linear terms (1, 2, 3)
3  Interaction terms (12, 13, 23)

Use X in a linear regression model fit to response data measured at the design points in dCE. Use rowexch in a similar fashion to generate an equivalent design:

[dRE,X] = rowexch(nfactors,nruns,'interaction','tries',10)
dRE =
    -1    -1     1
     1    -1     1
     1    -1    -1
     1     1     1
    -1    -1    -1
    -1     1    -1
    -1     1     1
X =
     1    -1    -1     1     1    -1    -1
     1     1    -1     1    -1     1    -1
     1     1    -1    -1    -1    -1     1
     1     1     1     1     1     1     1
     1    -1    -1    -1     1     1     1
     1    -1     1    -1    -1     1    -1
     1    -1     1     1    -1    -1     1

Augment D-Optimal Designs

In practice, you may want to add runs to a completed experiment to learn more about a process and estimate additional model coefficients. The daugment function uses a coordinate-exchange algorithm to augment an existing D-optimal design. For example, the following eight-run design is adequate for estimating main effects in a four-factor model:

dCEmain = cordexch(4,8)
dCEmain =
     1    -1    -1     1
    -1    -1     1     1
    -1     1    -1     1
     1     1     1    -1
     1     1     1     1
    -1     1    -1    -1
     1    -1    -1    -1
    -1    -1     1    -1

To estimate the six interaction terms in the model, augment the design with eight additional runs:

dCEinteraction = daugment(dCEmain,8,'interaction')
dCEinteraction =
     1    -1    -1     1
    -1    -1     1     1
    -1     1    -1     1
     1     1     1    -1
     1     1     1     1
    -1     1    -1    -1
     1    -1    -1    -1
    -1    -1     1    -1
    -1     1     1     1
    -1    -1    -1    -1
     1    -1     1    -1
     1     1    -1     1
    -1     1     1    -1
     1     1    -1    -1
     1    -1     1     1
    -1    -1    -1     1

The augmented design is full factorial, with the original eight runs in the first eight rows. The 'start' parameter of the candexch function provides the same functionality as daugment, but uses a row-exchange algorithm rather than a coordinate-exchange algorithm.

Specify Fixed Covariate Factors

In many experimental settings, certain factors and their covariates are constrained to a fixed set of levels or combinations of levels. These cannot be varied when searching for an optimal design. The dcovary function allows you to specify fixed covariate factors in the coordinate-exchange algorithm.

For example, suppose you want a design to estimate the parameters in a three-factor linear additive model, with eight runs that necessarily occur at different times. If the process experiences temporal linear drift, you may want to include the run time as a variable in the model. Produce the design as follows:

time = linspace(-1,1,8)';
[dCV,X] = dcovary(3,time,'linear')
dCV =
   -1.0000    1.0000    1.0000   -1.0000
    1.0000   -1.0000   -1.0000   -0.7143
   -1.0000   -1.0000   -1.0000   -0.4286
    1.0000   -1.0000    1.0000   -0.1429
    1.0000    1.0000   -1.0000    0.1429
   -1.0000    1.0000   -1.0000    0.4286
    1.0000    1.0000    1.0000    0.7143
   -1.0000   -1.0000    1.0000    1.0000
X =
    1.0000   -1.0000    1.0000    1.0000   -1.0000
    1.0000    1.0000   -1.0000   -1.0000   -0.7143
    1.0000   -1.0000   -1.0000   -1.0000   -0.4286
    1.0000    1.0000   -1.0000    1.0000   -0.1429
    1.0000    1.0000    1.0000   -1.0000    0.1429
    1.0000   -1.0000    1.0000   -1.0000    0.4286
    1.0000    1.0000    1.0000    1.0000    0.7143
    1.0000   -1.0000   -1.0000    1.0000    1.0000


The column vector time is a fixed factor, normalized to values between ±1. The number of rows in the fixed factor specifies the number of runs in the design. The resulting design dCV gives factor settings for the three controlled model factors at each time.

Specify Categorical Factors

Categorical factors take values in a discrete set of levels. Both cordexch and rowexch have a 'categorical' parameter that allows you to specify the indices of categorical factors and a 'levels' parameter that allows you to specify a number of levels for each factor. For example, the following eight-run design is for a linear additive model with five factors in which the final factor is categorical with three levels:

dCEcat = cordexch(5,8,'linear','categorical',5,'levels',3)
dCEcat =
    -1    -1     1     1     2
    -1    -1    -1    -1     3
     1     1     1     1     3
     1     1    -1    -1     2
     1    -1    -1     1     3
    -1     1    -1     1     1
    -1     1     1    -1     3
     1    -1     1    -1     1

Specify Candidate Sets

The row-exchange algorithm exchanges rows of an initial design matrix X with rows from a design matrix C evaluated at a candidate set of feasible treatments. The rowexch function automatically generates a C appropriate for a specified model, operating in two steps by calling the candgen and candexch functions in sequence. Provide your own C by calling candexch directly.

For example, the following uses rowexch to generate a five-run design for a two-factor pure quadratic model using a candidate set that is produced internally:

dRE1 = rowexch(2,5,'purequadratic','tries',10)
dRE1 =
    -1     1
     0     0
     1    -1
     1     0
     1     1

The same thing can be done using candgen and candexch in sequence:

[dC,C] = candgen(2,'purequadratic') % Candidate set, C
dC =
    -1    -1
     0    -1
     1    -1
    -1     0
     0     0
     1     0
    -1     1
     0     1
     1     1
C =
     1    -1    -1     1     1
     1     0    -1     0     1
     1     1    -1     1     1
     1    -1     0     1     0
     1     0     0     0     0
     1     1     0     1     0
     1    -1     1     1     1
     1     0     1     0     1
     1     1     1     1     1

treatments = candexch(C,5,'tries',10) % D-opt subset
treatments =
     2
     1
     7
     3
     4

dRE2 = dC(treatments,:) % Display design
dRE2 =
     0    -1
    -1    -1
    -1     1
     1    -1
    -1     0

You can replace C in this example with a design matrix evaluated at your own candidate set. For example, suppose your experiment is constrained so that the two factors cannot have extreme settings simultaneously. The following produces a restricted candidate set:

constraint = sum(abs(dC),2) < 2; % Feasible treatments
my_dC = dC(constraint,:)
my_dC =
     0    -1
    -1     0
     0     0
     1     0
     0     1

Use the x2fx function to convert the candidate set to a design matrix:

my_C = x2fx(my_dC,'purequadratic')
my_C =
     1     0    -1     0     1
     1    -1     0     1     0
     1     0     0     0     0
     1     1     0     1     0
     1     0     1     0     1
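The 'purequadratic' expansion that x2fx performs is mechanical: each treatment [x1, x2] maps to the model terms [1, x1, x2, x1^2, x2^2]. A Python sketch (pure_quadratic is an illustrative helper, not a toolbox function):

```python
def pure_quadratic(rows):
    """Map each treatment (x1, x2) to the terms [1, x1, x2, x1^2, x2^2]."""
    return [[1, x1, x2, x1 ** 2, x2 ** 2] for x1, x2 in rows]

my_dC = [(0, -1), (-1, 0), (0, 0), (1, 0), (0, 1)]  # restricted candidates
my_C = pure_quadratic(my_dC)
```

Any constrained candidate set can be expanded this way before handing the resulting matrix to a D-optimal subset search.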

Find the required design in the same manner:

my_treatments = candexch(my_C,5,'tries',10) % D-opt subset
my_treatments =
     2
     4
     5
     1
     3

my_dRE = my_dC(my_treatments,:) % Display design
my_dRE =
    -1     0
     1     0
     0     1
     0    -1
     0     0


Improve an Engine Cooling Fan Using Design for Six Sigma Techniques

This example shows how to improve the performance of an engine cooling fan through a Design for Six Sigma approach using Define, Measure, Analyze, Improve, and Control (DMAIC). The initial fan does not circulate enough air through the radiator to keep the engine cool during difficult conditions. First the example shows how to design an experiment to investigate the effect of three performance factors: fan distance from the radiator, blade-tip clearance, and blade pitch angle. It then shows how to estimate optimum values for each factor, resulting in a design that produces airflows beyond the goal of 875 ft^3 per minute using test data. Finally it shows how to use simulations to verify that the new design produces airflow according to the specifications in more than 99.999% of the fans manufactured. This example uses MATLAB, Statistics and Machine Learning Toolbox, and Optimization Toolbox.

Define the Problem

This example addresses an engine cooling fan design that is unable to pull enough air through the radiator to keep the engine cool during difficult conditions, such as stop-and-go traffic or hot weather. Suppose you estimate that you need airflow of at least 875 ft^3/min to keep the engine cool during difficult conditions. You need to evaluate the current design and develop an alternative design that can achieve the target airflow.

Assess Cooling Fan Performance

Load the sample data.

load(fullfile(matlabroot,'help/toolbox/stats/examples','OriginalFan.mat'))

The data consists of 10,000 measurements (historical production data) of the existing cooling fan performance. Plot the data to analyze the current fan's performance.

plot(originalfan)
xlabel('Observation')
ylabel('Max Airflow (ft^3/min)')
title('Historical Production Data')


The data is centered around 842 ft^3/min, and most values fall within a range of about 8 ft^3/min. The plot does not tell much about the underlying distribution of the data, however. Plot the histogram and fit a normal distribution to the data.

figure()
histfit(originalfan) % Plot histogram with normal distribution fit
format shortg
xlabel('Airflow (ft^3/min)')
ylabel('Frequency (counts)')
title('Airflow Histogram')


pd = fitdist(originalfan,'normal') % Fit normal distribution to data

pd =
  NormalDistribution

  Normal distribution
       mu = 841.652   [841.616, 841.689]
    sigma =  1.8768   [1.85114, 1.90318]

fitdist fits a normal distribution to the data and estimates the parameters from the data. The estimate for the mean airflow speed is 841.652 ft^3/min, and the 95% confidence interval for the mean airflow speed is (841.616, 841.689). This estimate makes it clear that the current fan is not close to the required 875 ft^3/min. You need to improve the fan design to achieve the target airflow.

Determine Factors That Affect Fan Performance

Evaluate the factors that affect cooling fan performance using design of experiments (DOE). The response is the cooling fan airflow rate (ft^3/min). Suppose that the factors that you can modify and control are:

• Distance from radiator
• Pitch angle
• Blade tip clearance


In general, fluid systems have nonlinear behavior. Therefore, use a response surface design to estimate any nonlinear interactions among the factors. Generate the experimental runs for a Box-Behnken design on page 27-10 in coded (normalized) variables [-1, 0, +1].

CodedValue = bbdesign(3)

CodedValue =
    -1    -1     0
    -1     1     0
     1    -1     0
     1     1     0
    -1     0    -1
    -1     0     1
     1     0    -1
     1     0     1
     0    -1    -1
     0    -1     1
     0     1    -1
     0     1     1
     0     0     0
     0     0     0
     0     0     0

The first column is for the distance from the radiator, the second column is for the pitch angle, and the third column is for the blade tip clearance. Suppose you want to test the effects of the variables at the following minimum and maximum values.

Distance from radiator: 1 to 1.5 inches
Pitch angle: 15 to 35 degrees
Blade tip clearance: 1 to 2 inches

Randomize the order of the runs, convert the coded design values to real-world units, and perform the experiment in the order specified.

runorder = randperm(15);       % Random permutation of the runs
bounds = [1 1.5;15 35;1 2];    % Min and max values for each factor

RealValue = zeros(size(CodedValue));
for i = 1:size(CodedValue,2)   % Convert coded values to real-world units
    zmax = max(CodedValue(:,i));
    zmin = min(CodedValue(:,i));
    RealValue(:,i) = interp1([zmin zmax],bounds(i,:),CodedValue(:,i));
end

Suppose that at the end of the experiments, you collect the following response values in the variable TestResult.

TestResult = [837 864 829 856 880 879 872 874 834 833 860 859 874 876 875]';

Display the design values and the response.

disp({'Run Number','Distance','Pitch','Clearance','Airflow'})
disp(sortrows([runorder' RealValue TestResult]))

    'Run Number'    'Distance'    'Pitch'    'Clearance'    'Airflow'

     1     1.5     35     1.5     856
     2    1.25     25     1.5     876
     3     1.5     25       1     872
     4    1.25     25     1.5     875
     5       1     35     1.5     864
     6    1.25     25     1.5     874
     7    1.25     15       2     833
     8     1.5     15     1.5     829
     9    1.25     15       1     834
    10       1     15     1.5     837
    11     1.5     25       2     874
    12       1     25       1     880
    13    1.25     35       1     860
    14       1     25       2     879
    15    1.25     35       2     859

Save the design values and the response in a table.

Expmt = table(runorder', CodedValue(:,1), CodedValue(:,2), CodedValue(:,3), ...
    TestResult,'VariableNames',{'RunNumber','D','P','C','Airflow'});

D stands for Distance, P stands for Pitch, and C stands for Clearance. Based on the experimental test results, the airflow rate is sensitive to the changing factor values. Also, four experimental runs meet or exceed the target airflow rate of 875 ft^3/min (runs 2, 4, 12, and 14). However, it is not clear which, if any, of these runs is the optimal one. In addition, it is not obvious how robust the design is to variation in the factors. Create a model based on the current experimental data and use the model to estimate the optimal factor settings.

Improve the Cooling Fan Performance

The Box-Behnken design enables you to test for nonlinear (quadratic) effects. The form of the quadratic model is:

AF = β0 + β1*Distance + β2*Pitch + β3*Clearance + β4*Distance*Pitch
     + β5*Distance*Clearance + β6*Pitch*Clearance + β7*Distance^2
     + β8*Pitch^2 + β9*Clearance^2,

where AF is the airflow rate and βi is the coefficient for the term i. Estimate the coefficients of this model using the fitlm function from Statistics and Machine Learning Toolbox. In the model specification, D*P*C expands to the three main effects plus the interactions; subtracting D:P:C removes the three-way interaction, and the ^2 terms add the squared effects.

mdl = fitlm(Expmt,'Airflow~D*P*C-D:P:C+D^2+P^2+C^2');

Display the magnitudes of the coefficients (for normalized values) in a bar chart.

figure()
h = bar(mdl.Coefficients.Estimate(2:10));
set(h,'facecolor',[0.8 0.8 0.9])
legend('Coefficient')
set(gcf,'units','normalized','position',[0.05 0.4 0.35 0.4])
set(gca,'xticklabel',mdl.CoefficientNames(2:10))
ylabel('Airflow (ft^3/min)')
xlabel('Normalized Coefficient')
title('Quadratic Model Coefficients')


The bar chart shows that Pitch and Pitch^2 are the dominant factors. You can look at the relationship between multiple input variables and one output variable by generating a response surface plot. Use plotSlice to generate response surface plots for the model mdl interactively.

plotSlice(mdl)


The plot shows the nonlinear relationship of airflow with pitch. Move the blue dashed lines around and see the effect the different factors have on airflow. Although you can use plotSlice to determine the optimum factor settings, you can also use Optimization Toolbox to automate the task. Find the optimal factor settings using the constrained optimization function fmincon. Write the objective function.

f = @(x) -x2fx(x,'quadratic')*mdl.Coefficients.Estimate;

The objective function is a quadratic response surface fit to the data. Minimizing the negative airflow using fmincon is the same as maximizing the original objective function. The constraints are the upper and lower limits tested (in coded values). Set the initial starting point to be the center of the design of the experimental test matrix.

lb = [-1 -1 -1];   % Lower bound
ub = [1 1 1];      % Upper bound
x0 = [0 0 0];      % Starting point
[optfactors,fval] = fmincon(f,x0,[],[],[],[],lb,ub,[]); % Invoke the solver

Local minimum found that satisfies the constraints.

Optimization completed because the objective function is non-decreasing in
feasible directions, to within the default value of the function tolerance,
and constraints are satisfied to within the default value of the constraint
tolerance.

Convert the results to a maximization problem and real-world units.

maxval = -fval;
maxloc = (optfactors + 1)';
bounds = [1 1.5;15 35;1 2];
maxloc = bounds(:,1) + maxloc .* ((bounds(:,2) - bounds(:,1))/2);
disp('Optimal Values:')
disp({'Distance','Pitch','Clearance','Airflow'})
disp([maxloc' maxval])

Optimal Values:
    'Distance'    'Pitch'    'Clearance'    'Airflow'
         1         27.275         1          882.26

The optimization result suggests placing the new fan one inch from the radiator, with a one-inch clearance between the tips of the fan blades and the shroud. Because pitch angle has such a significant effect on airflow, perform additional analysis to verify that a 27.3 degree pitch angle is optimal.

load(fullfile(matlabroot,'help/toolbox/stats/examples','AirflowData.mat'))
tbl = table(pitch,airflow);
mdl2 = fitlm(tbl,'airflow~pitch^2');
mdl2.Rsquared.Ordinary

ans =
      0.99632

The results show that a quadratic model explains the effect of pitch on the airflow well. Plot the pitch angle against airflow and impose the fitted model.

figure()
plot(pitch,airflow,'.r')
hold on
ylim([840 885])
line(pitch,mdl2.Fitted,'color','b')
title('Fitted Model and Data')
xlabel('Pitch angle (degrees)')
ylabel('Airflow (ft^3/min)')
legend('Test data','Quadratic model','Location','se')
hold off


Find the pitch value that corresponds to the maximum airflow.

pitch(find(airflow==max(airflow)))

ans =
    27

The additional analysis confirms that a 27.3 degree pitch angle is optimal. The improved cooling fan design meets the airflow requirements. You also have a model that approximates the fan performance well based on the factors you can modify in the design. Ensure that the fan performance is robust to variability in manufacturing and installation by performing a sensitivity analysis.

Sensitivity Analysis

Suppose that, based on historical experience, the manufacturing uncertainty is as follows.

Factor                    Real Values               Coded Values
Distance from radiator    1.00 +/- 0.05 inch        -1.00 +/- 0.20
Blade pitch angle         27.3 +/- 0.25 degrees     0.227 +/- 0.028
Blade tip clearance       1.00 +/- 0.125 inch       -1.00 +/- 0.25


Verify that these variations in the factors still allow the design to maintain the target airflow. The philosophy of Six Sigma targets a defect rate of no more than 3.4 per 1,000,000 fans. That is, the fans must hit the 875 ft^3/min target 99.999% of the time. You can verify the design using Monte Carlo simulation. Generate 10,000 random numbers for three factors with the specified tolerance. First, set the state of the random number generators so results are consistent across different runs.

rng('default')

Perform the Monte Carlo simulation. Include a noise variable that is proportional to the noise in the fitted model, mdl (that is, the RMS error of the model). Because the model coefficients are in coded variables, you must generate dist, pitch, and clearance using the coded definition.

dist = random('normal',optfactors(1),0.20,[10000 1]);
pitch = random('normal',optfactors(2),0.028,[10000 1]);
clearance = random('normal',optfactors(3),0.25,[10000 1]);
noise = random('normal',0,mdl2.RMSE,[10000 1]);

Calculate airflow for 10,000 random factor combinations using the model.

simfactor = [dist pitch clearance];
X = x2fx(simfactor,'quadratic');

Add noise to the model (the variation in the data that the model did not account for).

simflow = X*mdl.Coefficients.Estimate+noise;

Evaluate the variation in the model's predicted airflow using a histogram. To estimate the mean and standard deviation, fit a normal distribution to the data.

pd = fitdist(simflow,'normal');
histfit(simflow)
hold on
text(pd.mu+2,300,['Mean: ' num2str(round(pd.mu))])
text(pd.mu+2,280,['Standard deviation: ' num2str(round(pd.sigma))])
hold off
xlabel('Airflow (ft^3/min)')
ylabel('Frequency')
title('Monte Carlo Simulation Results')


The results look promising. The average airflow is 882 ft^3/min, and most of the data lies above the 875 ft^3/min target. Determine the probability that the airflow is at 875 ft^3/min or below.

format long
pfail = cdf(pd,875)
pass = (1-pfail)*100

pfail =
     1.509289008603141e-07

pass =
  99.999984907109919

The design appears to achieve at least 875 ft^3/min of airflow 99.999% of the time. Use the simulation results to estimate the process capability.

S = capability(simflow,[875.0 890])
pass = (1-S.Pl)*100

S =
       mu: 8.822982645666709e+02
    sigma: 1.424806876923940
        P: 0.999999816749816
       Pl: 1.509289008603141e-07
       Pu: 3.232128339675335e-08
       Cp: 1.754623760237126
      Cpl: 1.707427788957002
      Cpu: 1.801819731517250
      Cpk: 1.707427788957002

pass =
  99.9999849071099
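The reported capability indices follow directly from the fitted mean, standard deviation, and the specification limits. As a quick hand check (a sketch using rounded values from the S output above, not part of the original example):

```matlab
% Sketch: recompute the capability indices by hand from the simulation
% results. mu and sigma are taken (rounded) from the S output above.
mu = 882.2983;  sigma = 1.4248;   % fitted mean and standard deviation
L = 875; U = 890;                 % lower and upper specification limits
Cp  = (U - L)/(6*sigma)           % about 1.7546, matches S.Cp
Cpl = (mu - L)/(3*sigma)          % about 1.7074, matches S.Cpl
Cpu = (U - mu)/(3*sigma)          % about 1.8018, matches S.Cpu
Cpk = min(Cpl,Cpu)                % the smaller one-sided index, matches S.Cpk
```

Because the process mean sits closer to the lower limit, Cpk equals Cpl here.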

The Cp value is 1.75. A process is considered high quality when Cp is greater than or equal to 1.6. The Cpk value is similar to the Cp value, which indicates that the process is centered. Now implement this design. Monitor it to verify the design process and to ensure that the cooling fan delivers high-quality performance.

Control Manufacturing of the Improved Cooling Fan

You can monitor and evaluate the manufacturing and installation process of the new fan using control charts. Evaluate the first 30 days of production of the new cooling fan. Initially, five cooling fans per day were produced. First, load the sample data from the new process.

load(fullfile(matlabroot,'help/toolbox/stats/examples','spcdata.mat'))

Plot the X-bar and S charts.

figure()
controlchart(spcflow,'chart',{'xbar','s'}) % Reshape the data into daily sets
xlabel('Day')


According to the results, the manufacturing process is in statistical control, as indicated by the absence of violations of control limits or nonrandom patterns in the data over time. You can also run a capability analysis on the data to evaluate the process.

[row,col] = size(spcflow);
S2 = capability(reshape(spcflow,row*col,1),[875.0 890])
pass = (1-S2.Pl)*100

S2 =
       mu: 8.821061141685465e+02
    sigma: 1.423887508874697
        P: 0.999999684316149
       Pl: 3.008932155898586e-07
       Pu: 1.479063578225176e-08
       Cp: 1.755756676295137
      Cpl: 1.663547652525458
      Cpu: 1.847965700064817
      Cpk: 1.663547652525458

pass =
  99.9999699106784

The Cp value of 1.755 is very similar to the earlier estimated value of 1.75. The Cpk value of 1.66 is smaller than the Cp value. However, only a Cpk value less than 1.33, which would indicate that the process has shifted significantly toward one of the process limits, is a concern. The process is well within the limits, and it achieves the target airflow (875 ft^3/min) more than 99.999% of the time.


28  Statistical Process Control

• "Control Charts" on page 28-2
• "Capability Studies" on page 28-5


Control Charts

A control chart displays measurements of process samples over time. The measurements are plotted together with user-defined specification limits and process-defined control limits. The process can then be compared with its specifications, to see if it is in control or out of control.

The chart is just a monitoring tool. Control activity might occur if the chart indicates an undesirable, systematic change in the process. The control chart is used to discover the variation, so that the process can be adjusted to reduce it.

Control charts are created with the controlchart function. Any of the following chart types may be specified:

• Xbar or mean
• Standard deviation
• Range
• Exponentially weighted moving average
• Individual observation
• Moving range of individual observations
• Moving average of individual observations
• Proportion defective
• Number of defectives
• Defects per unit
• Count of defects

Control rules are specified with the controlrules function. The following example illustrates how to use Western Electric rules to mark out of control measurements on an Xbar chart.

First load the sample data.

load parts;

Construct the Xbar control chart using the Western Electric 2 rule (2 of 3 points at least 2 standard errors above the center line) to mark the out of control measurements.

st = controlchart(runout,'rules','we2');


For a better understanding of the Western Electric 2 rule, calculate and plot the 2 standard errors line on the chart.

x = st.mean;
cl = st.mu;
se = st.sigma./sqrt(st.n);
hold on
plot(cl+2*se,'m')


Identify the measurements that violate the control rule.

R = controlrules('we2',x,cl,se);
I = find(R)

I = 6×1

    21
    23
    24
    25
    26
    27


Capability Studies

Before going into production, many manufacturers run a capability study to determine if their process will run within specifications enough of the time. Capability indices produced by such a study are used to estimate expected percentages of defective parts.

Capability studies are conducted with the capability function. The following capability indices are produced:

• mu — Sample mean
• sigma — Sample standard deviation
• P — Estimated probability of being within the lower (L) and upper (U) specification limits
• Pl — Estimated probability of being below L
• Pu — Estimated probability of being above U
• Cp — (U-L)/(6*sigma)
• Cpl — (mu-L)/(3*sigma)
• Cpu — (U-mu)/(3*sigma)
• Cpk — min(Cpl,Cpu)

As an example, simulate a sample from a process with a mean of 3 and a standard deviation of 0.005:

rng default; % For reproducibility
data = normrnd(3,0.005,100,1);

Compute capability indices if the process has an upper specification limit of 3.01 and a lower specification limit of 2.99:

S = capability(data,[2.99 3.01])

S = struct with fields:
       mu: 3.0006
    sigma: 0.0058
        P: 0.9129
       Pl: 0.0339
       Pu: 0.0532
       Cp: 0.5735
      Cpl: 0.6088
      Cpu: 0.5382
      Cpk: 0.5382
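The expected percentage of defective parts follows directly from Pl and Pu. As a quick sketch (an aside, not part of the original example), convert the out-of-spec probability into parts per million:

```matlab
% Sketch: estimate the expected defective fraction from the capability
% study above. Pl and Pu are the estimated tail probabilities outside
% the lower and upper specification limits, so their sum equals 1 - S.P.
fracDefective = S.Pl + S.Pu;   % here about 0.0871
ppm = fracDefective*1e6        % about 87,100 defective parts per million
```

A process at this capability level (Cp well below 1) is far from the Six Sigma target of 3.4 defects per million.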

Visualize the specification and process widths:

capaplot(data,[2.99 3.01]);
grid on


29  Tall Arrays

• "Logistic Regression with Tall Arrays" on page 29-2
• "Bayesian Optimization with Tall Arrays" on page 29-9


Logistic Regression with Tall Arrays

This example shows how to use logistic regression and other techniques to perform data analysis on tall arrays. Tall arrays represent data that is too large to fit into computer memory.

Define Execution Environment

When you perform calculations on tall arrays, MATLAB® uses either a parallel pool (default if you have Parallel Computing Toolbox™) or the local MATLAB session. To run the example using the local MATLAB session when you have Parallel Computing Toolbox, change the global execution environment by using the mapreducer function.

mapreducer(0)

Get Data into MATLAB

Create a datastore that references the folder location with the data. The data can be contained in a single file, a collection of files, or an entire folder. Treat 'NA' values as missing data so that datastore replaces them with NaN values. Select a subset of the variables to work with, and include the name of the airline (UniqueCarrier) as a categorical variable. Create a tall table on top of the datastore.

ds = datastore('airlinesmall.csv');
ds.TreatAsMissing = 'NA';
ds.SelectedVariableNames = {'DayOfWeek','UniqueCarrier',...
    'ArrDelay','DepDelay','Distance'};
ds.SelectedFormats{2} = '%C';
tt = tall(ds);
tt.DayOfWeek = categorical(tt.DayOfWeek,1:7,...
    {'Sun','Mon','Tues','Wed','Thu','Fri','Sat'},'Ordinal',true)

tt =

  Mx5 tall table

    DayOfWeek    UniqueCarrier    ArrDelay    DepDelay    Distance
    _________    _____________    ________    ________    ________

        ?              ?             ?           ?           ?
        ?              ?             ?           ?           ?
        ?              ?             ?           ?           ?
        :              :             :           :           :
        :              :             :           :           :

Late Flights

Determine the flights that are late by 20 minutes or more by defining a logical variable that is true for a late flight. Add this variable to the tall table of data, noting that it is not yet evaluated. A preview of this variable includes the first few rows.

tt.LateFlight = tt.ArrDelay>=20

tt =

  Mx6 tall table

    DayOfWeek    UniqueCarrier    ArrDelay    DepDelay    Distance    LateFlight
    _________    _____________    ________    ________    ________    __________

        ?              ?             ?           ?           ?            ?
        ?              ?             ?           ?           ?            ?
        ?              ?             ?           ?           ?            ?
        :              :             :           :           :            :
        :              :             :           :           :            :

Calculate the mean of LateFlight to determine the overall proportion of late flights. Use gather to trigger evaluation of the tall array and bring the result into memory.

m = mean(tt.LateFlight)

m =

  tall double

    ?

m = gather(m)

Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 2: Completed in 1.9 sec
- Pass 2 of 2: Completed in 2.1 sec
Evaluation completed in 5.4 sec

m =

    0.1580
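Deferred evaluation also means that several pending results can be gathered at once. As a small aside (a sketch, not part of the original example), passing multiple tall expressions to one gather call lets MATLAB combine the required passes over the data:

```matlab
% Sketch: queue two deferred tall computations, then evaluate both with a
% single gather call so MATLAB can share passes over the underlying data.
d1 = mean(tt.ArrDelay,'omitnan');   % deferred - not yet evaluated
d2 = max(tt.DepDelay);              % deferred - not yet evaluated
[avgArr,maxDep] = gather(d1,d2);    % one evaluation for both results
```

Gathering related results together is generally faster than calling gather separately for each one.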

Late Flights by Carrier

Examine whether certain types of flights tend to be late. First, check to see if certain carriers are more likely to have late flights.

tt.LateFlight = double(tt.LateFlight);
late_by_carrier = gather(grpstats(tt,'UniqueCarrier','mean','DataVar','LateFlight'))

Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 4.5 sec
Evaluation completed in 5.8 sec

late_by_carrier=29×4 table
    GroupLabel    UniqueCarrier    GroupCount    mean_LateFlight
    __________    _____________    __________    _______________
    {'9E'    }        9E               521           0.13436
    {'AA'    }        AA             14930           0.16236
    {'AQ'    }        AQ               154          0.051948
    {'AS'    }        AS              2910           0.16014
    {'B6'    }        B6               806           0.23821
    {'CO'    }        CO              8138           0.16319
    {'DH'    }        DH               696           0.17672
    {'DL'    }        DL             16578           0.15261
    {'EA'    }        EA               920           0.15217
    {'EV'    }        EV              1699           0.21248
    {'F9'    }        F9               335           0.18209
    {'FL'    }        FL              1263           0.19952
    {'HA'    }        HA               273          0.047619
    {'HP'    }        HP              3660           0.13907
    {'ML (1)'}        ML (1)            69          0.043478
    {'MQ'    }        MQ              3962           0.18778
    ⋮

Carriers B6 and EV have higher proportions of late flights. Carriers AQ, ML (1), and HA have relatively few flights, but lower proportions of them are late.

Late Flights by Day of Week

Next, check to see if different days of the week tend to have later flights.

late_by_day = gather(grpstats(tt,'DayOfWeek','mean','DataVar','LateFlight'))

Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 1.8 sec
Evaluation completed in 2.2 sec

late_by_day=7×4 table
    GroupLabel    DayOfWeek    GroupCount    mean_LateFlight
    __________    _________    __________    _______________
    {'Fri' }        Fri          15839           0.12899
    {'Mon' }        Mon          18077           0.14234
    {'Sat' }        Sat          16958           0.15603
    {'Sun' }        Sun          18019           0.15117
    {'Thu' }        Thu          18227           0.18418
    {'Tues'}        Tues         18163           0.15526
    {'Wed' }        Wed          18240           0.18399

Wednesdays and Thursdays have the highest proportion of late flights, while Fridays have the lowest proportion.

Late Flights by Distance

Check to see if longer or shorter flights tend to be late. First, look at the density of the flight distance for flights that are late, and compare that with flights that are on time.

ksdensity(tt.Distance(tt.LateFlight==1))

Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 2: Completed in 2.2 sec
- Pass 2 of 2: Completed in 1.8 sec
Evaluation completed in 4.8 sec

hold on
ksdensity(tt.Distance(tt.LateFlight==0))

Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 2: Completed in 1.4 sec
- Pass 2 of 2: Completed in 3.1 sec
Evaluation completed in 5 sec

hold off
legend('Late','On time')


Flight distance does not make a dramatic difference in whether a flight is early or late. However, the density appears to be slightly higher for on-time flights at distances of about 400 miles. The density is also higher for late flights at distances of about 2000 miles. Calculate some simple descriptive statistics for the late and on-time flights.

late_by_distance = gather(grpstats(tt,'LateFlight',{'mean' 'std'},'DataVar','Distance'))

Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 2.9 sec
Evaluation completed in 3.7 sec

late_by_distance=2×5 table
    GroupLabel    LateFlight    GroupCount    mean_Distance    std_Distance
    __________    __________    __________    _____________    ____________
      {'0'}           0          1.04e+05         693.14          544.75
      {'1'}           1            19519          750.24          574.12

Late flights are about 60 miles longer on average, although this value makes up only a small portion of the standard deviation of the distance values.

Logistic Regression Model

Build a model for the probability of a late flight, using both continuous variables (such as Distance) and categorical variables (such as DayOfWeek) to predict the probabilities. This model can help to determine if the previous results observed for each predictor individually also hold true when you consider them together.


glm = fitglm(tt,'LateFlight~Distance+DayOfWeek','Distribution','binomial')

Iteration [1]:   0% completed
Iteration [1]: 100% completed
Iteration [2]:   0% completed
Iteration [2]: 100% completed
Iteration [3]:   0% completed
Iteration [3]: 100% completed
Iteration [4]:   0% completed
Iteration [4]: 100% completed
Iteration [5]:   0% completed
Iteration [5]: 100% completed

glm =
Compact generalized linear regression model:
    logit(LateFlight) ~ 1 + DayOfWeek + Distance
    Distribution = Binomial

Estimated Coefficients:
                        Estimate          SE         tStat       pValue
                       __________    __________    _______    __________
    (Intercept)            -1.855      0.023052    -80.469             0
    DayOfWeek_Mon       -0.072603      0.029798    -2.4365       0.01483
    DayOfWeek_Tues       0.026909      0.029239    0.92029       0.35742
    DayOfWeek_Wed          0.2359      0.028276      8.343    7.2452e-17
    DayOfWeek_Thu         0.23569      0.028282     8.3338    7.8286e-17
    DayOfWeek_Fri        -0.19285      0.031583     -6.106    1.0213e-09
    DayOfWeek_Sat        0.033542      0.029702     1.1293       0.25879
    Distance           0.00018373    1.3507e-05     13.602    3.8741e-42

123319 observations, 123311 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 504, p-value = 8.74e-105

The model confirms that the previously observed conclusions hold true here as well:

• The Wednesday and Thursday coefficients are positive, indicating a higher probability of a late flight on those days. The Friday coefficient is negative, indicating a lower probability.
• The Distance coefficient is positive, indicating that longer flights have a higher probability of being late.

All of these coefficients have very small p-values. This is common with data sets that have many observations, since one can reliably estimate small effects with large amounts of data. In fact, the uncertainty in the model is larger than the uncertainty in the estimates for the parameters in the model.

Prediction with Model

Predict the probability of a late flight for each day of the week, and for distances ranging from 0 to 3000 miles. Create a table to hold the predictor values by indexing the first 100 rows in the original table tt.

x = gather(tt(1:100,{'Distance' 'DayOfWeek'}));


Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 0.54 sec
Evaluation completed in 0.96 sec

x.Distance = linspace(0,3000)';
x.DayOfWeek(:) = 'Sun';
plot(x.Distance,predict(glm,x));
days = {'Sun' 'Mon' 'Tues' 'Wed' 'Thu' 'Fri' 'Sat'};
hold on
for j=2:length(days)
    x.DayOfWeek(:) = days{j};
    plot(x.Distance,predict(glm,x));
end
legend(days)

According to this model, a Wednesday or Thursday flight of 500 miles has the same probability of being late, about 18%, as a Friday flight of about 3000 miles. Since these probabilities are all much less than 50%, the model is unlikely to predict that any given flight will be late using this information. Investigate the model more by focusing on the flights for which the model predicts a probability of 20% or more of being late, and compare that to the actual results.

C = gather(crosstab(tt.LateFlight,predict(glm,tt)>.20))


Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 2.3 sec
Evaluation completed in 2.6 sec

C = 2×2

       99613        4391
       18394        1125

Among the flights predicted to have a 20% or higher probability of being late, about 20% were late (1125/(1125 + 4391)). Among the remainder, fewer than 16% were late (18394/(18394 + 99613)).
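Those two proportions come straight from the confusion matrix C. As a quick sketch (an aside, not part of the original example):

```matlab
% Sketch: recompute the late-flight proportions from the crosstab above.
% Rows of C: actual on time / actual late.
% Columns of C: predicted probability <= 0.20 / > 0.20.
C = [99613 4391; 18394 1125];
lateAmongHigh = C(2,2)/sum(C(:,2))   % 1125/5516, about 0.20
lateAmongLow  = C(2,1)/sum(C(:,1))   % 18394/118007, about 0.16
```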


Bayesian Optimization with Tall Arrays

This example shows how to use Bayesian optimization to select optimal parameters for training a kernel classifier by using the 'OptimizeHyperparameters' name-value pair argument. The sample data set airlinesmall.csv is a large data set that contains a tabular file of airline flight data. This example creates a tall table containing the data and uses the tall table to run the optimization procedure.

When you perform calculations on tall arrays, MATLAB® uses either a parallel pool (default if you have Parallel Computing Toolbox™) or the local MATLAB session. If you want to run the example using the local MATLAB session when you have Parallel Computing Toolbox, you can change the global execution environment by using the mapreducer function.

Get Data into MATLAB®

Create a datastore that references the folder location with the data. The data can be contained in a single file, a collection of files, or an entire folder. For folders that contain a collection of files, you can specify the entire folder location, or use the wildcard character, '*.csv', to include multiple files with the same file extension in the datastore. Select a subset of the variables to work with, and treat 'NA' values as missing data so that datastore replaces them with NaN values. Create a tall table that contains the data in the datastore.

ds = datastore('airlinesmall.csv');
ds.SelectedVariableNames = {'Month','DayofMonth','DayOfWeek',...
    'DepTime','ArrDelay','Distance','DepDelay'};
ds.TreatAsMissing = 'NA';
tt = tall(ds) % Tall table

tt =

  M×7 tall table

    Month    DayofMonth    DayOfWeek    DepTime    ArrDelay    Distance    DepDelay
    _____    __________    _________    _______    ________    ________    ________
     10          21            3           642         8          308         12
     10          26            1          1021         8          296          1
     10          23            5          2055        21          480         20
     10          23            5          1332        13          296         12
     10          22            4           629         4          373         -1
     10          28            3          1446        59          308         63
     10           8            4           928         3          447         -2
     10          10            6           859        11          954         -1
      :           :            :            :          :           :           :
      :           :            :            :          :           :           :

Prepare Class Labels and Predictor Data

Determine the flights that are late by 10 minutes or more by defining a logical variable that is true for a late flight. This variable contains the class labels. A preview of this variable includes the first few rows.

Y = tt.DepDelay > 10 % Class labels

Y =

  M×1 tall logical array

   1
   0
   1
   1
   0
   1
   0
   0
   :
   :

Create a tall array for the predictor data.

X = tt{:,1:end-1} % Predictor data

X =

  M×6 tall double matrix

    10     21      3     642      8    308
    10     26      1    1021      8    296
    10     23      5    2055     21    480
    10     23      5    1332     13    296
    10     22      4     629      4    373
    10     28      3    1446     59    308
    10      8      4     928      3    447
    10     10      6     859     11    954
     :      :      :       :      :      :
     :      :      :       :      :      :

Remove rows in X and Y that contain missing data.

R = rmmissing([X Y]); % Data with missing entries removed
X = R(:,1:end-1);
Y = R(:,end);

Perform Bayesian Optimization Using OptimizeHyperparameters

Optimize hyperparameters automatically using the 'OptimizeHyperparameters' name-value pair argument. Standardize the predictor variables.

Z = zscore(X);

Find the optimal values for the 'KernelScale' and 'Lambda' name-value pair arguments that minimize five-fold cross-validation loss. For reproducibility, use the 'expected-improvement-plus' acquisition function and set the seeds of the random number generators using rng and tallrng. The results can vary depending on the number of workers and the execution environment for the tall arrays. For details, see "Control Where Your Code Runs" (MATLAB).

rng('default')
tallrng('default')
Mdl = fitckernel(Z,Y,'Verbose',0,'OptimizeHyperparameters','auto',...
    'HyperparameterOptimizationOptions',struct('AcquisitionFunctionName','expected-improvement-plus'))


Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 2: Completed in 0.76 sec
- Pass 2 of 2: Completed in 0.89 sec
Evaluation completed in 2.1 sec
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 0.99 sec
Evaluation completed in 1.1 sec
|=============================================================================================|
| Iter | Eval   | Objective  | Objective  | BestSoFar  | BestSoFar  | KernelScale |  Lambda   |
|      | result |            | runtime    | (observed) | (estim.)   |             |           |
|=============================================================================================|
|    1 | Best   |    0.19672 |     108.54 |    0.19672 |    0.19672 |      1.2297 |     0.008 |
|    2 | Accept |    0.19672 |     48.301 |    0.19672 |    0.19672 |    0.039643 |    2.5756 |
|    3 | Accept |    0.19672 |     47.758 |    0.19672 |    0.19672 |     0.02562 |    1.2555 |
|    4 | Accept |    0.19672 |     53.626 |    0.19672 |    0.19672 |      92.644 |    1.2056 |
|    5 | Best   |    0.11469 |     83.486 |    0.11469 |    0.12698 |      11.173 |    0.0002 |
|    6 | Best   |    0.11365 |     77.926 |    0.11365 |    0.11373 |      10.609 |    0.0002 |
|    7 | Accept |    0.19672 |     47.829 |    0.11365 |    0.11373 |   0.0059498 |    0.0004 |
|    8 | Accept |    0.12122 |     86.316 |    0.11365 |    0.11371 |       11.44 |    0.0004 |
|    9 | Best   |    0.10417 |     40.776 |    0.10417 |    0.10417 |      8.0424 |    6.7998 |
|   10 | Accept |    0.10433 |     41.187 |    0.10417 |    0.10417 |      9.6694 |    1.4948 |
|   11 | Best   |    0.10409 |     41.285 |    0.10409 |    0.10411 |      6.2099 |    6.1093 |
|   12 | Best   |    0.10383 |     43.397 |    0.10383 |    0.10404 |      5.6767 |    7.6134 |
|   13 | Accept |    0.10408 |     43.496 |    0.10383 |    0.10365 |      8.1769 |    8.5993 |
|   14 | Accept |    0.10404 |     40.596 |    0.10383 |    0.10361 |      7.6191 |    6.4079 |
|   15 | Best   |    0.10351 |     40.494 |    0.10351 |    0.10362 |      4.2987 |    9.2645 |
|   16 | Accept |    0.10404 |     43.342 |    0.10351 |    0.10362 |      4.8747 |    1.7838 |
|   17 | Accept |    0.10657 |     85.643 |    0.10351 |    0.10357 |      4.8239 |    0.0001 |
|   18 | Best   |    0.10299 |     40.505 |    0.10299 |    0.10358 |      3.5555 |    2.7165 |
|   19 | Accept |    0.10366 |     40.458 |    0.10299 |    0.10324 |      3.8035 |    1.3542 |
|   20 | Accept |    0.10337 |     40.697 |    0.10299 |    0.10323 |       3.806 |    1.8101 |
|   21 | Accept |    0.10345 |     40.482 |    0.10299 |    0.10322 |      3.3655 |     9.082 |
|   22 | Accept |    0.19672 |     59.828 |    0.10299 |    0.10322 |      999.62 |    1.2609 |
|   23 | Accept |    0.10315 |     40.397 |    0.10299 |    0.10306 |      3.6716 |    1.2445 |
|   24 | Accept |    0.19672 |     48.555 |    0.10299 |    0.10306 |   0.0010004 |    2.6214 |
|   25 | Accept |    0.19672 |      49.25 |    0.10299 |    0.10306 |     0.21865 |     0.002 |
|   26 | Accept |    0.19672 |     60.201 |    0.10299 |    0.10306 |      299.92 |     0.003 |
|   27 | Accept |    0.19672 |     47.556 |    0.10299 |    0.10306 |    0.002436 |     0.004 |
|   28 | Accept |    0.19672 |     50.991 |    0.10299 |    0.10305 |     0.50559 |    3.3667 |
|   29 | Accept |    0.10354 |     41.183 |    0.10299 |    0.10313 |      3.7754 |    9.5626 |
|   30 | Accept |    0.10405 |     40.938 |    0.10299 |    0.10315 |      8.9864 |    2.3136 |
|=============================================================================================|


__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 1596.4676 seconds.
Total objective function evaluation time: 1575.037

Best observed feasible point:
    KernelScale      Lambda
    ___________    __________

      3.5555       2.7165e-06

Observed objective function value = 0.10299
Estimated objective function value = 0.10332
Function evaluation time = 40.5048

Best estimated feasible point (according to models):
    KernelScale      Lambda
    ___________    __________

      3.6716       1.2445e-08

Estimated objective function value = 0.10315
Estimated function evaluation time = 40.9846

Mdl = 
  ClassificationKernel
            PredictorNames: {'x1'  'x2'  'x3'  'x4'  'x5'  'x6'}
              ResponseName: 'Y'
                ClassNames: [0 1]
                   Learner: 'svm'
    NumExpansionDimensions: 256
               KernelScale: 3.6716
                    Lambda: 1.2445e-08
             BoxConstraint: 665.9442

  Properties, Methods

Perform Bayesian Optimization by Using bayesopt

Alternatively, you can use the bayesopt function to find the optimal values of hyperparameters.

Split the data set into training and test sets. Specify a 1/3 holdout sample for the test set.

rng('default')     % For reproducibility
tallrng('default') % For reproducibility
Partition = cvpartition(Y,'Holdout',1/3);
trainingInds = training(Partition); % Indices for the training set
testInds = test(Partition);         % Indices for the test set

Extract training and testing data and standardize the predictor data.

Ytrain = Y(trainingInds); % Training class labels
Xtrain = X(trainingInds,:);
[Ztrain,mu,stddev] = zscore(Xtrain); % Standardized training data
Ytest = Y(testInds); % Testing class labels
Xtest = X(testInds,:);
Ztest = (Xtest-mu)./stddev; % Standardized test data

Define the variables sigma and lambda to find the optimal values for the 'KernelScale' and 'Lambda' name-value pair arguments. Use optimizableVariable and specify a wide range for the variables because optimal values are unknown. Apply logarithmic transformation to the variables to search for the optimal values on a log scale.

N = gather(numel(Ytrain)); % Evaluate the length of the tall training array in memory

Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 0.86 sec
Evaluation completed in 1 sec


sigma = optimizableVariable('sigma',[1e-3,1e3],'Transform','log');
lambda = optimizableVariable('lambda',[(1e-3)/N,(1e3)/N],'Transform','log');

Create the objective function for Bayesian optimization. The objective function takes in a table that contains the variables sigma and lambda, and then computes the classification loss value for the binary Gaussian kernel classification model trained using the fitckernel function. Set 'Verbose',0 within fitckernel to suppress the iterative display of diagnostic information.

minfn = @(z)gather(loss(fitckernel(Ztrain,Ytrain, ...
    'KernelScale',z.sigma,'Lambda',z.lambda,'Verbose',0), ...
    Ztest,Ytest));

Optimize the parameters [sigma,lambda] of the kernel classification model with respect to the classification loss by using bayesopt. By default, bayesopt displays iterative information about the optimization at the command line. For reproducibility, set the AcquisitionFunctionName option to 'expected-improvement-plus'. The default acquisition function depends on run time and, therefore, can give varying results.

results = bayesopt(minfn,[sigma,lambda],'AcquisitionFunctionName','expected-improvement-plus')

Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 1.1 sec
Evaluation completed in 1.2 sec
|=============================================================================================|
| Iter | Eval   | Objective  | Objective  | BestSoFar  | BestSoFar  |    sigma    |  lambda   |
|      | result |            | runtime    | (observed) | (estim.)   |             |           |
|=============================================================================================|
|    1 | Best   |    0.19651 |     83.058 |    0.19651 |    0.19651 |      1.2297 |      0.01 |
|    2 | Accept |    0.19651 |     110.79 |    0.19651 |    0.19651 |    0.039643 |    3.8633 |
|    3 | Accept |    0.19651 |     79.818 |    0.19651 |    0.19651 |     0.02562 |    1.8832 |
|    4 | Accept |    0.19651 |     51.801 |    0.19651 |    0.19651 |      92.644 |    1.8084 |
|    5 | Accept |    0.19651 |     51.765 |    0.19651 |    0.19651 |      978.95 |    0.0001 |
|    6 | Accept |    0.19651 |      89.56 |    0.19651 |    0.19651 |   0.0089609 |     0.005 |
|    7 | Accept |    0.19651 |     51.515 |    0.19651 |    0.19651 |      97.709 |    0.0001 |
|    8 | Accept |    0.19651 |     49.878 |    0.19651 |    0.19651 |      422.03 |     4.841 |
|    9 | Accept |    0.19651 |     77.645 |    0.19651 |    0.19651 |   0.0012826 |    5.5116 |
|   10 | Accept |    0.19651 |     93.166 |    0.19651 |    0.19651 |    0.031682 |    3.1742 |
|   11 | Best   |    0.10029 |     34.959 |    0.10029 |     0.1003 |      3.9327 |    3.5022 |
|   12 | Accept |    0.10059 |     34.975 |    0.10029 |     0.1003 |      3.1844 |     7.385 |
|   13 | Accept |    0.10098 |     37.019 |    0.10029 |    0.10031 |      5.2372 |    4.9773 |
|   14 | Accept |    0.10133 |     37.468 |    0.10029 |   0.099825 |      4.2748 |     1.262 |
|   15 | Accept |    0.10141 |     39.708 |    0.10029 |    0.10059 |      3.3388 |    2.1269 |
|   16 | Accept |    0.12235 |     72.467 |    0.10029 |    0.10058 |      9.0019 |     0.001 |
|   17 | Accept |    0.10668 |     73.693 |    0.10029 |    0.10042 |      3.6288 |    0.0006 |
|   18 | Best   |    0.10016 |     37.482 |    0.10016 |    0.10058 |       4.311 |    1.2975 |
|   19 | Accept |    0.10034 |     37.389 |    0.10016 |    0.10001 |      3.8228 |    8.1818 |
|   20 | Accept |    0.10123 |     37.345 |    0.10016 |    0.10004 |      6.3387 |    1.2575 |
|   21 | Accept |    0.10113 |     38.547 |    0.10016 |   0.099988 |      5.1223 |    1.2705 |
|   22 | Accept |    0.10041 |     38.701 |    0.10016 |    0.10006 |      3.6363 |    4.4732 |
|   23 | Accept |    0.10061 |     35.066 |    0.10016 |    0.10019 |      3.7705 |    2.8022 |
|   24 | Accept |    0.10044 |     37.191 |    0.10016 |    0.10025 |      3.6538 |    3.4072 |
|   25 | Accept |    0.19651 |     108.42 |    0.10016 |    0.10026 |     0.24021 |    1.3156 |
|   26 | Best   |    0.10016 |     37.264 |    0.10016 |    0.10024 |      3.5161 |    4.6627 |
|   27 | Accept |    0.16207 |     31.762 |    0.10016 |    0.10024 |      28.573 |     1.356 |
|   28 | Accept |    0.10036 |      37.74 |    0.10016 |    0.10025 |      3.5285 |    7.3662 |
|   29 | Accept |    0.19651 |     84.085 |    0.10016 |    0.10025 |   0.0038154 |    1.3372 |
|   30 | Accept |    0.19651 |     86.677 |    0.10016 |    0.10024 |     0.12353 |      0.01 |
|=============================================================================================|


__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 1821.1885 seconds.
Total objective function evaluation time: 1716.9623

Best observed feasible point:
    sigma       lambda
    ______    __________

    3.5161    4.6627e-07

Observed objective function value = 0.10016
Estimated objective function value = 0.10027
Function evaluation time = 37.2644

Best estimated feasible point (according to models):
    sigma       lambda
    ______    __________

    3.6538    3.4072e-07

Estimated objective function value = 0.10024
Estimated function evaluation time = 36.8542

results = 
  BayesianOptimization with properties:

                      ObjectiveFcn: @(z)gather(loss(fitckernel(Ztrain,Ytrain,'KernelScale',z.sigm
              VariableDescriptions: [1×2 optimizableVariable]
                           Options: [1×1 struct]
                      MinObjective: 0.1002
                   XAtMinObjective: [1×2 table]
             MinEstimatedObjective: 0.1002
          XAtMinEstimatedObjective: [1×2 table]
           NumObjectiveEvaluations: 30
                  TotalElapsedTime: 1.8212e+03
                         NextPoint: [1×2 table]
                            XTrace: [30×2 table]
                    ObjectiveTrace: [30×1 double]
                  ConstraintsTrace: []
                     UserDataTrace: {30×1 cell}
      ObjectiveEvaluationTimeTrace: [30×1 double]
                IterationTimeTrace: [30×1 double]
                        ErrorTrace: [30×1 double]
                  FeasibilityTrace: [30×1 logical]
       FeasibilityProbabilityTrace: [30×1 double]
               IndexOfMinimumTrace: [30×1 double]
             ObjectiveMinimumTrace: [30×1 double]
    EstimatedObjectiveMinimumTrace: [30×1 double]

Return the best feasible point in the Bayesian model results by using the bestPoint function. Use the default criterion min-visited-upper-confidence-interval, which determines the best feasible point as the visited point that minimizes an upper confidence interval on the objective function value.

zbest = bestPoint(results)

zbest=1×2 table
    sigma       lambda
    ______    __________

    3.6538    3.4072e-07

The table zbest contains the optimal estimated values for the 'KernelScale' and 'Lambda' name-value pair arguments. You can specify these values when training a new optimized kernel classifier by using:

Mdl = fitckernel(Ztrain,Ytrain,'KernelScale',zbest.sigma,'Lambda',zbest.lambda)
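As a sanity check after retraining, you can compute the misclassification loss of the new model on the standardized holdout set. This is a sketch that reuses the variables from the example above (Ztrain, Ytrain, Ztest, Ytest, zbest); it is not part of the original example.

```matlab
% Retrain with the best estimated hyperparameters (sketch)
Mdl = fitckernel(Ztrain,Ytrain, ...
    'KernelScale',zbest.sigma,'Lambda',zbest.lambda,'Verbose',0);

% Misclassification loss on the standardized holdout set;
% gather evaluates the deferred tall computation in memory
testLoss = gather(loss(Mdl,Ztest,Ytest))
```

The resulting loss should be close to the observed objective function value reported by the optimization.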

For tall arrays, the optimization procedure can take a long time. If the data set is too large to run the optimization procedure, you can try to optimize the parameters by using only partial data. Use the datasample function and specify 'Replace','false' to sample data without replacement.
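For example, one way to work with partial data is to sample row indices without replacement and run the optimization on the sample. This sketch assumes the sampled subset fits in memory, and the subset size n is an arbitrary assumption, not from the original example.

```matlab
% Sketch: optimize on a random subset of the training data.
% N is the number of training observations, as computed earlier.
n = 1e5;                                   % subset size -- an assumption
idx = datasample(1:N,n,'Replace',false);   % sample row indices without replacement
Xsub = Xtrain(idx,:);
Ysub = Ytrain(idx);
Zsub = zscore(Xsub);                       % standardize the subset
% Then pass Zsub and Ysub to the bayesopt objective in place of Ztrain and Ytrain.
```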

See Also bayesopt | bestPoint | cvpartition | datastore | fitckernel | gather | loss | nanmean | nanstd | optimizableVariable | tall


30 Parallel Statistics

• “Quick Start Parallel Computing for Statistics and Machine Learning Toolbox” on page 30-2
• “Use Parallel Processing for Regression TreeBagger Workflow” on page 30-6
• “Concepts of Parallel Computing in Statistics and Machine Learning Toolbox” on page 30-8
• “When to Run Statistical Functions in Parallel” on page 30-9
• “Working with parfor” on page 30-11
• “Reproducibility in Parallel Statistical Computations” on page 30-13
• “Implement Jackknife Using Parallel Computing” on page 30-17
• “Implement Cross-Validation Using Parallel Computing” on page 30-18
• “Implement Bootstrap Using Parallel Computing” on page 30-20


Quick Start Parallel Computing for Statistics and Machine Learning Toolbox

Note To use parallel computing as described in this chapter, you must have a Parallel Computing Toolbox license.

In this section...
“What Is Parallel Statistics Functionality?” on page 30-2
“How To Compute in Parallel” on page 30-4

What Is Parallel Statistics Functionality?

You can use any of the Statistics and Machine Learning Toolbox functions with Parallel Computing Toolbox constructs such as parfor and spmd. However, some functions, such as those with interactive displays, can lose functionality in parallel. In particular, displays and interactive usage are not effective on workers (see “Vocabulary for Parallel Computation” on page 30-8).

Additionally, the following functions are enhanced to use parallel computing internally. These functions use parfor internally to parallelize calculations.

• bayesopt
• bootci
• bootstrp
• candexch
• compare of LinearMixedModel
• cordexch
• crossval
• daugment
• dcovary
• jackknife
• kmeans
• kmedoids
• lasso
• lassoglm
• nnmf
• oobPermutedPredictorImportance of ClassificationBaggedEnsemble
• oobPermutedPredictorImportance of RegressionBaggedEnsemble
• perfcurve
• plotPartialDependence
• plsregress
• rowexch


• sequentialfs
• testckfold
• TreeBagger
• growTrees of TreeBagger

The following functions for fitting multiclass models for support vector machines and other classifiers are also enhanced to use parallel computing internally.

• fitcecoc
• Methods of the class ClassificationECOC:
  • resubEdge
  • resubLoss
  • resubMargin
  • resubPredict
  • crossval
• Methods of the class CompactClassificationECOC:
  • edge
  • loss
  • margin
  • predict
• Methods of the class ClassificationPartitionedECOC:
  • kfoldEdge
  • kfoldLoss
  • kfoldMargin
  • kfoldPredict
• Methods of the class ClassificationPartitionedLinearECOC:
  • kfoldEdge
  • kfoldLoss
  • kfoldMargin
  • kfoldPredict

The following functions perform hyperparameter optimization in parallel.

• fitcdiscr
• fitcecoc
• fitctree
• fitclinear
• fitcknn
• fitrgp
• fitrtree
• fitcensemble


• fitcnb
• fitcsvm
• fitrensemble
• fitrlinear
• fitrsvm
• fitrkernel
• fitckernel

This chapter gives the simplest way to use these enhanced functions in parallel. For more advanced topics, including the issues of reproducibility and nested parfor loops, see the other sections in this chapter. For information on parallel statistical computing at the command line, enter

help parallelstats

How To Compute in Parallel

To have a function compute in parallel:
1 “Set Up a Parallel Environment” on page 30-4
2 “Set the UseParallel Option to true” on page 30-4
3 “Call the Function Using the Options Structure” on page 30-4

Set Up a Parallel Environment

To run a statistical computation in parallel, first set up a parallel environment.

Note Setting up a parallel environment can take several seconds.

For a multicore machine, enter the following at the MATLAB command line:

parpool(n)

n is the number of workers you want to use.

Set the UseParallel Option to true

Create an options structure with the statset function. To run in parallel, set the UseParallel option to true:

paroptions = statset('UseParallel',true);

Call the Function Using the Options Structure

Call your function with syntax that uses the options structure. For example:

% Run crossval in parallel
cvMse = crossval('mse',x,y,'predfun',regf,'Options',paroptions);

% Run bootstrp in parallel
sts = bootstrp(100,@(x)[mean(x) std(x)],y,'Options',paroptions);

% Run TreeBagger in parallel
b = TreeBagger(50,meas,spec,'OOBPred','on','Options',paroptions);

For more complete examples of parallel statistical functions, see “Use Parallel Processing for Regression TreeBagger Workflow” on page 30-6, “Implement Jackknife Using Parallel Computing” on page 30-17, “Implement Cross-Validation Using Parallel Computing” on page 30-18, and “Implement Bootstrap Using Parallel Computing” on page 30-20.

After you have finished computing in parallel, close the parallel environment:

delete(mypool)

Tip To save time, keep the pool open if you expect to compute in parallel again soon.


Use Parallel Processing for Regression TreeBagger Workflow

This example shows you how to:
• Use an ensemble of bagged regression trees to estimate feature importance.
• Improve computation speed by using parallel computing.

The sample data is a database of 1985 car imports with 205 observations, 25 predictors, and 1 response, which is insurance risk rating, or "symboling." The first 15 variables are numeric and the last 10 are categorical. The symboling index takes integer values from -3 to 3.

Load the sample data and separate it into predictor and response arrays.

load imports-85;
Y = X(:,1);
X = X(:,2:end);

Set up the parallel environment to use two cores.

mypool = parpool(2)

Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 2).

mypool = 
 LocalProcessesPool with properties: 

            Connected: true
           NumWorkers: 2
              Cluster: local
        AttachedFiles: {}
    AutoAddClientPath: true
          IdleTimeout: 30 minutes (30 minutes remaining)
          SpmdEnabled: true

Set the options to use parallel processing.

paroptions = statset('UseParallel',true);

Estimate feature importance using leaf size 1 and 5000 trees in parallel. Time the function for comparison purposes.

tic
b = TreeBagger(5000,X,Y,'Method','r','OOBVarImp','on', ...
    'cat',16:25,'MinLeafSize',1,'Options',paroptions);
toc

Elapsed time is 30.297706 seconds.

Perform the same computation in serial for timing comparison.

tic
b = TreeBagger(5000,X,Y,'Method','r','OOBVarImp','on', ...
    'cat',16:25,'MinLeafSize',1);
toc

Elapsed time is 32.935544 seconds.


The results show that the parallel computation takes less time than the serial computation. The speedup and elapsed times can vary depending on your operating system and the number of workers.

See Also TreeBagger | parpool | statset

Related Examples

• “Bootstrap Aggregation (Bagging) of Regression Trees Using TreeBagger” on page 18-108
• “Bootstrap Aggregation (Bagging) of Classification Trees Using TreeBagger” on page 18-119
• “Comparison of TreeBagger and Bagged Ensembles” on page 18-43


Concepts of Parallel Computing in Statistics and Machine Learning Toolbox

In this section...
“Subtleties in Parallel Computing” on page 30-8
“Vocabulary for Parallel Computation” on page 30-8

Subtleties in Parallel Computing

There are two main subtleties in parallel computations:

• Nested parallel evaluations (see “No Nested parfor Loops” on page 30-11). Only the outermost parfor loop runs in parallel; the others run serially.
• Reproducible results when using random numbers (see “Reproducibility in Parallel Statistical Computations” on page 30-13). How can you get exactly the same results when repeatedly running a parallel computation that uses random numbers?

Vocabulary for Parallel Computation

• worker — An independent MATLAB session that runs code distributed by the client.
• client — The MATLAB session with which you interact, and that distributes jobs to workers.
• parfor — A Parallel Computing Toolbox function that distributes independent code segments to workers (see “Working with parfor” on page 30-11).
• random stream — A pseudorandom number generator, and the sequence of values it generates. MATLAB implements random streams with the RandStream class.
• reproducible computation — A computation that can be exactly replicated, even in the presence of random numbers (see “Reproducibility in Parallel Statistical Computations” on page 30-13).
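The RandStream class mentioned above can be used directly to create and rewind a random stream. This short sketch is generic MATLAB and not specific to parallel pools; the generator and seed are illustrative choices.

```matlab
% Create a dedicated random stream and draw reproducible values from it
s = RandStream('mlfg6331_64','Seed',1);  % a generator that supports substreams
r1 = rand(s,1,3);
reset(s);                                % rewind the stream to its initial state
r2 = rand(s,1,3);                        % identical to r1
isequal(r1,r2)                           % returns true
```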


When to Run Statistical Functions in Parallel

In this section...
“Why Run in Parallel?” on page 30-9
“Factors Affecting Speed” on page 30-9
“Factors Affecting Results” on page 30-9

Why Run in Parallel?

The main reason to run statistical computations in parallel is to gain speed, meaning to reduce the execution time of your program or functions. “Factors Affecting Speed” on page 30-9 discusses the main items affecting the speed of programs or functions. “Factors Affecting Results” on page 30-9 discusses details that can cause a parallel run to give different results than a serial run.

Note Some Statistics and Machine Learning Toolbox functions have built-in parallel computing capabilities. See “Quick Start Parallel Computing for Statistics and Machine Learning Toolbox” on page 30-2. You can also use any Statistics and Machine Learning Toolbox functions with Parallel Computing Toolbox functions such as parfor loops. To decide when to call functions in parallel, consider the factors affecting speed and results.

Factors Affecting Speed

Some factors that can affect the speed of execution of parallel processing are:

• Parallel environment setup. It takes time to run parpool to begin computing in parallel. If your computation is fast, the setup time can exceed any time saved by computing in parallel.
• Parallel overhead. There is overhead in communication and coordination when running in parallel. If function evaluations are fast, this overhead could be an appreciable part of the total computation time. Thus, solving a problem in parallel can be slower than solving the problem serially. For an example, see Improving Optimization Performance with Parallel Computing in MATLAB Digest, March 2009.
• No nested parfor loops. This is described in “Working with parfor” on page 30-11. parfor does not work in parallel when called from within another parfor loop. If you have programmed your custom functions to take advantage of parallel processing, the limitation of no nested parfor loops can cause a parallel function to run slower than expected.
• When executing serially, parfor loops run slightly slower than for loops.
• Passing parameters. Parameters are automatically passed to worker sessions during the execution of parallel computations. If there are many parameters, or they take a large amount of memory, passing parameters can slow the execution of your computation.
• Contention for resources: network and computing. If the pool of workers has low bandwidth or high latency, parallel computation can be slow.

Factors Affecting Results

Some factors can affect results when using parallel processing. You might need to adjust your code to run in parallel; for example, you need independent loops and the workers must be able to access the variables. Some important factors are:


• Persistent or global variables. If any functions use persistent or global variables, these variables can take different values on different worker processors. The body of a parfor loop cannot contain global or persistent variable declarations.
• Accessing external files. The order of computations is not guaranteed during parallel processing, so external files can be accessed in unpredictable order, leading to unpredictable results. Furthermore, if multiple processors try to read an external file simultaneously, the file can become locked, leading to a read error and halting function execution.
• Noncomputational functions, such as input, plot, and keyboard, can behave badly when used in your custom functions. Do not use these functions in a parfor loop, because they can cause a worker to become nonresponsive while it waits for input.
• parfor does not allow break or return statements.
• The random numbers you use can affect the results of your computations. See “Reproducibility in Parallel Statistical Computations” on page 30-13.

For advice on converting for loops to use parfor, see “Parallel for-Loops (parfor)” (Parallel Computing Toolbox).


Working with parfor

In this section...
“How Statistical Functions Use parfor” on page 30-11
“Characteristics of parfor” on page 30-11

How Statistical Functions Use parfor

parfor is a Parallel Computing Toolbox function similar to a for loop. Parallel statistical functions call parfor internally. parfor distributes computations to worker processors.

Characteristics of parfor

You might need to adjust your code to run in parallel; for example, you need independent loops and the workers must be able to access the variables. For advice on using parfor, see “Parallel for-Loops (parfor)” (Parallel Computing Toolbox).

No Nested parfor Loops

parfor does not work in parallel when called from within another parfor loop, or from an spmd block. Parallelization occurs only at the outermost level.


Suppose, for example, you want to apply jackknife to your function userfcn, which calls parfor, and you want to call jackknife in a loop. The following figure shows three cases:

1. The outermost loop is parfor. Only that loop runs in parallel.

2. The outermost parfor loop is in jackknife. Only jackknife runs in parallel.

3. The outermost parfor loop is in userfcn. userfcn uses parfor in parallel.

[Figure: When parfor Runs in Parallel]

For help converting nested loops to use parfor, see “Convert for-Loops Into parfor-Loops” (Parallel Computing Toolbox). See also Quick Start Parallel Computing for Statistics and Machine Learning Toolbox on page 30-2.
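Case 1 might look like the following sketch, where userfcn and the data matrix are hypothetical stand-ins; only the outermost parfor runs in parallel, and jackknife (and any parfor inside userfcn) runs serially on each worker:

```matlab
% Case 1: the outermost loop is parfor, so only this loop parallelizes.
results = zeros(1,10);
parfor i = 1:10
    m = jackknife(@userfcn, data(:,i)); % userfcn itself calls parfor (runs serially here)
    results(i) = mean(m);
end
```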



Reproducibility in Parallel Statistical Computations

In this section...
“Issues and Considerations in Reproducing Parallel Computations” on page 30-13
“Running Reproducible Parallel Computations” on page 30-13
“Parallel Statistical Computation Using Random Numbers” on page 30-14

Issues and Considerations in Reproducing Parallel Computations

A reproducible computation is one that gives the same results every time it runs. Reproducibility is important for:

• Debugging — To correct an anomalous result, you need to reproduce the result.
• Confidence — When you can reproduce results, you can investigate and understand them.
• Modifying existing code — When you change existing code, you want to ensure that you do not break anything.

Generally, you do not need to ensure reproducibility for your computation. Often, when you want reproducibility, the simplest technique is to run in serial instead of in parallel. In serial computation you can simply call the rng function as follows:

s = rng % Obtain the current state of the random stream
% run the statistical function
rng(s) % Reset the stream to the previous state
% run the statistical function again, obtain identical results

This section addresses the case when your function uses random numbers, and you want reproducible results in parallel. This section also addresses the case when you want the same results in parallel as in serial.

Running Reproducible Parallel Computations

To run a Statistics and Machine Learning Toolbox function reproducibly:

1. Set the UseSubstreams option to true.

2. Set the Streams option to a type that supports substreams: 'mlfg6331_64' or 'mrg32k3a'. For information on these streams, see “Choosing a Random Number Generator” (MATLAB).

3. To compute in parallel, set the UseParallel option to true.

4. Call the function with the options structure.

5. To reproduce the computation, reset the stream, then call the function again.

To understand why this technique gives reproducibility, see “How Substreams Enable Reproducible Parallel Computations” on page 30-14.

For example, to use the 'mlfg6331_64' stream for reproducible computation:

1. Create an appropriate options structure:

s = RandStream('mlfg6331_64');
options = statset('UseParallel',true, ...
    'Streams',s,'UseSubstreams',true);



2. Run your parallel computation. For instructions, see Quick Start Parallel Computing for Statistics and Machine Learning Toolbox on page 30-2.

3. Reset the random stream:

reset(s);

4. Rerun your parallel computation. You obtain identical results.

For an example of a parallel computation run in this reproducible way, see “Reproducible Parallel Bootstrap” on page 30-21.

Parallel Statistical Computation Using Random Numbers

What Are Substreams?

A substream is a portion of a random stream that RandStream can access quickly. There is a number M such that for any positive integer k, RandStream can go to the kMth pseudorandom number in the stream. From that point, RandStream can generate the subsequent entries in the stream. Currently, RandStream has M = 2^72, about 5e21, or more.

The entries in different substreams have good statistical properties, similar to the properties of entries in a single stream: independence, and lack of k-way correlation at various lags. The substreams are so long that you can view the substreams as being independent streams, as in the following picture.

Two RandStream stream types support substreams: 'mlfg6331_64' and 'mrg32k3a'.

How Substreams Enable Reproducible Parallel Computations

When MATLAB performs computations in parallel with parfor, each worker receives loop iterations in an unpredictable order. Therefore, you cannot predict which worker gets which iteration, so you cannot determine the random numbers associated with each iteration. Substreams allow MATLAB to tie each iteration to a particular sequence of random numbers. parfor gives each iteration an index. The iteration uses the index as the substream number. Since the random numbers are associated with the iterations, not with the workers, the entire computation is reproducible.
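For reference, you can jump to a substream directly through the Substream property of a RandStream object; this sketch jumps to the start of the third substream and shows that returning to it reproduces the same values:

```matlab
s = RandStream('mlfg6331_64'); % a generator type that supports substreams
s.Substream = 3;               % jump to the beginning of substream 3
r = rand(s,1,4);               % draw four values from that substream
s.Substream = 3;               % jump back to the same starting point
r2 = rand(s,1,4);              % r2 is identical to r
```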


To obtain reproducible results, simply reset the stream, and all the substreams generate identical random numbers when called again. This method succeeds when all the workers use the same stream, and the stream supports substreams. This concludes the discussion of how the procedure in “Running Reproducible Parallel Computations” on page 30-13 gives reproducible parallel results.

Random Numbers on the Client or Workers

A few functions generate random numbers on the client before distributing them to parallel workers. The workers do not use random numbers, so they operate purely deterministically. For these functions, you can run a parallel computation reproducibly using any random stream type. The functions that operate this way include:

• crossval
• plsregress
• sequentialfs

To obtain identical results, reset the random stream on the client, or the random stream you pass to the client. For example:

s = rng % Obtain the current state of the random stream
% run the statistical function
rng(s) % Reset the stream to the previous state
% run the statistical function again, obtain identical results

While this method enables you to run reproducibly in parallel, the results can differ from a serial computation. The reason for the difference is that parfor loops run in reverse order from for loops. Therefore, a serial computation can generate random numbers in a different order than a parallel computation. For unequivocal reproducibility, use the technique in “Running Reproducible Parallel Computations” on page 30-13.

Distributing Streams Explicitly

For testing or comparison using particular random number algorithms, you must set the random number generators. How do you set these generators in parallel, or initialize streams on each worker in a particular way? Or you might want to run a computation using a different sequence of random numbers than any other you have run. How can you ensure the sequence you use is statistically independent?

Parallel Statistics and Machine Learning Toolbox functions allow you to set random streams on each worker explicitly. For information on creating multiple streams, enter help RandStream/create at the command line. To create four independent streams using the 'mrg32k3a' generator:

s = RandStream.create('mrg32k3a','NumStreams',4,...
    'CellOutput',true);

Pass these streams to a statistical function using the Streams option. For example:

parpool(4) % if you have at least 4 cores
s = RandStream.create('mrg32k3a','NumStreams',4,...
    'CellOutput',true); % create 4 independent streams
paroptions = statset('UseParallel',true,...
    'Streams',s); % set the 4 different streams
x = [randn(700,1); 4 + 2*randn(300,1)];
latt = -4:0.01:12;



myfun = @(X) ksdensity(X,latt);
pdfestimate = myfun(x);
B = bootstrp(200,myfun,x,'Options',paroptions);

This method of distributing streams gives each worker a different stream for the computation. However, it does not allow for a reproducible computation, because the workers perform the 200 bootstraps in an unpredictable order. If you want to perform a reproducible computation, use substreams as described in “Running Reproducible Parallel Computations” on page 30-13. If you set the UseSubstreams option to true, then set the Streams option to a single random stream of the type that supports substreams ('mlfg6331_64' or 'mrg32k3a'). This setting gives reproducible computations.



Implement Jackknife Using Parallel Computing

This example is from the jackknife function reference page, but runs in parallel.

Generate a sample of size 10000 from a normal distribution with mean 0 and standard deviation 5.

sigma = 5;
rng('default')
y = normrnd(0,sigma,10000,1);

Run jackknife in parallel to estimate the variance. To do this, use statset to create the options structure and set the UseParallel field to true.

opts = statset('UseParallel',true);
m = jackknife(@var,y,1,'Options',opts);

Compare the known bias formula with the jackknife bias estimate.

n = length(y);
bias = -sigma^2/n % Known bias formula
jbias = (n-1)*(mean(m)-var(y,1)) % jackknife bias estimate

Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).

bias =
   -0.0025

jbias =
   -0.0025

Compare how long it takes to compute in serial and in parallel.

tic; m = jackknife(@var,y,1); toc % Serial computation

Elapsed time is 1.638026 seconds.

tic; m = jackknife(@var,y,1,'Options',opts); toc % Parallel computation

Elapsed time is 0.507961 seconds.

jackknife does not use random numbers, so it gives the same results every time, whether run in parallel or in serial.
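Because the computation is deterministic, the serial and parallel results can be compared directly; this is a sketch of such a check using the variables from the example above:

```matlab
m1 = jackknife(@var,y,1);                % serial run
m2 = jackknife(@var,y,1,'Options',opts); % parallel run
isequal(m1,m2)                           % the two runs match
```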



Implement Cross-Validation Using Parallel Computing

In this section...
“Simple Parallel Cross Validation” on page 30-18
“Reproducible Parallel Cross Validation” on page 30-18

Simple Parallel Cross Validation

In this example, use crossval to compute a cross-validation estimate of mean-squared error for a regression model. Run the computations in parallel.

mypool = parpool()

Starting parpool using the 'local' profile ... connected to 2 workers.

mypool =

 Pool with properties:

    AttachedFiles: {0x1 cell}
       NumWorkers: 2
      IdleTimeout: 30
          Cluster: [1x1 parallel.cluster.Local]
     RequestQueue: [1x1 parallel.RequestQueue]
      SpmdEnabled: 1

opts = statset('UseParallel',true);
load('fisheriris');
y = meas(:,1);
X = [ones(size(y,1),1),meas(:,2:4)];
regf=@(XTRAIN,ytrain,XTEST)(XTEST*regress(ytrain,XTRAIN));
cvMse = crossval('mse',X,y,'Predfun',regf,'Options',opts)

cvMse =
    0.1028

This simple example is not a good candidate for parallel computation:

% How long to compute in serial?
tic;cvMse = crossval('mse',X,y,'Predfun',regf);toc

Elapsed time is 0.073438 seconds.

% How long to compute in parallel?
tic;cvMse = crossval('mse',X,y,'Predfun',regf,...
    'Options',opts);toc

Elapsed time is 0.289585 seconds.

Reproducible Parallel Cross Validation

To run crossval in parallel in a reproducible fashion, set the options and reset the random stream appropriately (see “Running Reproducible Parallel Computations” on page 30-13).


mypool = parpool()

Starting parpool using the 'local' profile ... connected to 2 workers.

mypool =

 Pool with properties:

    AttachedFiles: {0x1 cell}
       NumWorkers: 2
      IdleTimeout: 30
          Cluster: [1x1 parallel.cluster.Local]
     RequestQueue: [1x1 parallel.RequestQueue]
      SpmdEnabled: 1

s = RandStream('mlfg6331_64');
opts = statset('UseParallel',true,...
    'Streams',s,'UseSubstreams',true);
load('fisheriris');
y = meas(:,1);
X = [ones(size(y,1),1),meas(:,2:4)];
regf=@(XTRAIN,ytrain,XTEST)(XTEST*regress(ytrain,XTRAIN));
cvMse = crossval('mse',X,y,'Predfun',regf,'Options',opts)

cvMse =
    0.1020

Reset the stream:

reset(s)
cvMse = crossval('mse',X,y,'Predfun',regf,'Options',opts)

cvMse =
    0.1020



Implement Bootstrap Using Parallel Computing

In this section...
“Bootstrap in Serial and Parallel” on page 30-20
“Reproducible Parallel Bootstrap” on page 30-21

Bootstrap in Serial and Parallel

Here is an example timing a bootstrap in parallel versus in serial. The example generates data from a mixture of two Gaussians, constructs a nonparametric density estimate from the resulting data, and uses a bootstrap to get a sense of the sampling variability.

1. Generate the data:

% Generate a random sample of size 1000,
% from a mixture of two Gaussian distributions
x = [randn(700,1); 4 + 2*randn(300,1)];

2. Construct a nonparametric estimate of the density from the data:

latt = -4:0.01:12;
myfun = @(X) ksdensity(X,latt);
pdfestimate = myfun(x);

3. Bootstrap the estimate to get a sense of its sampling variability. Run the bootstrap in serial for timing comparison.

tic;B = bootstrp(200,myfun,x);toc

Elapsed time is 10.878654 seconds.

4. Run the bootstrap in parallel for timing comparison:

mypool = parpool()

Starting parpool using the 'local' profile ... connected to 2 workers.

mypool =

 Pool with properties:

    AttachedFiles: {0x1 cell}
       NumWorkers: 2
      IdleTimeout: 30
          Cluster: [1x1 parallel.cluster.Local]
     RequestQueue: [1x1 parallel.RequestQueue]
      SpmdEnabled: 1

opt = statset('UseParallel',true);
tic;B = bootstrp(200,myfun,x,'Options',opt);toc

Elapsed time is 6.304077 seconds.

Computing in parallel is nearly twice as fast as computing in serial for this example. Overlay the ksdensity density estimate with the 200 bootstrapped estimates obtained in the parallel bootstrap. You can get a sense of how to assess the accuracy of the density estimate from this plot.


hold on
for i = 1:size(B,1)
    plot(latt,B(i,:),'c:')
end
plot(latt,pdfestimate);
xlabel('x'); ylabel('Density estimate')

Reproducible Parallel Bootstrap

To run the example in parallel in a reproducible fashion, set the options appropriately (see “Running Reproducible Parallel Computations” on page 30-13). First set up the problem and parallel environment as in “Bootstrap in Serial and Parallel” on page 30-20. Then set the options to use substreams along with a stream that supports substreams.

s = RandStream('mlfg6331_64'); % has substreams
opts = statset('UseParallel',true,...
    'Streams',s,'UseSubstreams',true);
B2 = bootstrp(200,myfun,x,'Options',opts);

To rerun the bootstrap and get the same result:

reset(s) % set the stream to initial state
B3 = bootstrp(200,myfun,x,'Options',opts);
isequal(B2,B3) % check if same results



ans =
     1


31 Code Generation

• “Introduction to Code Generation” on page 31-2
• “General Code Generation Workflow” on page 31-5
• “Code Generation for Prediction of Machine Learning Model at Command Line” on page 31-9
• “Code Generation for Nearest Neighbor Searcher” on page 31-13
• “Code Generation for Prediction of Machine Learning Model Using MATLAB Coder App” on page 31-16
• “Code Generation and Classification Learner App” on page 31-25
• “Predict Class Labels Using MATLAB Function Block” on page 31-34
• “Specify Variable-Size Arguments for Code Generation” on page 31-38
• “Train SVM Classifier with Categorical Predictors and Generate C/C++ Code” on page 31-43
• “System Objects for Classification and Code Generation” on page 31-47
• “Predict Class Labels Using Stateflow” on page 31-55
• “Human Activity Recognition Simulink Model for Smartphone Deployment” on page 31-59
• “Code Generation for Prediction and Update Using Coder Configurer” on page 31-68
• “Code Generation for Probability Distribution Objects” on page 31-70
• “Fixed-Point Code Generation for Prediction of SVM” on page 31-75
• “Generate Code to Classify Numeric Data in Table” on page 31-88


Introduction to Code Generation

In this section...
“Code Generation Workflows” on page 31-2
“Code Generation Applications” on page 31-4

MATLAB Coder generates readable and portable C and C++ code from Statistics and Machine Learning Toolbox functions that support code generation. You can integrate the generated code into your projects as source code, static libraries, or dynamic libraries. You can also use the generated code within the MATLAB environment to accelerate computationally intensive portions of your MATLAB code.

Generating C/C++ code requires MATLAB Coder and has the following limitations:

• You cannot call any function at the top level when generating code by using codegen. Instead, call the function within an entry-point function, and then generate code from the entry-point function. The entry-point function, also known as the top-level or primary function, is a function you define for code generation. All functions within the entry-point function must support code generation.
• The MATLAB Coder limitations also apply to Statistics and Machine Learning Toolbox for code generation. For details, see “MATLAB Language Features Supported for C/C++ Code Generation” (MATLAB Coder).
• Code generation in Statistics and Machine Learning Toolbox does not support sparse matrices and categorical arrays. Code generation supports numeric tables for most prediction functions.
• For the code generation usage notes and limitations for each function, see the Code Generation section on the function reference page. For a list of Statistics and Machine Learning Toolbox functions that support code generation, see Function List (C/C++ Code Generation).

Code Generation Workflows

You can generate C/C++ code for the Statistics and Machine Learning Toolbox functions in several ways.

• General code generation workflow for functions that are not the object functions of machine learning models

Define an entry-point function that calls the function that supports code generation, generate C/C++ code for the entry-point function by using codegen, and then verify the generated code. The entry-point function, also known as the top-level or primary function, is a function you define for code generation. Because you cannot call any function at the top level using codegen, you must define an entry-point function. All functions within the entry-point function must support code generation. For details, see “General Code Generation Workflow” on page 31-5.

• Code generation workflow for the object function (predict, random, knnsearch, or rangesearch) of a machine learning model


Save a trained model by using saveLearnerForCoder, and define an entry-point function that loads the saved model by using loadLearnerForCoder and calls the object function. Then generate code for the entry-point function by using codegen, and verify the generated code. The input arguments of the entry-point function cannot be classification or regression model objects. Therefore, you need to work around this limitation by using saveLearnerForCoder and loadLearnerForCoder. For details, see these examples:

• “Code Generation for Prediction of Machine Learning Model at Command Line” on page 31-9
• “Code Generation for Prediction of Machine Learning Model Using MATLAB Coder App” on page 31-16
• “Code Generation for Nearest Neighbor Searcher” on page 31-13
• “Code Generation and Classification Learner App” on page 31-25
• “Specify Variable-Size Arguments for Code Generation” on page 31-38

You can also generate fixed-point C/C++ code for the prediction of a support vector machine (SVM) model, a decision tree model, and an ensemble of decision trees for classification and regression. This type of code generation requires Fixed-Point Designer™.

Fixed-point code generation requires an additional step that defines the fixed-point data types of the variables required for prediction. Create a fixed-point data type structure by using the data type function generated by generateLearnerDataTypeFcn, and use the structure as an input argument of loadLearnerForCoder in an entry-point function. You can also optimize the fixed-point data types before generating code. For details, see “Fixed-Point Code Generation for Prediction of SVM” on page 31-75.

• Code generation workflow for the predict and update functions of a tree model, an SVM model, a linear model, or a multiclass error-correcting output codes (ECOC) classification model using SVM or linear binary learners

Create a coder configurer by using learnerCoderConfigurer, generate code by using generateCode, and then verify the generated code. You can configure code generation options and specify the coder attributes of the model parameters using object properties. After you retrain the model with new data or settings, you can update model parameters in the generated C/C++ code without having to regenerate the code. This feature reduces the effort required to regenerate, redeploy, and reverify C/C++ code. For details, see “Code Generation for Prediction and Update Using Coder Configurer” on page 31-68.

Code Generation Applications

Code generation for the Statistics and Machine Learning Toolbox functions also works with other toolboxes such as Simulink®, System object™, and Stateflow®, as described in these examples:

• “Predict Class Labels Using MATLAB Function Block” on page 31-34
• “System Objects for Classification and Code Generation” on page 31-47
• “Predict Class Labels Using Stateflow” on page 31-55

For more applications of code generation, see these examples:

• “Code Generation for Image Classification”
• “Human Activity Recognition Simulink Model for Smartphone Deployment” on page 31-59

See Also

codegen | generateLearnerDataTypeFcn | learnerCoderConfigurer | loadLearnerForCoder | saveLearnerForCoder

Related Examples

• “Get Started with MATLAB Coder” (MATLAB Coder)


General Code Generation Workflow

The general code generation workflow for the Statistics and Machine Learning Toolbox functions that are not the object functions of machine learning models is the same as the workflow described in MATLAB Coder. For details, see “Get Started with MATLAB Coder” (MATLAB Coder). To learn how to generate code for the object functions of machine learning models, see “Introduction to Code Generation” on page 31-2. This example briefly explains the general code generation workflow as summarized in this flow chart:

Define Entry-Point Function

An entry-point function, also known as the top-level or primary function, is a function you define for code generation. Because you cannot call any function at the top level using codegen, you must define an entry-point function that calls code-generation-enabled functions, and generate C/C++ code for the entry-point function by using codegen. All functions within the entry-point function must support code generation.

Add the %#codegen compiler directive (or pragma) to the entry-point function after the function signature to indicate that you intend to generate code for the MATLAB algorithm. Adding this directive instructs the MATLAB Code Analyzer to help you diagnose and fix violations that would cause errors during code generation. See “Check Code with the Code Analyzer” (MATLAB Coder).

For example, to generate code that estimates the interquartile range of a data set using iqr, define this function.

function r = iqrCodeGen(x) %#codegen
%IQRCODEGEN Estimate interquartile range
%   iqrCodeGen returns the interquartile range of the data x,
%   a single- or double-precision vector.
r = iqr(x);
end

You can allow for optional input arguments by specifying varargin as an input argument. For details, see “Code Generation for Variable Length Argument Lists” (MATLAB Coder) and “Specify Variable-Size Arguments for Code Generation” on page 31-38.

Generate Code

Set Up Compiler

To generate C/C++ code, you must have access to a compiler that is configured properly. MATLAB Coder locates and uses a supported, installed compiler. To view and change the default C compiler, enter:

mex -setup

For more details, see “Change Default Compiler” (MATLAB).


Generate Code Using codegen

After setting up your compiler, generate code for the entry-point function by using codegen or the MATLAB Coder app. To learn how to generate code using the MATLAB Coder app, see “Generate MEX Functions by Using the MATLAB Coder App” (MATLAB Coder). To generate code at the command line, use codegen.

Because C and C++ are statically typed languages, you must determine the properties of all variables in the entry-point function at compile time. Specify the data types and sizes of all inputs of the entry-point function when you call codegen by using the -args option.

• To specify the data type and exact input array size, pass a MATLAB expression that represents the set of values with a certain data type and array size. For example, to specify that the generated code from iqrCodeGen.m must accept a double-precision numeric column vector with 100 elements, enter:

testX = randn(100,1);
codegen iqrCodeGen -args {testX} -report

The -report flag generates a code generation report. See “Code Generation Reports” (MATLAB Coder).

• To specify that at least one of the dimensions can have any length, use the -args option with coder.typeof as follows.

-args {coder.typeof(example_value, size_vector, variable_dims)}

The values of example_value, size_vector, and variable_dims specify the properties of the input array that the generated code can accept.

• An input array has the same data type as the example values in example_value.
• size_vector is the array size of an input array if the corresponding variable_dims value is false.
• size_vector is the upper bound of the array size if the corresponding variable_dims value is true.
• variable_dims specifies whether each dimension of the array has a variable size or a fixed size. A value of true (logical 1) means that the corresponding dimension has a variable size; a value of false (logical 0) means that the corresponding dimension has a fixed size.

Specifying a variable-size input is convenient when you have data with an unknown number of observations at compile time. For example, to specify that the generated code from iqrCodeGen.m can accept a double-precision numeric column vector of any length, enter:

testX = coder.typeof(0,[Inf,1],[1,0]);
codegen iqrCodeGen -args {testX} -report

0 for the example_value value implies that the data type is double because double is the default numeric data type of MATLAB. [Inf,1] for the size_vector value and [1,0] for the variable_dims value imply that the size of the first dimension is variable and unbounded, and the size of the second dimension is fixed to be 1.

• To specify a character array, such as supported name-value pair arguments, specify the character array as a constant using coder.Constant. For example, suppose that 'Name' is a valid name-value pair argument for iqrCodeGen.m, and the corresponding value, value, is numeric. Then enter:


codegen iqrCodeGen -args {testX,coder.Constant('Name'),value} -report

For more details, see “Generate C Code at the Command Line” (MATLAB Coder) and “Specify Properties of Entry-Point Function Inputs” (MATLAB Coder).

Build Type

MATLAB Coder can generate code for these types:

• MEX (MATLAB Executable) function
• Standalone C/C++ code
• Standalone C/C++ code compiled to a static library
• Standalone C/C++ code compiled to a dynamically linked library
• Standalone C/C++ code compiled to an executable

You can specify the build type using the -config option of codegen. For more details on setting code generation options, see “Configure Build Settings” (MATLAB Coder).

By default, codegen generates a MEX function. A MEX function is a C/C++ program that is executable from MATLAB. You can use a MEX function to accelerate MATLAB algorithms and to test the generated code for functionality and run-time issues. For details, see “MATLAB Algorithm Acceleration” (MATLAB Coder) and “Why Test MEX Functions in MATLAB?” (MATLAB Coder).

Code Generation Report

You can use the -report flag to produce a code generation report. This report helps you debug code generation issues and view the generated C/C++ code. For details, see “Code Generation Reports” (MATLAB Coder).
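As a sketch of selecting a nondefault build type, a static library build for the iqrCodeGen example above might be requested through a coder.config object:

```matlab
testX = randn(100,1);
cfg = coder.config('lib'); % configuration object for a static library build
codegen iqrCodeGen -config cfg -args {testX} -report
```

Passing 'dll' or 'exe' to coder.config instead selects a dynamically linked library or an executable build.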

Verify Generated Code

Test a MEX function to verify that the generated code provides the same functionality as the original MATLAB code. To perform this test, run the MEX function using the same inputs that you used to run the original MATLAB code, and then compare the results. Running the MEX function in MATLAB before generating standalone code also enables you to detect and fix run-time errors that are much harder to diagnose in the generated standalone code. For more details, see “Why Test MEX Functions in MATLAB?” (MATLAB Coder).

Pass some data to verify whether iqr, iqrCodeGen, and iqrCodeGen_mex return the same interquartile range.

testX = randn(100,1);
r = iqr(testX);
r_entrypoint = iqrCodeGen(testX);
r_mex = iqrCodeGen_mex(testX);

Compare the outputs by using isequal.

isequal(r,r_entrypoint,r_mex)

isequal returns logical 1 (true) if all the inputs are equal.

You can also verify the MEX function using a test file and coder.runTest. For details, see “Testing Code Generated from MATLAB Code” (MATLAB Coder).


See Also

codegen

More About

• “Introduction to Code Generation” on page 31-2
• “Code Generation for Probability Distribution Objects” on page 31-70
• Function List (C/C++ Code Generation)


Code Generation for Prediction of Machine Learning Model at Command Line

This example shows how to generate code for the prediction of classification and regression model objects at the command line. You can also generate code using the MATLAB® Coder™ app. See “Code Generation for Prediction of Machine Learning Model Using MATLAB Coder App” on page 31-16 for details.

Certain classification and regression model objects have a predict or random function that supports code generation. Prediction using these object functions requires a trained classification or regression model object, but the -args option of codegen does not accept these objects. Work around this limitation by using saveLearnerForCoder and loadLearnerForCoder as described in this example. This flow chart shows the code generation workflow for the object functions of classification and regression model objects.

After you train a model, save the trained model by using saveLearnerForCoder. Define an entry-point function that loads the saved model by using loadLearnerForCoder and calls the object function. Then generate code for the entry-point function by using codegen, and verify the generated code.

Train Classification Model

Train a classification model object equipped with a code-generation-enabled predict function. In this case, train a support vector machine (SVM) classification model.

load fisheriris
inds = ~strcmp(species,'setosa');
X = meas(inds,3:4);
Y = species(inds);
Mdl = fitcsvm(X,Y);

This step can include data preprocessing, feature selection, and optimizing the model using cross-validation, for example.

Save Model Using saveLearnerForCoder

Save the classification model to the file SVMModel.mat by using saveLearnerForCoder.

saveLearnerForCoder(Mdl,'SVMModel');

saveLearnerForCoder saves the classification model to the MATLAB binary file SVMModel.mat as a structure array in the current folder.

Define Entry-Point Function

An entry-point function, also known as the top-level or primary function, is a function you define for code generation. Because you cannot call any function at the top level using codegen, you must define an entry-point function that calls code-generation-enabled functions, and generate C/C++ code for the entry-point function by using codegen. All functions within the entry-point function must support code generation.

Define an entry-point function that returns predicted labels for input predictor data. Within the function, load the trained classification model by using loadLearnerForCoder, and then pass the loaded model to predict. In this case, define the predictLabelsSVM function, which predicts labels using the SVM model Mdl.

type predictLabelsSVM.m % Display contents of predictLabelsSVM.m file

function label = predictLabelsSVM(x) %#codegen
%PREDICTLABELSSVM Label new observations using trained SVM model Mdl
%   predictLabelsSVM predicts the vector of labels label using
%   the saved SVM model Mdl and the predictor data x.
Mdl = loadLearnerForCoder('SVMModel');
label = predict(Mdl,x);
end

Add the %#codegen compiler directive (or pragma) to the entry-point function after the function signature to indicate that you intend to generate code for the MATLAB algorithm. Adding this directive instructs the MATLAB Code Analyzer to help you diagnose and fix violations that would result in errors during code generation. See “Check Code with the Code Analyzer” (MATLAB Coder).

Generate Code

Set Up Compiler

To generate C/C++ code, you must have access to a C/C++ compiler that is configured properly. MATLAB Coder locates and uses a supported, installed compiler. You can use mex -setup to view and change the default compiler. For more details, see “Change Default Compiler” (MATLAB).

Generate Code Using codegen

Generate code for the entry-point function using codegen. Because C and C++ are statically typed languages, you must determine the properties of all variables in the entry-point function at compile time. Specify the data types and sizes of all inputs of the entry-point function when you call codegen by using the -args option. In this case, pass X as a value of the -args option to specify that the generated code must accept an input that has the same data type and array size as the training data X.

codegen predictLabelsSVM -args {X}

If the number of observations is unknown at compile time, you can also specify the input as variable-size by using coder.typeof. For details, see “Specify Variable-Size Arguments for Code Generation” on page 31-38 and “Specify Properties of Entry-Point Function Inputs” (MATLAB Coder).

Build Type

MATLAB Coder can generate code for the following build types:

• MEX (MATLAB Executable) function
• Standalone C/C++ code
• Standalone C/C++ code compiled to a static library
• Standalone C/C++ code compiled to a dynamically linked library
• Standalone C/C++ code compiled to an executable

You can specify the build type using the -config option of codegen. For more details on setting code generation options, see the -config option of codegen and “Configure Build Settings” (MATLAB Coder).

By default, codegen generates a MEX function. A MEX function is a C/C++ program that is executable from MATLAB. You can use a MEX function to accelerate MATLAB algorithms and to test the generated code for functionality and run-time issues. For details, see “MATLAB Algorithm Acceleration” (MATLAB Coder) and “Why Test MEX Functions in MATLAB?” (MATLAB Coder).

Code Generation Report

You can use the -report flag to produce a code generation report. This report helps you debug code generation issues and view the generated C/C++ code. For details, see “Code Generation Reports” (MATLAB Coder).

Verify Generated Code

Test a MEX function to verify that the generated code provides the same functionality as the original MATLAB code. To perform this test, run the MEX function using the same inputs that you used to run the original MATLAB code, and then compare the results. Running the MEX function in MATLAB before generating standalone code also enables you to detect and fix run-time errors that are much harder to diagnose in the generated standalone code. For more details, see “Why Test MEX Functions in MATLAB?” (MATLAB Coder).

Pass some predictor data to verify whether predict, predictLabelsSVM, and the MEX function return the same labels.

labels1 = predict(Mdl,X);
labels2 = predictLabelsSVM(X);
labels3 = predictLabelsSVM_mex(X);

Compare the predicted labels by using isequal.

verifyMEX = isequal(labels1,labels2,labels3)

verifyMEX = logical
   1

isequal returns logical 1 (true), which means all the inputs are equal. The comparison confirms that the predict function, predictLabelsSVM function, and MEX function return the same labels.
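This example generates a MEX function, the default build type. To produce one of the standalone build types described earlier, pass a configuration object to codegen. The following sketch is illustrative rather than part of the original example; the choice of a static library ('lib') target is an assumption, and any supported coder.config argument works the same way.

```matlab
% Create a code generation configuration object for a static library.
cfg = coder.config('lib');

% Generate standalone C code for the same entry-point function,
% and produce a code generation report for inspection.
codegen -config cfg predictLabelsSVM -args {X} -report
```

The generated library and report appear under the codegen folder in your current folder.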

See Also

codegen | learnerCoderConfigurer | loadLearnerForCoder | saveLearnerForCoder

Related Examples

• “Introduction to Code Generation” on page 31-2
• “Code Generation for Prediction of Machine Learning Model Using MATLAB Coder App” on page 31-16
• “Code Generation for Prediction and Update Using Coder Configurer” on page 31-68
• “Code Generation and Classification Learner App” on page 31-25
• “Specify Variable-Size Arguments for Code Generation” on page 31-38
• Function List (C/C++ Code Generation)


Code Generation for Nearest Neighbor Searcher

The object functions knnsearch and rangesearch of the nearest neighbor searcher objects, ExhaustiveSearcher and KDTreeSearcher, support code generation. This example shows how to generate code for finding the nearest neighbor using an exhaustive searcher object at the command line. The example shows two different ways to generate code, depending on the way you use the object: load the object by using loadLearnerForCoder in an entry-point function, or pass a compile-time constant object to the generated code.

Train Exhaustive Nearest Neighbor Searcher

Load Fisher's iris data set.

load fisheriris

Remove five irises randomly from the predictor data to use as a query set.

rng('default'); % For reproducibility
n = size(meas,1); % Sample size
qIdx = randsample(n,5); % Indices of query data
X = meas(~ismember(1:n,qIdx),:);
Y = meas(qIdx,:);

Prepare an exhaustive nearest neighbor searcher using the training data. Specify the 'Distance' and 'P' name-value pair arguments to use the Minkowski distance with an exponent of 1 for finding the nearest neighbor.

Mdl = ExhaustiveSearcher(X,'Distance','minkowski','P',1);

Find the index of the training data (X) that is the nearest neighbor of each point in the query data (Y).

Idx = knnsearch(Mdl,Y);

Generate Code Using saveLearnerForCoder and loadLearnerForCoder

Generate code that loads an exhaustive searcher, takes query data as an input argument, and then finds the nearest neighbor.

Save the exhaustive searcher to a file using saveLearnerForCoder.

saveLearnerForCoder(Mdl,'searcherModel')

saveLearnerForCoder saves the model to the MATLAB binary file searcherModel.mat as a structure array in the current folder.

Define the entry-point function myknnsearch1 that takes query data as an input argument. Within the function, load the searcher object by using loadLearnerForCoder, and then pass the loaded model to knnsearch.

type myknnsearch1.m % Display contents of myknnsearch1.m file

function idx = myknnsearch1(Y) %#codegen
Mdl = loadLearnerForCoder('searcherModel');
idx = knnsearch(Mdl,Y);
end


Generate code for myknnsearch1 by using codegen. Specify the data type and dimension of the input argument by using coder.typeof so that the generated code accepts a variable-size array.

codegen myknnsearch1 -args {coder.typeof(Y,[Inf,4],[1,0])}
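If you prefer to cap the number of query observations rather than leave the first dimension unbounded, coder.typeof also accepts a finite upper bound. The following sketch is illustrative and not part of the original example; the bound of 1000 rows is an arbitrary choice.

```matlab
% Accept up to 1000 query rows: the first dimension is variable-size
% with an upper bound of 1000, and the second dimension is fixed at 4.
queryType = coder.typeof(Y,[1000,4],[1,0]);
codegen myknnsearch1 -args {queryType}
```

Bounding the input size lets the generated code use statically allocated memory, which can matter for embedded targets.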

For a more detailed code generation example that uses saveLearnerForCoder and loadLearnerForCoder, see “Code Generation for Prediction of Machine Learning Model at Command Line” on page 31-9. For more details about specifying variable-size arguments, see “Specify Variable-Size Arguments for Code Generation” on page 31-38.

Pass the query data (Y) to verify that myknnsearch1 and the MEX file return the same indices.

myIdx1 = myknnsearch1(Y);
myIdx1_mex = myknnsearch1_mex(Y);

Compare myIdx1 and myIdx1_mex by using isequal.

verifyMEX1 = isequal(Idx,myIdx1,myIdx1_mex)

verifyMEX1 = logical
   1

isequal returns logical 1 (true) if all the inputs are equal. This comparison confirms that myknnsearch1 and the MEX file return the same results.

Generate Code with Constant Folded Model Object

Nearest neighbor searcher objects can be an input argument of a function you define for code generation. The -args option of codegen accepts a compile-time constant searcher object.

Define the entry-point function myknnsearch2 that takes both an exhaustive searcher model and query data as input arguments instead of loading the model in the function.

type myknnsearch2.m % Display contents of myknnsearch2.m file

function idx = myknnsearch2(Mdl,Y) %#codegen
idx = knnsearch(Mdl,Y);
end

To generate code that takes the model object as well as the query data, designate the model object as a compile-time constant by using coder.Constant and include the constant folded model object in the -args value of codegen.

codegen myknnsearch2 -args {coder.Constant(Mdl),coder.typeof(Y,[Inf,4],[1,0])}

The code generation workflow with a constant folded model object follows the general code generation workflow. For details, see “General Code Generation Workflow” on page 31-5.

Verify that myknnsearch2 and the MEX file return the same results.

myIdx2 = myknnsearch2(Mdl,Y);
myIdx2_mex = myknnsearch2_mex(Mdl,Y);
verifyMEX2 = isequal(Idx,myIdx2,myIdx2_mex)

verifyMEX2 = logical
   1

Generate Code with Name-Value Pair Arguments

Define the entry-point function myknnsearch3 that takes a model object, query data, and name-value pair arguments. You can allow for optional name-value arguments by specifying varargin as an input argument. For details, see “Code Generation for Variable Length Argument Lists” (MATLAB Coder).

type myknnsearch3.m % Display contents of myknnsearch3.m file

function idx = myknnsearch3(Mdl,Y,varargin) %#codegen
idx = knnsearch(Mdl,Y,varargin{:});
end

To generate code that allows a user-defined exponent for the Minkowski distance, include {coder.Constant('P'),0} in the -args value of codegen. Use coder.Constant because the name of a name-value pair argument must be a compile-time constant.

codegen myknnsearch3 -args {coder.Constant(Mdl),coder.typeof(Y,[Inf,4],[1,0]),coder.Constant('P'),0}

Verify that myknnsearch3 and the MEX file return the same results.

newIdx = knnsearch(Mdl,Y,'P',2);
myIdx3 = myknnsearch3(Mdl,Y,'P',2);
myIdx3_mex = myknnsearch3_mex(Mdl,Y,'P',2);
verifyMEX3 = isequal(newIdx,myIdx3,myIdx3_mex)

verifyMEX3 = logical
   1

See Also

ExhaustiveSearcher | KDTreeSearcher | codegen | knnsearch | loadLearnerForCoder | rangesearch | saveLearnerForCoder

Related Examples

• “Introduction to Code Generation” on page 31-2
• “General Code Generation Workflow” on page 31-5
• “Code Generation for Prediction of Machine Learning Model at Command Line” on page 31-9
• “Specify Variable-Size Arguments for Code Generation” on page 31-38


Code Generation for Prediction of Machine Learning Model Using MATLAB Coder App

This example shows how to generate C/C++ code for the prediction of classification and regression model objects by using the MATLAB® Coder™ app. You can also generate code at the command line using codegen. See “Code Generation for Prediction of Machine Learning Model at Command Line” on page 31-9 for details.

Certain classification and regression model objects have a predict or random function that supports code generation. Prediction using these object functions requires a trained classification or regression model object, but an entry-point function for code generation cannot have these objects as input variables. Work around this limitation by using saveLearnerForCoder and loadLearnerForCoder as described in this example. This flow chart shows the code generation workflow for the object functions of classification and regression model objects.

In this example, you train a classification ensemble model using k-nearest-neighbor weak learners and save the trained model by using saveLearnerForCoder. Then, define an entry-point function that loads the saved model by using loadLearnerForCoder and calls the object function. Write a script to test the entry-point function. Finally, generate code by using the MATLAB Coder app and verify the generated code.

Train Classification Model

Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g').

load ionosphere

Train a classification ensemble model with k-nearest-neighbor weak learners by using the random subspace method. For details of classifications that use a random subspace ensemble, see “Random Subspace Classification” on page 18-103.

rng('default') % For reproducibility
learner = templateKNN('NumNeighbors',2);
Mdl = fitcensemble(X,Y,'Method','Subspace','NPredToSample',5, ...
    'Learners',learner,'NumLearningCycles',13);

Save Model Using saveLearnerForCoder

Save the trained ensemble model to a file named knnEnsemble.mat in your current folder.

saveLearnerForCoder(Mdl,'knnEnsemble')

saveLearnerForCoder makes the full classification model Mdl compact, and then saves it to the MATLAB binary file knnEnsemble.mat as a structure array in the current folder.

Define Entry-Point Function

An entry-point function, also known as the top-level or primary function, is a function you define for code generation. You must define an entry-point function that calls code-generation-enabled functions and generate C/C++ code from the entry-point function. All functions within the entry-point function must support code generation.

In a new file in your current folder, define an entry-point function named myknnEnsemblePredict that does the following:

• Accept input data (X), the file name of the saved model (fileName), and valid name-value pair arguments of the predict function (varargin).
• Load a trained ensemble model by using loadLearnerForCoder.
• Predict labels and corresponding scores from the loaded model.

You can allow for optional name-value arguments by specifying varargin as an input argument. For details, see “Code Generation for Variable Length Argument Lists” (MATLAB Coder).

type myknnEnsemblePredict.m % Display the contents of myknnEnsemblePredict.m file.

function [label,score] = myknnEnsemblePredict(X,fileName,varargin) %#codegen
CompactMdl = loadLearnerForCoder(fileName);
[label,score] = predict(CompactMdl,X,varargin{:});
end

Add the %#codegen compiler directive (or pragma) to the entry-point function after the function signature to indicate that you intend to generate code for the MATLAB algorithm. Adding this directive instructs the MATLAB Code Analyzer to help you diagnose and fix violations that would result in errors during code generation. See “Check Code with the Code Analyzer” (MATLAB Coder).

Set Up Compiler

To generate C/C++ code, you must have access to a C/C++ compiler that is configured properly. MATLAB Coder locates and uses a supported, installed compiler. You can use mex -setup to view and change the default compiler. For more details, see “Change Default Compiler” (MATLAB).

Create Test File

Write a test script that calls the myknnEnsemblePredict function. In the test script, specify the input arguments and name-value pair arguments that you use in the generated code. You use this test script to define input types automatically when generating code using the MATLAB Coder app. In this example, create the test_myknnEnsemblePredict.m file in your current folder, as shown.

type test_myknnEnsemblePredict.m % Display the contents of test_myknnEnsemblePredict.m file.

%% Load Sample data
load ionosphere

%% Test myknnEnsemblePredict
[label,score] = myknnEnsemblePredict(X,'knnEnsemble','Learners',1:13);

For details, see “Automatically Define Input Types by Using the App” (MATLAB Coder).


Generate Code Using MATLAB Coder App

The MATLAB Coder app generates C or C++ code from MATLAB® code. The workflow-based user interface steps you through the code generation process. The following steps describe a brief workflow using the MATLAB Coder app. For more details, see MATLAB Coder and “Generate C Code by Using the MATLAB Coder App” (MATLAB Coder).

1. Open the MATLAB Coder App and Select the Entry-Point Function File

On the Apps tab, in the Apps section, click the Show more arrow to open the apps gallery. Under Code Generation, click MATLAB Coder. The app opens the Select Source Files page. Enter or select the name of the entry-point function, myknnEnsemblePredict.

Click Next to go to the Define Input Types page.

2. Define Input Types

Because C uses static typing, MATLAB Coder must determine the properties of all variables in the MATLAB files at compile time. Therefore, you need to specify the properties of the entry-point function inputs. Enter or select the test script test_myknnEnsemblePredict and click Autodefine Input Types.

The MATLAB Coder app recognizes input types of the myknnEnsemblePredict function based on the test script. Modify the input types:

• X — The app infers that input X is double(351x34). The number of predictors must be fixed to be the same as the number of predictors in the trained model. However, you can have a different number of observations for prediction. If the number of observations is unknown, change double(351x34) to double(:351x34) or double(:infx34). The setting double(:351x34) allows the number of observations up to 351, and the setting double(:infx34) allows an unbounded number of observations. In this example, specify double(:infx34) by clicking 351 and selecting :inf.
• fileName — Click char, select Define Constant, and type the file name with single quotes, 'knnEnsemble'.
• varargin{1} — Names in name-value pair arguments must be compile-time constants. Click char, select Define Constant, and type 'Learners'.
• varargin{2} — To allow user-defined indices up to 13 weak learners in the generated code, change double(1x13) to double(1x:13).


Click Next to go to the Check for Run-Time Issues page. This optional step generates a MEX file, runs the MEX function, and reports issues. Click Next to go to the Generate Code page.

3. Generate C Code

Set Build type to MEX and click Generate. The app generates a MEX function, myknnEnsemblePredict_mex. A MEX function is a C/C++ program that is executable from MATLAB. You can use a MEX function to accelerate MATLAB algorithms and to test the generated code for functionality and run-time issues. For details, see “MATLAB Algorithm Acceleration” (MATLAB Coder) and “Why Test MEX Functions in MATLAB?” (MATLAB Coder).

Depending on the specified build type, MATLAB Coder generates a MEX function or standalone C/C++ code compiled to a static library, dynamically linked library, or executable. For details on setting a build type, see “Configure Build Settings” (MATLAB Coder).

Click Next to go to the Finish Workflow page.

4. Review the Finish Workflow Page

The Finish Workflow page indicates that code generation succeeded. This page also provides a project summary and links to generated output.


Generate Code Using Script

You can convert a MATLAB Coder project to the equivalent script of MATLAB commands after you define input types. Then you run the script to generate code. For details, see “Convert MATLAB Coder Project to MATLAB Script” (MATLAB Coder).

On the MATLAB Coder app toolbar, click the Open action menu button. Select Convert to script, and then click Save. The app creates the file myknnEnsemblePredict_script.m, which reproduces the project in a configuration object and runs the codegen function.

Display the contents of the file myknnEnsemblePredict_script.m.

type myknnEnsemblePredict_script.m

% MYKNNENSEMBLEPREDICT_SCRIPT Generate MEX-function myknnEnsemblePredict_mex
% from myknnEnsemblePredict.
%
% Script generated from project 'myknnEnsemblePredict.prj' on 17-Nov-2017.
%
% See also CODER, CODER.CONFIG, CODER.TYPEOF, CODEGEN.

%% Create configuration object of class 'coder.MexCodeConfig'.
cfg = coder.config('mex');
cfg.GenerateReport = true;
cfg.ReportPotentialDifferences = false;

%% Define argument types for entry-point 'myknnEnsemblePredict'.
ARGS = cell(1,1);
ARGS{1} = cell(4,1);
ARGS{1}{1} = coder.typeof(0,[Inf 34],[1 0]);
ARGS{1}{2} = coder.Constant('knnEnsemble');
ARGS{1}{3} = coder.Constant('Learners');
ARGS{1}{4} = coder.typeof(0,[1 13],[0 1]);

%% Invoke MATLAB Coder.
codegen -config cfg myknnEnsemblePredict -args ARGS{1} -nargout 2

Run the script.

myknnEnsemblePredict_script

Code generation successful: To view the report, open('codegen\mex\myknnEnsemblePredict\html\repor

Verify Generated Code

Test a MEX function to verify that the generated code provides the same functionality as the original MATLAB code. To perform this test, run the MEX function using the same inputs that you used to run the original MATLAB code, and then compare the results. Running the MEX function in MATLAB before generating standalone code also enables you to detect and fix run-time errors that are much harder to diagnose in the generated standalone code. For more details, see “Why Test MEX Functions in MATLAB?” (MATLAB Coder).

Pass some predictor data to verify that myknnEnsemblePredict and the MEX function return the same results.

[label1,score1] = predict(Mdl,X,'Learners',1:10);
[label2,score2] = myknnEnsemblePredict(X,'knnEnsemble','Learners',1:10);
[label3,score3] = myknnEnsemblePredict_mex(X,'knnEnsemble','Learners',1:10);

Compare label1, label2, and label3 by using isequal.

isequal(label1,label2,label3)

ans = logical
   1

isequal returns logical 1 (true), which means all the inputs are equal.

The score3 output from the MEX function might include round-off differences compared with the output from the predict function. In this case, compare score1 and score3, allowing a small tolerance.

find(abs(score1-score3) > 1e-12)

ans =
  0x1 empty double column vector


find returns an empty vector if the element-wise absolute difference between score1 and score3 is not larger than the specified tolerance 1e-12. The comparisons confirm that myknnEnsemblePredict and the MEX function return the same results.

See Also

codegen | learnerCoderConfigurer | loadLearnerForCoder | saveLearnerForCoder

More About

• “Introduction to Code Generation” on page 31-2
• “Code Generation for Prediction of Machine Learning Model at Command Line” on page 31-9
• “Code Generation for Prediction and Update Using Coder Configurer” on page 31-68
• “Code Generation and Classification Learner App” on page 31-25
• “Specify Variable-Size Arguments for Code Generation” on page 31-38
• “Generate C Code by Using the MATLAB Coder App” (MATLAB Coder)
• Function List (C/C++ Code Generation)


Code Generation and Classification Learner App

Classification Learner is well suited for choosing and training classification models interactively, but it does not generate C/C++ code that labels data based on a trained model. The Generate Function button in the Export section of the Classification Learner app generates MATLAB code for training a model but does not generate C/C++ code. This example shows how to generate C code from a function that predicts labels using an exported classification model. The example builds a model that predicts the credit rating of a business given various financial ratios, according to these steps:

1. Use the credit rating data set in the file CreditRating_Historical.dat, which is included with Statistics and Machine Learning Toolbox.
2. Reduce the data dimensionality using principal component analysis (PCA).
3. Train a set of models that support code generation for label prediction.
4. Export the model with the maximum 5-fold, cross-validated classification accuracy.
5. Generate C code from an entry-point function that transforms the new predictor data and then predicts corresponding labels using the exported model.

Load Sample Data

Load sample data and import the data into the Classification Learner app. Review the data using scatter plots and remove unnecessary predictors.

Use readtable to load the historical credit rating data set in the file CreditRating_Historical.dat into a table.

creditrating = readtable('CreditRating_Historical.dat');

On the Apps tab, click Classification Learner. In Classification Learner, on the Classification Learner tab, in the File section, click New Session and select From Workspace. In the New Session dialog box, select the table creditrating. All variables, except the one identified as the response, are double-precision numeric vectors. Click Start Session to compare classification models based on the 5-fold, cross-validated classification accuracy.

Classification Learner loads the data and plots a scatter plot of the variables WC_TA versus ID. Because identification numbers are not helpful to display in a plot, choose RE_TA for X under Predictors.


The scatter plot suggests that the two variables can separate the classes AAA, BBB, BB, and CCC fairly well. However, the observations corresponding to the remaining classes are mixed into these classes.

Identification numbers are not helpful for prediction. Therefore, in the Features section, click Feature Selection and then clear the ID check box. You can also remove unnecessary predictors from the beginning by using the check boxes in the New Session dialog box. This example shows how to remove unused predictors for code generation when you have included all predictors.

Enable PCA

Enable PCA to reduce the data dimensionality. In the Features section, click PCA, and then select Enable PCA. This action applies PCA to the predictor data, and then transforms the data before training the models. Classification Learner uses only components that collectively explain 95% of the variability.

Train Models

Train a set of models that support code generation for label prediction.

Select the following classification models and options, which support code generation for label prediction, and then perform cross-validation (for more details, see “Introduction to Code Generation” on page 31-2). To select each model, in the Model Type section, click the Show more arrow, and then click the model. After selecting a model and specifying any options, close any open menus, and then click Train in the Training section.

Models and options to select:

• Under Decision Trees, select All Trees: classification trees of various complexities.
• Under Support Vector Machines, select All SVMs: SVMs of various complexities and using various kernels. Complex SVMs require time to fit.
• Under Ensemble Classifiers, select Boosted Trees. In the Model Type section, click Advanced. Reduce Maximum number of splits to 5 and increase Number of learners to 100. This trains a boosted ensemble of classification trees.
• Under Ensemble Classifiers, select Bagged Trees. In the Model Type section, click Advanced. Increase Maximum number of splits to 50 and increase Number of learners to 100. This trains a random forest of classification trees.

After cross-validating each model type, the Data Browser displays each model and its 5-fold, cross-validated classification accuracy, and highlights the model with the best accuracy.


Select the model that yields the maximum 5-fold, cross-validated classification accuracy, which is the error-correcting output codes (ECOC) model of Fine Gaussian SVM learners. With PCA enabled, Classification Learner uses two predictors out of six.

In the Plots section, click Confusion Matrix.


The model does well distinguishing between A, B, and C classes. However, the model does not do as well distinguishing between particular levels within those groups, the lower B levels in particular.

Export Model to Workspace

Export the model to the MATLAB Workspace and save the model using saveLearnerForCoder.

In the Export section, click Export Model, and then select Export Compact Model. Click OK in the dialog box. The structure trainedModel appears in the MATLAB Workspace. The field ClassificationSVM of trainedModel contains the compact model.


At the command line, save the compact model to a file called ClassificationLearnerModel.mat in your current folder.

saveLearnerForCoder(trainedModel.ClassificationSVM,'ClassificationLearnerModel')

Generate C Code for Prediction

Prediction using the object functions requires a trained model object, but the -args option of codegen does not accept such objects. Work around this limitation by using saveLearnerForCoder and loadLearnerForCoder. Save a trained model by using saveLearnerForCoder. Then, define an entry-point function that loads the saved model by using loadLearnerForCoder and calls the predict function. Finally, use codegen to generate code for the entry-point function.

Preprocess Data

Preprocess new data in the same way you preprocess the training data. To preprocess, you need the following three model parameters:

• removeVars — Column vector of at most p elements identifying indices of variables to remove from the data, where p is the number of predictor variables in the raw data
• pcaCenters — Row vector of exactly q PCA centers
• pcaCoefficients — q-by-r matrix of PCA coefficients, where r is at most q

Specify the indices of predictor variables that you removed while selecting data using Feature Selection in Classification Learner. Extract the PCA statistics from trainedModel.

removeVars = 1;
pcaCenters = trainedModel.PCACenters;
pcaCoefficients = trainedModel.PCACoefficients;

Save the model parameters to a file named ModelParameters.mat in your current folder.

save('ModelParameters.mat','removeVars','pcaCenters','pcaCoefficients');

Define Entry-Point Function

An entry-point function is a function you define for code generation. Because you cannot call any function at the top level using codegen, you must define an entry-point function that calls code-generation-enabled functions, and then generate C/C++ code for the entry-point function by using codegen.

In your current folder, define a function named mypredictCL.m that:

• Accepts a numeric matrix (X) of raw observations containing the same predictor variables as the ones passed into Classification Learner
• Loads the classification model in ClassificationLearnerModel.mat and the model parameters in ModelParameters.mat
• Removes the predictor variables corresponding to the indices in removeVars
• Transforms the remaining predictor data using the PCA centers (pcaCenters) and coefficients (pcaCoefficients) estimated by Classification Learner
• Returns predicted labels using the model

function label = mypredictCL(X) %#codegen
%MYPREDICTCL Classify credit rating using model exported from
%Classification Learner
%   MYPREDICTCL loads trained classification model (SVM) and model
%   parameters (removeVars, pcaCenters, and pcaCoefficients), removes the
%   columns of the raw matrix of predictor data in X corresponding to the
%   indices in removeVars, transforms the resulting matrix using the PCA
%   centers in pcaCenters and PCA coefficients in pcaCoefficients, and then
%   uses the transformed data to classify credit ratings. X is a numeric
%   matrix with n rows and 7 columns. label is an n-by-1 cell array of
%   predicted labels.

% Load trained classification model and model parameters
SVM = loadLearnerForCoder('ClassificationLearnerModel');
data = coder.load('ModelParameters');
removeVars = data.removeVars;
pcaCenters = data.pcaCenters;
pcaCoefficients = data.pcaCoefficients;

% Remove unused predictor variables
keepvars = 1:size(X,2);
idx = ~ismember(keepvars,removeVars);
keepvars = keepvars(idx);
XwoID = X(:,keepvars);

% Transform predictors via PCA
Xpca = bsxfun(@minus,XwoID,pcaCenters)*pcaCoefficients;

% Generate label from SVM
label = predict(SVM,Xpca);
end

Generate Code

Because C and C++ are statically typed languages, you must determine the properties of all variables in the entry-point function at compile time. Specify variable-size arguments using coder.typeof and generate code using the arguments.

Create a double-precision matrix called x for code generation using coder.typeof. Specify that the number of rows of x is arbitrary, but that x must have p columns.

p = size(creditrating,2) - 1;
x = coder.typeof(0,[Inf,p],[1 0]);

For more details about specifying variable-size arguments, see “Specify Variable-Size Arguments for Code Generation” on page 31-38.

Generate a MEX function from mypredictCL.m. Use the -args option to specify x as an argument.

codegen mypredictCL -args x

codegen generates the MEX file mypredictCL_mex.mexw64 in your current folder. The file extension depends on your platform.

Verify Generated Code

Verify that the MEX function returns the expected labels.
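If you are unsure which extension applies on your machine, the mexext function returns it. This quick check is not part of the original example:

```matlab
% Query this platform's MEX file extension, for example 'mexw64' on
% 64-bit Windows, 'mexa64' on Linux, or 'mexmaci64' on Intel macOS.
ext = mexext;
disp(['MEX files on this platform use the .' ext ' extension'])
```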


Remove the response variable from the original data set, and then randomly draw 15 observations.

rng('default'); % For reproducibility
m = 15;
testsampleT = datasample(creditrating(:,1:(end - 1)),m);

Predict corresponding labels by using predictFcn in the classification model trained by Classification Learner.

testLabels = trainedModel.predictFcn(testsampleT);

Convert the resulting table to a matrix.

testsample = table2array(testsampleT);

The columns of testsample correspond to the columns of the predictor data loaded by Classification Learner.

Pass the test data to mypredictCL. The function mypredictCL predicts corresponding labels by using predict and the classification model trained by Classification Learner.

testLabelsPredict = mypredictCL(testsample);

Predict corresponding labels by using the generated MEX function mypredictCL_mex.

testLabelsMEX = mypredictCL_mex(testsample);

Compare the sets of predictions.

isequal(testLabels,testLabelsMEX,testLabelsPredict)

ans = logical
   1

isequal returns logical 1 (true) if all the inputs are equal. predictFcn, mypredictCL, and the MEX function return the same values.

See Also
codegen | coder.typeof | learnerCoderConfigurer | loadLearnerForCoder | saveLearnerForCoder

Related Examples
• “Classification Learner App”
• “Introduction to Code Generation” on page 31-2
• “Code Generation for Prediction of Machine Learning Model at Command Line” on page 31-9
• “Code Generation for Prediction of Machine Learning Model Using MATLAB Coder App” on page 31-16
• “Code Generation for Prediction and Update Using Coder Configurer” on page 31-68
• “Specify Variable-Size Arguments for Code Generation” on page 31-38
• “Apply PCA to New Data and Generate C/C++ Code” on page 32-3884

Predict Class Labels Using MATLAB Function Block

This example shows how to add a MATLAB Function block to a Simulink® model for label prediction. The MATLAB Function block accepts streaming data, and predicts the label and classification score using a trained, support vector machine (SVM) classification model. For details on using the MATLAB Function block, see “Create Custom Functionality Using MATLAB Function Block” (Simulink).

The ionosphere data set, which is included in the Statistics and Machine Learning Toolbox™, contains radar-return qualities (Y) and predictor data (X). Radar returns are either of good quality ('g') or of bad quality ('b').

Load the ionosphere data set. Determine the sample size.

load ionosphere
n = numel(Y)

n = 351

The MATLAB Function block cannot return cell arrays. So, convert the response variable to a logical vector whose elements are 1 if the radar returns are good, and 0 otherwise.

Y = strcmp(Y,'g');

Suppose that the radar returns are detected in sequence, and you have the first 300 observations, but you have not received the last 51 yet. Partition the data into present and future samples.

prsntX = X(1:300,:);
prsntY = Y(1:300);
ftrX = X(301:end,:);
ftrY = Y(301:end);

Train an SVM model using all presently available data. Specify predictor data standardization.

Mdl = fitcsvm(prsntX,prsntY,'Standardize',true);

Mdl is a ClassificationSVM model. At the command line, you can use Mdl to make predictions for new observations. However, you cannot use Mdl as an input argument in a function meant for code generation. Prepare Mdl to be loaded within the function using saveLearnerForCoder.

saveLearnerForCoder(Mdl,'SVMIonosphere');

saveLearnerForCoder compacts Mdl, and then saves it in the MAT-file SVMIonosphere.mat.

Define an entry-point function named svmIonospherePredict.m that predicts whether a radar return is of good quality. The function should:

• Include the code generation directive %#codegen somewhere in the function.
• Accept radar-return predictor data. The data must be commensurate with X except for the number of rows.


• Load SVMIonosphere.mat using loadLearnerForCoder.
• Return predicted labels and classification scores for predicting the quality of the radar return as good (that is, the positive-class score).

function [label,score] = svmIonospherePredict(X) %#codegen
%svmIonospherePredict Predict radar-return quality using SVM model
%   svmIonospherePredict predicts labels and estimates classification
%   scores of the radar returns in the numeric matrix of predictor data X
%   using the compact SVM model in the file SVMIonosphere.mat. Rows of X
%   correspond to observations and columns to predictor variables. label
%   is the predicted label and score is the confidence measure for
%   classifying the radar-return quality as good.
%
%   Copyright 2016 The MathWorks Inc.

Mdl = loadLearnerForCoder('SVMIonosphere');
[label,bothscores] = predict(Mdl,X);
score = bothscores(:,2);
end

Note: If you click the button located in the upper-right section of this page and open this example in MATLAB®, then MATLAB opens the example folder. This folder includes the entry-point function file.

Load the Simulink® model slexSVMIonospherePredictExample.slx.

SimMdlName = 'slexSVMIonospherePredictExample';
open_system(SimMdlName);

The figure displays the Simulink® model. When the input node detects a radar return, it directs that observation into the MATLAB Function block that dispatches to svmIonospherePredict.m. After predicting the label and score, the model returns these values to the workspace and displays the values within the model one at a time. When you load slexSVMIonospherePredictExample.slx, MATLAB® also loads the data set that it requires called radarReturnInput. However, this example shows how to construct the required data set. The model expects to receive input data as a structure array called radarReturnInput containing these fields:


• time - The points in time at which the observations enter the model. In the example, the duration includes the integers from 0 through 50. The orientation must correspond to the observations in the predictor data. So, for this example, time must be a column vector.
• signals - A 1-by-1 structure array describing the input data, and containing the fields values and dimensions. values is a matrix of predictor data. dimensions is the number of predictor variables.

Create an appropriate structure array for future radar returns.

radarReturnInput.time = (0:50)';
radarReturnInput.signals(1).values = ftrX;
radarReturnInput.signals(1).dimensions = size(ftrX,2);

You can change the name from radarReturnInput, and then specify the new name in the model. However, Simulink® expects the structure array to contain the described field names.

Simulate the model using the data held out of training, that is, the data in radarReturnInput.

sim(SimMdlName);

The figure shows the model after it processes all observations in radarReturnInput one at a time. The predicted label of X(351,:) is 1 and its positive-class score is 1.431.

The variables tout, yout, and svmlogsout appear in the workspace. yout and svmlogsout are Simulink.SimulationData.Dataset objects containing the predicted labels and scores. For more details, see “Data Format for Logged Simulation Data” (Simulink).

Extract the simulation data from the simulation log.

labelsSL = svmlogsout.getElement(1).Values.Data;
scoresSL = svmlogsout.getElement(2).Values.Data;

labelsSL is a 51-by-1 numeric vector of predicted labels. labelsSL(j) = 1 means that the SVM model predicts that radar return j in the future sample is of good quality, and 0 means otherwise.


scoresSL is a 51-by-1 numeric vector of positive-class scores, that is, signed distances from the decision boundary. Positive scores correspond to predicted labels of 1, and negative scores correspond to predicted labels of 0.

Predict labels and positive-class scores at the command line using predict.

[labelCMD,scoresCMD] = predict(Mdl,ftrX);
scoresCMD = scoresCMD(:,2);

labelCMD and scoresCMD are commensurate with labelsSL and scoresSL. Compare the future-sample, positive-class scores returned by slexSVMIonospherePredictExample to those returned by calling predict at the command line.

err = sum((scoresCMD - scoresSL).^2);
err < eps

ans = logical
   1

The sum of squared deviations between the sets of scores is negligible. If you also have a Simulink® Coder™ license, then you can generate C code from slexSVMIonospherePredictExample.slx in Simulink® or from the command line using rtwbuild. For more details, see “Generate C Code for a Model” (Simulink Coder).
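As a sketch of that last step, assuming you have a Simulink® Coder™ license and the model from this example (this snippet is not part of the original example):

```matlab
% Load the example model and build code for it with Simulink Coder.
% rtwbuild uses the model's configured system target file.
SimMdlName = 'slexSVMIonospherePredictExample';
load_system(SimMdlName)
rtwbuild(SimMdlName)
```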

See Also
learnerCoderConfigurer | loadLearnerForCoder | predict | rtwbuild | saveLearnerForCoder

Related Examples
• “Introduction to Code Generation” on page 31-2
• “Code Generation for Image Classification”
• “System Objects for Classification and Code Generation” on page 31-47
• “Predict Class Labels Using Stateflow” on page 31-55
• “Human Activity Recognition Simulink Model for Smartphone Deployment” on page 31-59
• “Data Format for Logged Simulation Data” (Simulink)
• “Generate C Code for a Model” (Simulink Coder)

Specify Variable-Size Arguments for Code Generation

This example shows how to specify variable-size input arguments when you generate code for the object functions of classification and regression model objects. Variable-size data is data whose size might change at run time. Specifying variable-size input arguments is convenient when you have data with an unknown size at compile time. This example also describes how to include name-value pair arguments in an entry-point function and how to specify them when generating code.

For more detailed code generation workflow examples, see “Code Generation for Prediction of Machine Learning Model at Command Line” on page 31-9 and “Code Generation for Prediction of Machine Learning Model Using MATLAB Coder App” on page 31-16.

Train Classification Model

Load Fisher's iris data set. Convert the labels to a character matrix.

load fisheriris
species = char(species);

Train a classification tree using the entire data set.

Mdl = fitctree(meas,species);

Mdl is a ClassificationTree model.

Save Model Using saveLearnerForCoder

Save the trained classification tree to a file named ClassTreeIris.mat in your current folder by using saveLearnerForCoder.

MdlName = 'ClassTreeIris';
saveLearnerForCoder(Mdl,MdlName);

Define Entry-Point Function

In your current folder, define an entry-point function named mypredictTree.m that does the following:

• Accept measurements with columns corresponding to meas and accept valid name-value pair arguments.
• Load a trained classification tree by using loadLearnerForCoder.
• Predict labels and corresponding scores, node numbers, and class numbers from the loaded classification tree.

You can allow for optional name-value pair arguments by specifying varargin as an input argument. For details, see “Code Generation for Variable Length Argument Lists” (MATLAB Coder).

type mypredictTree.m

% Display contents of mypredictTree.m file

function [label,score,node,cnum] = mypredictTree(x,savedmdl,varargin) %#codegen
%MYPREDICTTREE Predict iris species using classification tree
%   MYPREDICTTREE predicts iris species for the n observations in the
%   n-by-4 matrix x using the classification tree stored in the MAT-file
%   whose name is in savedmdl, and then returns the predictions in the
%   array label. Each row of x contains the lengths and widths of the petal
%   and sepal of an iris (see the fisheriris data set). For other output
%   argument descriptions, see the predict reference page.
CompactMdl = loadLearnerForCoder(savedmdl);
[label,score,node,cnum] = predict(CompactMdl,x,varargin{:});
end

Note: If you click the button located in the upper-right section of this page and open this example in MATLAB®, then MATLAB® opens the example folder. This folder includes the entry-point function file.

Generate Code

Specify Variable-Size Arguments

Because C and C++ are statically typed languages, you must determine the properties of all variables in an entry-point function at compile time using the -args option of codegen.

Use coder.Constant to specify a compile-time constant input.

coder.Constant(v)

coder.Constant(v) creates a coder.Constant type variable whose values are constant, the same as v, during code generation.

Use coder.typeof to specify a variable-size input.

coder.typeof(example_value, size_vector, variable_dims)

The values of example_value, size_vector, and variable_dims specify the properties of the input array that the generated code can accept.

• An input array has the same data type as the example values in example_value.
• size_vector is the array size of an input array if the corresponding variable_dims value is false.
• size_vector is the upper bound of the array size if the corresponding variable_dims value is true.
• variable_dims specifies whether each dimension of the array has a variable size or a fixed size. A value of true (logical 1) means that the corresponding dimension has a variable size; a value of false (logical 0) means that the corresponding dimension has a fixed size.

The entry-point function mypredictTree accepts predictor data, the MAT-file name containing the trained model object, and optional name-value pair arguments. Suppose that you want to generate code that accepts a variable-size array for predictor data and the 'Subtrees' name-value pair argument with a variable-size vector for its value. Then you have four input arguments: predictor data, the MAT-file name, and the name and value of the 'Subtrees' name-value pair argument.

Define a 4-by-1 cell array and assign each input argument type of the entry-point function to each cell.

ARGS = cell(4,1);

For the first input, use coder.typeof to specify that the predictor data variable is double-precision with the same number of columns as the predictor data used in training the model, but that the number of observations (rows) is arbitrary.

p = numel(Mdl.PredictorNames);
ARGS{1} = coder.typeof(0,[Inf,p],[1,0]);


0 for the example_value value implies that the data type is double because double is the default numeric data type of MATLAB. [Inf,p] for the size_vector value and [1,0] for the variable_dims value imply that the size of the first dimension is variable and unbounded, and the size of the second dimension is fixed to be p.

The second input is the MAT-file name, which must be a compile-time constant. Use coder.Constant to specify the type of the second input.

ARGS{2} = coder.Constant(MdlName);

The last two inputs are the name and value of the 'Subtrees' name-value pair argument. Names of name-value pair arguments must be compile-time constants.

ARGS{3} = coder.Constant('Subtrees');

Use coder.typeof to specify that the value of 'Subtrees' is a double-precision row vector and that the upper bound of the row vector size is max(Mdl.PruneList).

m = max(Mdl.PruneList);
ARGS{4} = coder.typeof(0,[1,m],[0,1]);

Again, 0 for the example_value value implies that the data type is double because double is the default numeric data type of MATLAB. [1,m] for the size_vector value and [0,1] for the variable_dims value imply that the size of the first dimension is fixed to be 1, and the size of the second dimension is variable and its upper bound is m.

Generate Code Using codegen

Generate a MEX function from the entry-point function mypredictTree using the cell array ARGS, which includes input argument types for mypredictTree. Specify the input argument types using the -args option. Specify the number of output arguments in the generated entry-point function using the -nargout option. The generated code includes the specified number of output arguments in the order in which they occur in the entry-point function definition.

codegen mypredictTree -args ARGS -nargout 2

codegen generates the MEX function mypredictTree_mex with a platform-dependent extension in your current folder.

The predict function accepts single-precision values, double-precision values, and 'all' for the 'Subtrees' name-value pair argument. However, you can specify only double-precision values when you use the MEX function for prediction because the data type specified by ARGS{4} is double.

Verify Generated Code

Predict labels for a random selection of 15 values from the training data using the generated MEX function and the subtree at pruning level 1. Compare the labels from the MEX function with those predicted by predict.

rng('default'); % For reproducibility
Xnew = datasample(meas,15);
[labelMEX,scoreMEX] = mypredictTree_mex(Xnew,MdlName,'Subtrees',1);
[labelPREDICT,scorePREDICT] = predict(Mdl,Xnew,'Subtrees',1);
labelPREDICT

labelPREDICT = 15x10 char array
    'virginica '
    'virginica '
    'setosa    '
    'virginica '
    'versicolor'
    'setosa    '
    'setosa    '
    'versicolor'
    'virginica '
    'virginica '
    'setosa    '
    'virginica '
    'virginica '
    'versicolor'
    'virginica '

labelMEX

labelMEX = 15x1 cell
    {'virginica' }
    {'virginica' }
    {'setosa'    }
    {'virginica' }
    {'versicolor'}
    {'setosa'    }
    {'setosa'    }
    {'versicolor'}
    {'virginica' }
    {'virginica' }
    {'setosa'    }
    {'virginica' }
    {'virginica' }
    {'versicolor'}
    {'virginica' }

The predicted labels are the same as the MEX function labels except for the data type. When the response data type is char and codegen cannot determine that the value of Subtrees is a scalar, then the output from the generated code is a cell array of character vectors.

For the comparison, you can convert labelPREDICT to a cell array and use isequal.

cell_labelPREDICT = cellstr(labelPREDICT);
verifyLabel = isequal(labelMEX,cell_labelPREDICT)

verifyLabel = logical
   1

isequal returns logical 1 (true), which means all the inputs are equal.

Compare the second outputs as well. scoreMEX might include round-off differences compared with scorePREDICT. In this case, compare scoreMEX and scorePREDICT, allowing a small tolerance.

find(abs(scorePREDICT-scoreMEX) > 1e-8)

ans = 0x1 empty double column vector


find returns an empty vector if the element-wise absolute difference between scorePREDICT and scoreMEX is not larger than the specified tolerance 1e-8. The comparison confirms that scorePREDICT and scoreMEX are equal within the tolerance 1e-8.
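An equivalent way to express the same tolerance check, assuming the scorePREDICT and scoreMEX matrices from this example, is to assert on the maximum absolute difference (this snippet is a sketch, not part of the original example):

```matlab
% Errors if any element-wise difference between the two score matrices
% exceeds the tolerance 1e-8; otherwise runs silently.
maxDiff = max(abs(scorePREDICT(:) - scoreMEX(:)));
assert(maxDiff <= 1e-8,'Scores differ by more than 1e-8')
```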

See Also
codegen | coder.Constant | coder.typeof | learnerCoderConfigurer | loadLearnerForCoder | saveLearnerForCoder

Related Examples
• “Introduction to Code Generation” on page 31-2
• “Code Generation for Prediction of Machine Learning Model at Command Line” on page 31-9
• “Code Generation for Prediction of Machine Learning Model Using MATLAB Coder App” on page 31-16
• “Code Generation for Prediction and Update Using Coder Configurer” on page 31-68
• “Code Generation for Nearest Neighbor Searcher” on page 31-13
• “Code Generation and Classification Learner App” on page 31-25

Train SVM Classifier with Categorical Predictors and Generate C/C++ Code

This example shows how to generate code for classifying data using a support vector machine (SVM) model. Train the model using numeric and categorical predictors. Because code generation does not support categorical predictors, use dummyvar to convert categorical predictors to numeric dummy variables before fitting an SVM classifier. When passing new data to your trained model, you must preprocess the data in a similar manner.

Preprocess Data and Train SVM Classifier

Load the patients data set. Create a table using the Diastolic and Systolic numeric variables. Each row of the table corresponds to a different patient.

load patients
tbl = table(Diastolic,Systolic);
head(tbl)

ans=8×2 table
    Diastolic    Systolic
    _________    ________
       93          124
       77          109
       83          125
       75          117
       80          122
       70          121
       88          130
       82          115

Convert the Gender variable to a categorical variable. The order of the categories in categoricalGender is important because it determines the order of the columns in the predictor data. Use dummyvar to convert the categorical variable to a matrix of zeros and ones, where a 1 value in the (i,j)th entry indicates that the ith patient belongs to the jth category.

categoricalGender = categorical(Gender);
orderGender = categories(categoricalGender)

orderGender = 2x1 cell
    {'Female'}
    {'Male'  }

dummyGender = dummyvar(categoricalGender);

Note: The resulting dummyGender matrix is rank deficient. Depending on the type of model you train, this rank deficiency can be problematic. For example, when training linear models, remove the first column of the dummy variables.

Create a table that contains the dummy variable dummyGender with the corresponding variable headings. Combine this new table with tbl.
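As an illustration of that note (this snippet is not part of the original example, and the variable name dummyGenderLM is hypothetical), for a linear model you could drop the first dummy column so the remaining column is not collinear with the intercept:

```matlab
% Keep all but the first dummy column; the dropped category ('Female',
% the first entry of orderGender) becomes the reference level.
dummyGenderLM = dummyGender(:,2:end);
```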


tblGender = array2table(dummyGender,'VariableNames',orderGender);
tbl = [tbl tblGender];
head(tbl)

ans=8×4 table
    Diastolic    Systolic    Female    Male
    _________    ________    ______    ____
       93          124         0        1
       77          109         0        1
       83          125         1        0
       75          117         1        0
       80          122         1        0
       70          121         1        0
       88          130         1        0
       82          115         0        1

Convert the SelfAssessedHealthStatus variable to a categorical variable. Note the order of the categories in categoricalHealth, and convert the variable to a numeric matrix using dummyvar.

categoricalHealth = categorical(SelfAssessedHealthStatus);
orderHealth = categories(categoricalHealth)

orderHealth = 4x1 cell
    {'Excellent'}
    {'Fair'     }
    {'Good'     }
    {'Poor'     }

dummyHealth = dummyvar(categoricalHealth);

Create a table that contains dummyHealth with the corresponding variable headings. Combine this new table with tbl.

tblHealth = array2table(dummyHealth,'VariableNames',orderHealth);
tbl = [tbl tblHealth];
head(tbl)

ans=8×8 table
    Diastolic    Systolic    Female    Male    Excellent    Fair    Good    Poor
    _________    ________    ______    ____    _________    ____    ____    ____
       93          124         0        1         1          0       0       0
       77          109         0        1         0          1       0       0
       83          125         1        0         0          0       1       0
       75          117         1        0         0          1       0       0
       80          122         1        0         0          0       1       0
       70          121         1        0         0          0       1       0
       88          130         1        0         0          0       1       0
       82          115         0        1         0          0       1       0

The third row of tbl, for example, corresponds to a patient with these characteristics: diastolic blood pressure of 83, systolic blood pressure of 125, female, and good self-assessed health status.

Because all the values in tbl are numeric, you can convert the table to a matrix X.


X = table2array(tbl);

Train an SVM classifier using X. Specify the Smoker variable as the response.

Y = Smoker;
Mdl = fitcsvm(X,Y);

Generate C/C++ Code

Generate code that loads the SVM classifier, takes new predictor data as an input argument, and then classifies the new data.

Save the SVM classifier to a file using saveLearnerForCoder.

saveLearnerForCoder(Mdl,'SVMClassifier')

saveLearnerForCoder saves the classifier to the MATLAB® binary file SVMClassifier.mat as a structure array in the current folder.

Define the entry-point function mySVMPredict, which takes new predictor data as an input argument. Within the function, load the SVM classifier by using loadLearnerForCoder, and then pass the loaded classifier to predict.

type mySVMPredict.m % Display contents of mySVMPredict.m file

function label = mySVMPredict(X) %#codegen
Mdl = loadLearnerForCoder('SVMClassifier');
label = predict(Mdl,X);
end

Note: If you click the button located in the upper-right section of this page and open this example in MATLAB, then MATLAB opens the example folder. This folder includes the entry-point function file mySVMPredict.m.

Generate code for mySVMPredict by using codegen. Specify the data type and dimensions of the new predictor data by using coder.typeof so that the generated code accepts a variable-size array.

codegen mySVMPredict -args {coder.typeof(X,[Inf 8],[1 0])}

Verify that mySVMPredict and the MEX file return the same results for the training data.

label = predict(Mdl,X);
mylabel = mySVMPredict(X);
mylabel_mex = mySVMPredict_mex(X);
verifyMEX = isequal(label,mylabel,mylabel_mex)

verifyMEX = logical
   1

Predict Labels for New Data

To predict labels for new data, you must first preprocess the new data. If you run the generated code in the MATLAB environment, you can follow the preprocessing steps described in this section. If you deploy the generated code outside the MATLAB environment, the preprocessing steps can differ. In either case, you must ensure that the new data has the same columns as the training data X.


In this example, take the third, fourth, and fifth patients in the patients data set. Preprocess the data for these patients so that the resulting numeric matrix matches the form of the training data.

Convert the categorical variables to dummy variables. Because the new observations might not include values from all categories, you need to specify the same categories as the ones used during training and maintain the same category order. In MATLAB, pass the ordered cell array of category names associated with the corresponding training data variable (in this example, orderGender for gender values and orderHealth for self-assessed health status values).

newcategoricalGender = categorical(Gender(3:5),orderGender);
newdummyGender = dummyvar(newcategoricalGender);
newcategoricalHealth = categorical(SelfAssessedHealthStatus(3:5),orderHealth);
newdummyHealth = dummyvar(newcategoricalHealth);

Combine all the new data into a numeric matrix.

newX = [Diastolic(3:5) Systolic(3:5) newdummyGender newdummyHealth]

newX = 3×8
    83   125     1     0     0     0     1     0
    75   117     1     0     0     1     0     0
    80   122     1     0     0     0     1     0

Note that newX corresponds exactly to the third, fourth, and fifth rows of the matrix X.

Verify that mySVMPredict and the MEX file return the same results for the new data.

newlabel = predict(Mdl,newX);
newmylabel = mySVMPredict(newX);
newmylabel_mex = mySVMPredict_mex(newX);
newverifyMEX = isequal(newlabel,newmylabel,newmylabel_mex)

newverifyMEX = logical
   1

See Also
ClassificationSVM | categorical | codegen | coder.Constant | coder.typeof | dummyvar | loadLearnerForCoder | saveLearnerForCoder

Related Examples
• “Introduction to Code Generation” on page 31-2
• “Code Generation for Prediction of Machine Learning Model at Command Line” on page 31-9
• “Code Generation for Prediction of Machine Learning Model Using MATLAB Coder App” on page 31-16
• “Code Generation and Classification Learner App” on page 31-25

System Objects for Classification and Code Generation

This example shows how to generate C code from a MATLAB® System object™ that classifies images of digits by using a trained classification model. This example also shows how to use the System object for classification in Simulink®. The benefit of using System objects over MATLAB functions is that System objects are more appropriate for processing large amounts of streaming data. For more details, see “What Are System Objects?” (MATLAB).

This example is based on “Code Generation for Image Classification”, which is an alternative workflow to “Digit Classification Using HOG Features” (Computer Vision Toolbox).

Load Data

Load the digitimages data set.

load digitimages.mat

images is a 28-by-28-by-3000 array of uint16 integers. Each page is a raster image of a digit. Each element is a pixel intensity. Corresponding labels are in the 3000-by-1 numeric vector Y. For more details, enter Description at the command line.

Store the number of observations and the number of predictor variables. Create a data partition that specifies to hold out 20% of the data. Extract training and test set indices from the data partition.

rng(1); % For reproducibility
n = size(images,3);
p = numel(images(:,:,1));
cvp = cvpartition(n,'Holdout',0.20);
idxTrn = training(cvp);
idxTest = test(cvp);

Rescale Data

Rescale the pixel intensities so that they range in the interval [0,1] within each image. Specifically, suppose p_ij is pixel intensity j within image i. For image i, rescale all of its pixel intensities by using this formula:

p_ij = (p_ij - min_j(p_ij)) / (max_j(p_ij) - min_j(p_ij))

X = double(images);
for i = 1:n
    minX = min(min(X(:,:,i)));
    maxX = max(max(X(:,:,i)));
    X(:,:,i) = (X(:,:,i) - minX)/(maxX - minX);
end
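The loop above can also be written without a loop. This vectorized sketch (not part of the original example) relies on implicit expansion, available in MATLAB R2016b and later, and produces the same rescaled array:

```matlab
% Per-image minima and maxima as 1-by-1-by-n arrays, then rescale all
% images at once via implicit expansion.
Xv = double(images);
mn = min(min(Xv,[],1),[],2);
mx = max(max(Xv,[],1),[],2);
Xv = (Xv - mn)./(mx - mn);
```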

Reshape Data

For code generation, the predictor data for training must be in a table of numeric variables or a numeric matrix. Reshape the data to a matrix such that predictor variables correspond to columns and images correspond to rows. Because reshape takes elements column-wise, transpose its result.


X = reshape(X,[p,n])';

Train and Optimize Classification Models

Cross-validate an ECOC model of SVM binary learners and a random forest based on the training observations. Use 5-fold cross-validation.

For the ECOC model, specify predictor standardization and optimize classification error over the ECOC coding design and the SVM box constraint. Explore all combinations of these values:

• For the ECOC coding design, use one-versus-one and one-versus-all.
• For the SVM box constraint, use three logarithmically spaced values from 0.1 to 100 each.

For all models, store the 5-fold cross-validated misclassification rates.

coding = {'onevsone' 'onevsall'};
boxconstraint = logspace(-1,2,3);
cvLossECOC = nan(numel(coding),numel(boxconstraint)); % For preallocation
for i = 1:numel(coding)
    for j = 1:numel(boxconstraint)
        t = templateSVM('BoxConstraint',boxconstraint(j),'Standardize',true);
        CVMdl = fitcecoc(X(idxTrn,:),Y(idxTrn),'Learners',t,'KFold',5,...
            'Coding',coding{i});
        cvLossECOC(i,j) = kfoldLoss(CVMdl);
        fprintf('cvLossECOC = %f for model using %s coding and box constraint=%f\n',...
            cvLossECOC(i,j),coding{i},boxconstraint(j))
    end
end

cvLossECOC = 0.058333 for model using onevsone coding and box constraint=0.100000
cvLossECOC = 0.057083 for model using onevsone coding and box constraint=3.162278
cvLossECOC = 0.050000 for model using onevsone coding and box constraint=100.000000
cvLossECOC = 0.120417 for model using onevsall coding and box constraint=0.100000
cvLossECOC = 0.121667 for model using onevsall coding and box constraint=3.162278
cvLossECOC = 0.127917 for model using onevsall coding and box constraint=100.000000

For the random forest, vary the maximum number of splits by using the values in the sequence 2

3

m

m

{3 , 3 , . . . , 3 }. m is such that 3 is no greater than n - 1. To reproduce random predictor selections, specify 'Reproducible',true. n = size(X,1); m = floor(log(n - 1)/log(3)); maxNumSplits = 3.^(2:m); cvLossRF = nan(numel(maxNumSplits)); for i = 1:numel(maxNumSplits) t = templateTree('MaxNumSplits',maxNumSplits(i),'Reproducible',true); CVMdl = fitcensemble(X(idxTrn,:),Y(idxTrn),'Method','bag','Learners',t,... 'KFold',5); cvLossRF(i) = kfoldLoss(CVMdl); fprintf('cvLossRF = %f for model using %d as the maximum number of splits\n',... cvLossRF(i),maxNumSplits(i)) end cvLossRF cvLossRF cvLossRF cvLossRF

31-48

= = = =

0.323750 0.198333 0.075417 0.017083

for for for for

model model model model

using using using using

9 as the maximum number of splits 27 as the maximum number of splits 81 as the maximum number of splits 243 as the maximum number of splits

System Objects for Classification and Code Generation

cvLossRF = 0.012083 for model using 729 as the maximum number of splits
cvLossRF = 0.012083 for model using 2187 as the maximum number of splits
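The split sequence is easy to cross-check outside MATLAB. The following pure-Python sketch mirrors `maxNumSplits = 3.^(2:m)` with `m = floor(log(n - 1)/log(3))`; the sample size `n` used here is hypothetical (the example's actual `n` comes from `size(X,1)`, and its output's largest value 2187 = 3^7 only implies 2187 <= n - 1 < 6561):

```python
import math

def max_num_splits(n):
    # m is the largest exponent with 3^m <= n - 1;
    # the sequence is {3^2, 3^3, ..., 3^m}, as in maxNumSplits = 3.^(2:m)
    m = math.floor(math.log(n - 1) / math.log(3))
    return [3 ** k for k in range(2, m + 1)]

# Hypothetical training-set size consistent with the example's output
print(max_num_splits(3000))  # [9, 27, 81, 243, 729, 2187]
```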

For each algorithm, determine the hyperparameter indices that yield the minimal misclassification rates.

minCVLossECOC = min(cvLossECOC(:))

minCVLossECOC = 0.0500

linIdx = find(cvLossECOC == minCVLossECOC,1);
[bestI,bestJ] = ind2sub(size(cvLossECOC),linIdx);
bestCoding = coding{bestI}

bestCoding = 'onevsone'

bestBoxConstraint = boxconstraint(bestJ)

bestBoxConstraint = 100

minCVLossRF = min(cvLossRF(:))

minCVLossRF = 0.0121

linIdx = find(cvLossRF == minCVLossRF,1);
[bestI,bestJ] = ind2sub(size(cvLossRF),linIdx);
bestMNS = maxNumSplits(bestI)

bestMNS = 729
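The find/ind2sub pattern above maps a linear index of the minimum back to row and column subscripts. An equivalent search in plain Python, using the cross-validated losses copied from the output above (for illustration only):

```python
# Losses from the example output: rows = coding designs, columns = box constraints
coding = ['onevsone', 'onevsall']
boxconstraint = [0.1, 3.162278, 100.0]
cvLossECOC = [[0.058333, 0.057083, 0.050000],
              [0.120417, 0.121667, 0.127917]]

# Equivalent of min(...), find(...), and ind2sub: locate the smallest loss
best_i, best_j = min(
    ((i, j) for i in range(len(coding)) for j in range(len(boxconstraint))),
    key=lambda ij: cvLossECOC[ij[0]][ij[1]])

print(coding[best_i], boxconstraint[best_j])  # onevsone 100.0
```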

The random forest achieves a smaller cross-validated misclassification rate.

Train an ECOC model and a random forest using the training data. Supply the optimal hyperparameter combinations.

t = templateSVM('BoxConstraint',bestBoxConstraint,'Standardize',true);
MdlECOC = fitcecoc(X(idxTrn,:),Y(idxTrn),'Learners',t,'Coding',bestCoding);
t = templateTree('MaxNumSplits',bestMNS);
MdlRF = fitcensemble(X(idxTrn,:),Y(idxTrn),'Method','bag','Learners',t);

Create a variable for the test sample images and use the trained models to predict test sample labels.

testImages = X(idxTest,:);
testLabelsECOC = predict(MdlECOC,testImages);
testLabelsRF = predict(MdlRF,testImages);

Save Classification Model to Disk

MdlECOC and MdlRF are predictive classification models, but you must prepare them for code generation. Save MdlECOC and MdlRF to your present working folder using saveLearnerForCoder.

saveLearnerForCoder(MdlECOC,'DigitImagesECOC');
saveLearnerForCoder(MdlRF,'DigitImagesRF');

Create System Object for Prediction

Create two System objects, one for the ECOC model and the other for the random forest, that:


• Load the previously saved trained model by using loadLearnerForCoder.
• Make sequential predictions by the step method.
• Enforce no size changes to the input data.
• Enforce double-precision, scalar output.

type ECOCClassifier.m % Display contents of ECOCClassifier.m file

classdef ECOCClassifier < matlab.System
    % ECOCCLASSIFIER Predict image labels from trained ECOC model
    %
    % ECOCCLASSIFIER loads the trained ECOC model from
    % |'DigitImagesECOC.mat'|, and predicts labels for new observations
    % based on the trained model. The ECOC model in
    % |'DigitImagesECOC.mat'| was cross-validated using the training data
    % in the sample data |digitimages.mat|.

    properties(Access = private)
        CompactMdl % The compacted, trained ECOC model
    end

    methods(Access = protected)
        function setupImpl(obj)
            % Load ECOC model from file
            obj.CompactMdl = loadLearnerForCoder('DigitImagesECOC');
        end

        function y = stepImpl(obj,u)
            y = predict(obj.CompactMdl,u);
        end

        function flag = isInputSizeMutableImpl(obj,index)
            % Return false if input size is not allowed to change while
            % system is running
            flag = false;
        end

        function dataout = getOutputDataTypeImpl(~)
            dataout = 'double';
        end

        function sizeout = getOutputSizeImpl(~)
            sizeout = [1 1];
        end
    end
end

type RFClassifier.m % Display contents of RFClassifier.m file

classdef RFClassifier < matlab.System
    % RFCLASSIFIER Predict image labels from trained random forest
    %
    % RFCLASSIFIER loads the trained random forest from
    % |'DigitImagesRF.mat'|, and predicts labels for new observations based
    % on the trained model. The random forest in |'DigitImagesRF.mat'|
    % was cross-validated using the training data in the sample data
    % |digitimages.mat|.

    properties(Access = private)
        CompactMdl % The compacted, trained random forest
    end

    methods(Access = protected)
        function setupImpl(obj)
            % Load random forest from file
            obj.CompactMdl = loadLearnerForCoder('DigitImagesRF');
        end

        function y = stepImpl(obj,u)
            y = predict(obj.CompactMdl,u);
        end

        function flag = isInputSizeMutableImpl(obj,index)
            % Return false if input size is not allowed to change while
            % system is running
            flag = false;
        end

        function dataout = getOutputDataTypeImpl(~)
            dataout = 'double';
        end

        function sizeout = getOutputSizeImpl(~)
            sizeout = [1 1];
        end
    end
end
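Both classes follow the same System object lifecycle: a one-time setup that loads the model, a step method that predicts, and a fixed input size. A rough Python analogue of that lifecycle, for readers unfamiliar with matlab.System (the model loading and prediction logic here are stand-ins, not toolbox code):

```python
class Classifier:
    """Rough analogue of the matlab.System lifecycle: setup runs once,
    step runs per call, and the input size may not change."""

    def __init__(self, model_file, input_size=28 * 28):
        self.model_file = model_file
        self.input_size = input_size  # fixed, like isInputSizeMutableImpl -> false
        self.model = None

    def _setup(self):
        # Stand-in for loadLearnerForCoder; a real version would deserialize a model
        self.model = {'source': self.model_file}

    def step(self, u):
        if self.model is None:         # setup runs once, on the first step
            self._setup()
        if len(u) != self.input_size:  # reject size changes between calls
            raise ValueError('input size must not change')
        # Stand-in for predict(obj.CompactMdl,u): return a double-precision scalar
        return float(sum(u) > 0)

clf = Classifier('DigitImagesECOC', input_size=4)
print(clf.step([0.1, 0.0, 0.2, 0.0]))  # 1.0
```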

Note: If you click the button located in the upper-right section of this page and open this example in MATLAB®, then MATLAB® opens the example folder. This folder includes the files used in this example.

For System object basic requirements, see “Define Basic System Objects” (MATLAB).

Define Prediction Functions for Code Generation

Define two MATLAB functions called predictDigitECOCSO.m and predictDigitRFSO.m. The functions:

• Include the code generation directive %#codegen.
• Accept image data commensurate with X.
• Predict labels using the ECOCClassifier and RFClassifier System objects, respectively.
• Return predicted labels.

type predictDigitECOCSO.m % Display contents of predictDigitECOCSO.m file

function label = predictDigitECOCSO(X) %#codegen
%PREDICTDIGITECOCSO Classify digit in image using ECOC Model System object
%   PREDICTDIGITECOCSO classifies the 28-by-28 images in the rows of X
%   using the compact ECOC model in the System object ECOCClassifier, and
%   then returns class labels in label.
classifier = ECOCClassifier;
label = step(classifier,X);
end

type predictDigitRFSO.m % Display contents of predictDigitRFSO.m file

function label = predictDigitRFSO(X) %#codegen
%PREDICTDIGITRFSO Classify digit in image using RF Model System object
%   PREDICTDIGITRFSO classifies the 28-by-28 images in the rows of X
%   using the compact random forest in the System object RFClassifier, and
%   then returns class labels in label.
classifier = RFClassifier;
label = step(classifier,X);
end

Compile MATLAB Function to MEX File

Compile the prediction function that achieves better test-sample accuracy to a MEX file by using codegen. Specify the test set images by using the -args argument.

if(minCVLossECOC

Prob>F — The p-value, which is the probability that the F-statistic can take a value larger than the computed test-statistic value. anova1 derives this probability from the cdf of the F-distribution.

The rows of the ANOVA table show the variability in the data that is divided by the source.

Groups — Variability due to the differences among the group means (variability between groups)
Error — Variability due to the differences between the data in each group and the group mean (variability within groups)
Total — Total variability

stats — Statistics for multiple comparison tests
structure

Statistics for multiple comparison tests on page 9-18, returned as a structure with the following fields.

gnames — Names of the groups
n — Number of observations in each group
source — Source of the stats output
means — Estimated values of the means
df — Error (within-groups) degrees of freedom (N – k, where N is the total number of observations and k is the number of groups)
s — Square root of the mean squared error

anova1

More About

Box Plot

anova1 returns a box plot of the observations for each group in y. Box plots provide a visual comparison of the group location parameters.

On each box, the central mark is the median (2nd quartile, q2) and the edges of the box are the 25th and 75th percentiles (1st and 3rd quartiles, q1 and q3, respectively). The whiskers extend to the most extreme data points that are not considered outliers. The outliers are plotted individually using the '+' symbol. The extremes of the whiskers correspond to q3 + 1.5 × (q3 – q1) and q1 – 1.5 × (q3 – q1).

Box plots include notches for the comparison of the median values. Two medians are significantly different at the 5% significance level if their intervals, represented by notches, do not overlap. This test is different from the F-test that ANOVA performs; however, large differences in the center lines of the boxes correspond to a large F-statistic value and correspondingly a small p-value. The extremes of the notches correspond to q2 – 1.57(q3 – q1)/sqrt(n) and q2 + 1.57(q3 – q1)/sqrt(n), where n is the number of observations without any NaN values.

For more information about box plots, see 'Whisker' and 'Notch' of boxplot.
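The whisker and notch formulas are straight arithmetic on the quartiles. A quick illustration with hypothetical quartile values and sample size (chosen only to exercise the formulas, not taken from any dataset):

```python
import math

# Hypothetical quartiles and sample size for illustration
q1, q2, q3, n = 4.5, 5.5, 6.5, 9
iqr = q3 - q1

# Whisker extremes: q3 + 1.5*(q3 - q1) and q1 - 1.5*(q3 - q1)
upper_whisker = q3 + 1.5 * iqr
lower_whisker = q1 - 1.5 * iqr

# Notch extremes: q2 +/- 1.57*(q3 - q1)/sqrt(n)
half_notch = 1.57 * iqr / math.sqrt(n)
print(upper_whisker, lower_whisker)      # 9.5 1.5
print(q2 - half_notch, q2 + half_notch)  # approximately 4.4533 and 6.5467
```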

References [1] Hogg, R. V., and J. Ledolter. Engineering Statistics. New York: MacMillan, 1987.

See Also anova2 | anovan | boxplot | multcompare Topics “Perform One-Way ANOVA” on page 9-5 “One-Way ANOVA” on page 9-3 “Multiple Comparisons” on page 9-18 Introduced before R2006a


32

Functions

anova2

Two-way analysis of variance

Syntax

p = anova2(y,reps)
p = anova2(y,reps,displayopt)
[p,tbl] = anova2( ___ )
[p,tbl,stats] = anova2( ___ )

Description

anova2 performs two-way analysis of variance (ANOVA) with balanced designs. To perform two-way ANOVA with unbalanced designs, see anovan.

p = anova2(y,reps) returns the p-values for a balanced two-way ANOVA for comparing the means of two or more columns and two or more rows of the observations in y.

reps is the number of replicates for each combination of factor groups, which must be constant, indicating a balanced design. For unbalanced designs, use anovan. The anova2 function tests the main effects for column and row factors and their interaction effect. To test the interaction effect, reps must be greater than 1. anova2 also displays the standard ANOVA table.

p = anova2(y,reps,displayopt) enables the ANOVA table display when displayopt is 'on' (default) and suppresses the display when displayopt is 'off'.

[p,tbl] = anova2( ___ ) returns the ANOVA table (including column and row labels) in cell array tbl. To copy a text version of the ANOVA table to the clipboard, select Edit > Copy Text.

[p,tbl,stats] = anova2( ___ ) returns a stats structure, which you can use to perform a multiple comparison test on page 9-18. A multiple comparison test enables you to determine which pairs of group means are significantly different. To perform this test, use multcompare, providing the stats structure as input.

Examples

Two-Way ANOVA

Load the sample data.

load popcorn
popcorn

popcorn = 6×3

    5.5000    4.5000    3.5000
    5.5000    4.5000    4.0000
    6.0000    4.0000    3.0000
    6.5000    5.0000    4.0000
    7.0000    5.5000    5.0000
    7.0000    5.0000    4.5000

The data is from a study of popcorn brands and popper types (Hogg 1987). The columns of the matrix popcorn are brands, Gourmet, National, and Generic, respectively. The rows are popper types, oil and air. In the study, researchers popped a batch of each brand three times with each popper, that is, the number of replications is 3. The first three rows correspond to the oil popper, and the last three rows correspond to the air popper. The response values are the yield in cups of popped popcorn.

Perform a two-way ANOVA. Save the ANOVA table in the cell array tbl for easy access to results.

[p,tbl] = anova2(popcorn,3);

The column Prob>F shows the p-values for the three brands of popcorn (0.0000), the two popper types (0.0001), and the interaction between brand and popper type (0.7462). These values indicate that popcorn brand and popper type affect the yield of popcorn, but there is no evidence of an interaction effect of the two.

Display the cell array containing the ANOVA table.

tbl

tbl=6×6 cell array
  Columns 1 through 5

    {'Source'     }    {'SS'     }    {'df'}    {'MS'      }    {'F'       }
    {'Columns'    }    {[15.7500]}    {[ 2]}    {[ 7.8750]}    {[ 56.7000]}
    {'Rows'       }    {[ 4.5000]}    {[ 1]}    {[ 4.5000]}    {[ 32.4000]}
    {'Interaction'}    {[ 0.0833]}    {[ 2]}    {[ 0.0417]}    {[  0.3000]}
    {'Error'      }    {[ 1.6667]}    {[12]}    {[ 0.1389]}    {0x0 double}
    {'Total'      }    {[     22]}    {[17]}    {0x0 double}    {0x0 double}

  Column 6

    {'Prob>F'    }
    {[7.6790e-07]}
    {[1.0037e-04]}
    {[    0.7462]}
    {0x0 double  }
    {0x0 double  }

Store the F-statistic for the factors and factor interaction in separate variables.


Fbrands = tbl{2,5}

Fbrands = 56.7000

Fpoppertype = tbl{3,5}

Fpoppertype = 32.4000

Finteraction = tbl{4,5}

Finteraction = 0.3000
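The table entries follow directly from the balanced two-way ANOVA formulas: sums of squares from the column, row, and cell means, then MS = SS/df and F as a ratio of mean squares. A pure-Python cross-check on the popcorn matrix (computed outside MATLAB, purely to verify the arithmetic above):

```python
# popcorn matrix: rows 1-3 oil popper, rows 4-6 air popper; reps = 3
y = [[5.5, 4.5, 3.5],
     [5.5, 4.5, 4.0],
     [6.0, 4.0, 3.0],
     [6.5, 5.0, 4.0],
     [7.0, 5.5, 5.0],
     [7.0, 5.0, 4.5]]
reps, r, c = 3, 2, 3          # replications, row-factor levels, column-factor levels
N = reps * r * c
grand = sum(map(sum, y)) / N

col_means = [sum(row[j] for row in y) / (reps * r) for j in range(c)]
row_means = [sum(sum(y[i * reps + k]) for k in range(reps)) / (reps * c)
             for i in range(r)]

ss_col = reps * r * sum((m - grand) ** 2 for m in col_means)
ss_row = reps * c * sum((m - grand) ** 2 for m in row_means)
ss_total = sum((v - grand) ** 2 for row in y for v in row)

cell_means = [[sum(y[i * reps + k][j] for k in range(reps)) / reps
               for j in range(c)] for i in range(r)]
ss_inter = reps * sum((cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
                      for i in range(r) for j in range(c))
ss_err = ss_total - ss_col - ss_row - ss_inter

print(ss_col, ss_row, round(ss_inter, 4), round(ss_err, 4))
# 15.75 4.5 0.0833 1.6667
print(round((ss_col / 2) / (ss_err / 12), 1))  # F for brands: 56.7
```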

Multiple Comparisons for Two-Way ANOVA

Load the sample data.

load popcorn
popcorn

popcorn = 6×3

    5.5000    4.5000    3.5000
    5.5000    4.5000    4.0000
    6.0000    4.0000    3.0000
    6.5000    5.0000    4.0000
    7.0000    5.5000    5.0000
    7.0000    5.0000    4.5000

The data is from a study of popcorn brands and popper types (Hogg 1987). The columns of the matrix popcorn are brands (Gourmet, National, and Generic). The rows are popper types, oil and air. In the study, researchers popped a batch of each brand three times with each popper. The values are the yield in cups of popped popcorn.

Perform a two-way ANOVA. Also compute the statistics that you need to perform a multiple comparison test on the main effects.

[~,~,stats] = anova2(popcorn,3,'off')

stats = struct with fields:
      source: 'anova2'
     sigmasq: 0.1389
    colmeans: [6.2500 4.7500 4]
        coln: 6
    rowmeans: [4.5000 5.5000]
        rown: 9
       inter: 1
        pval: 0.7462
          df: 12

The stats structure includes:

• The mean squared error (sigmasq)
• The estimates of the mean yield for each popcorn brand (colmeans)
• The number of observations for each popcorn brand (coln)
• The estimate of the mean yield for each popper type (rowmeans)
• The number of observations for each popper type (rown)
• The number of interactions (inter)
• The p-value that shows the significance level of the interaction term (pval)
• The error degrees of freedom (df)

Perform a multiple comparison test to see if the popcorn yield differs between pairs of popcorn brands (columns).

c = multcompare(stats)

Note: Your model includes an interaction term. A test of main effects can be difficult to interpret when the model includes interactions.

c = 3×6

    1.0000    2.0000    0.9260    1.5000    2.0740    0.0000
    1.0000    3.0000    1.6760    2.2500    2.8240    0.0000
    2.0000    3.0000    0.1760    0.7500    1.3240    0.0116

The first two columns of c show the groups that are compared. The fourth column shows the difference between the estimated group means. The third and fifth columns show the lower and upper limits for 95% confidence intervals for the true mean difference. The sixth column contains the p-value for a hypothesis test that the corresponding mean difference is equal to zero. All p-values (0, 0, and 0.0116) are very small, which indicates that the popcorn yield differs across all three brands.

The figure shows the multiple comparison of the means. By default, the group 1 mean is highlighted and the comparison interval is in blue. Because the comparison intervals for the other two groups do not intersect with the intervals for the group 1 mean, they are highlighted in red. This lack of intersection indicates that both means are different from the group 1 mean. Select other group means to confirm that all group means are significantly different from each other.

Perform a multiple comparison test to see if the popcorn yield differs between the two popper types (rows).

c = multcompare(stats,'Estimate','row')

Note: Your model includes an interaction term. A test of main effects can be difficult to interpret when the model includes interactions.

c = 1×6

    1.0000    2.0000   -1.3828   -1.0000   -0.6172    0.0001

The small p-value of 0.0001 indicates that the popcorn yield differs between the two popper types (air and oil). The figure shows the same results. The disjoint comparison intervals indicate that the group means are significantly different from each other.
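The estimated differences in the fourth column of each c matrix are simply pairwise differences of the group means from the stats structure (colmeans for the brand comparison, rowmeans for the popper comparison). A quick arithmetic check, in Python for illustration:

```python
from itertools import combinations

colmeans = [6.25, 4.75, 4.0]  # brand means from the stats output
rowmeans = [4.5, 5.5]         # popper-type means from the stats output

# Pairwise brand differences, in the same order as the rows of c
brand_diffs = [colmeans[i] - colmeans[j] for i, j in combinations(range(3), 2)]
popper_diff = rowmeans[0] - rowmeans[1]

print(brand_diffs)  # [1.5, 2.25, 0.75]
print(popper_diff)  # -1.0
```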

32-68


Input Arguments

y — Sample data
matrix

Sample data, specified as a matrix. The columns correspond to groups of one factor, and the rows correspond to the groups of the other factor and the replications. Replications are the measurements or observations for each combination of groups (levels) of the row and column factors. For example, in the following data the row factor A has three levels, column factor B has two levels, and there are two replications (reps = 2). The subscripts indicate row, column, and replication, respectively.

          B=1    B=2
    A=1   y111   y121
          y112   y122
    A=2   y211   y221
          y212   y222
    A=3   y311   y321
          y312   y322

Data Types: single | double

reps — Number of replications
1 (default) | integer

Number of replications for each combination of groups, specified as an integer. For example, the following data has two replications (reps = 2) for each group combination of row factor A and column factor B.

          B=1    B=2
    A=1   y111   y121
          y112   y122
    A=2   y211   y221
          y212   y222
    A=3   y311   y321
          y312   y322

• When reps is 1 (default), anova2 returns two p-values in vector p:
  • The p-value for the null hypothesis that all samples from factor B (that is, all column samples in y) are drawn from the same population.
  • The p-value for the null hypothesis that all samples from factor A (that is, all row samples in y) are drawn from the same population.
• When reps is greater than 1, anova2 also returns the p-value for the null hypothesis that factors A and B have no interaction (that is, the effects due to factors A and B are additive).

Example: p = anova2(y,3) specifies that each combination of groups (levels) has three replications.
Data Types: single | double
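The layout above stacks the reps replications for each row-factor level, so observation y_ijk (row level i, column level j, replication k) sits at matrix row (i-1)*reps + k. A small Python sketch of that indexing, using symbolic entries for illustration:

```python
# For a matrix laid out as above, replication k of row-level i and column-level j
# (1-based, as in the y_ijk subscripts) sits at entry [(i-1)*reps + (k-1)][j-1].
reps = 2
Y = [['y111', 'y121'],
     ['y112', 'y122'],
     ['y211', 'y221'],
     ['y212', 'y222'],
     ['y311', 'y321'],
     ['y312', 'y322']]

def obs(i, j, k):
    return Y[(i - 1) * reps + (k - 1)][j - 1]

print(obs(3, 2, 1))  # y321
```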


displayopt — Indicator to display the ANOVA table
'on' (default) | 'off'

Indicator to display the ANOVA table as a figure, specified as 'on' or 'off'.

Output Arguments

p — p-value
scalar value

p-value for the F-test, returned as a scalar value. A small p-value indicates that the results are statistically significant. Common significance levels are 0.05 or 0.01. For example:

• A sufficiently small p-value for the null hypothesis for group means of row factor A suggests that at least one row-sample mean is significantly different from the other row-sample means; that is, there is a main effect due to factor A.
• A sufficiently small p-value for the null hypothesis for group (level) means of column factor B suggests that at least one column-sample mean is significantly different from the other column-sample means; that is, there is a main effect due to factor B.
• A sufficiently small p-value for combinations of groups (levels) of factors A and B suggests that there is an interaction between factors A and B.

tbl — ANOVA table
cell array

ANOVA table, returned as a cell array. tbl has six columns.

source — Source of the variability
SS — Sum of squares due to each source
df — Degrees of freedom associated with each source
MS — Mean squares for each source, which is the ratio SS/df
F — F-statistic, which is the ratio of the mean squares
Prob>F — p-value, which is the probability that the F-statistic can take a value larger than the computed test-statistic value. anova2 derives this probability from the cdf of the F-distribution.

The rows of the ANOVA table show the variability in the data, divided by the source into three or four parts, depending on the value of reps.

Columns — Variability due to the differences among the column means
Rows — Variability due to the differences among the row means
Interaction — Variability due to the interaction between rows and columns (if reps is greater than its default value of 1)
Error — Remaining variability not explained by any systematic source

Data Types: cell

stats — Statistics for multiple comparison tests
structure

Statistics for multiple comparison tests on page 9-18, returned as a structure. Use multcompare to perform multiple comparison tests, supplying stats as an input argument. stats has nine fields.

source — Source of the stats output
sigmasq — Mean squared error
colmeans — Estimated values of the column means
coln — Number of observations for each group in columns
rowmeans — Estimated values of the row means
rown — Number of observations for each group in rows
inter — Number of interactions
pval — p-value for the interaction term
df — Error degrees of freedom, (reps – 1)*r*c, where reps is the number of replications and r and c are the number of groups in the row and column factors, respectively

Data Types: struct
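The stats fields for the popcorn example follow directly from these definitions: sigmasq is SS_error/df, and df = (reps - 1)*r*c. A quick arithmetic check (numbers taken from the example above):

```python
reps, r, c = 3, 2, 3
ss_error = 5 / 3         # 1.6667, from the ANOVA table in the example
df = (reps - 1) * r * c  # error degrees of freedom
sigmasq = ss_error / df  # mean squared error
coln = reps * r          # observations behind each column mean
rown = reps * c          # observations behind each row mean
print(df, round(sigmasq, 4), coln, rown)  # 12 0.1389 6 9
```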

References [1] Hogg, R. V., and J. Ledolter. Engineering Statistics. New York: MacMillan, 1987.

See Also anova1 | anovan | multcompare Topics “Perform Two-Way ANOVA” on page 9-13 “Two-Way ANOVA” on page 9-11 “Multiple Comparisons” on page 9-18 Introduced before R2006a



anovan

N-way analysis of variance

Syntax

p = anovan(y,group)
p = anovan(y,group,Name,Value)
[p,tbl] = anovan( ___ )
[p,tbl,stats] = anovan( ___ )
[p,tbl,stats,terms] = anovan( ___ )

Description

p = anovan(y,group) returns a vector of p-values, one per term, for multiway (n-way) analysis of variance (ANOVA) for testing the effects of multiple factors on the mean of the vector y. anovan also displays a figure showing the standard ANOVA table.

p = anovan(y,group,Name,Value) returns a vector of p-values for multiway (n-way) ANOVA using additional options specified by one or more Name,Value pair arguments. For example, you can specify which predictor variable is continuous, if any, or the type of sum of squares to use.

[p,tbl] = anovan( ___ ) returns the ANOVA table (including factor labels) in cell array tbl for any of the input arguments specified in the previous syntaxes. Copy a text version of the ANOVA table to the clipboard by using the Copy Text item on the Edit menu.

[p,tbl,stats] = anovan( ___ ) returns a stats structure that you can use to perform a multiple comparison test on page 9-18, which enables you to determine which pairs of group means are significantly different. You can perform such a test using the multcompare function by providing the stats structure as input.

[p,tbl,stats,terms] = anovan( ___ ) returns the main and interaction terms used in the ANOVA computations in terms.

Examples

Three-Way ANOVA

Load the sample data.

y = [52.7 57.5 45.9 44.5 53.0 57.0 45.9 44.0]';
g1 = [1 2 1 2 1 2 1 2];
g2 = {'hi';'hi';'lo';'lo';'hi';'hi';'lo';'lo'};
g3 = {'may';'may';'may';'may';'june';'june';'june';'june'};

y is the response vector and g1, g2, and g3 are the grouping variables (factors). Each factor has two levels, and every observation in y is identified by a combination of factor levels. For example, observation y(1) is associated with level 1 of factor g1, level 'hi' of factor g2, and level 'may' of factor g3. Similarly, observation y(6) is associated with level 2 of factor g1, level 'hi' of factor g2, and level 'june' of factor g3.

Test if the response is the same for all factor levels.

p = anovan(y,{g1,g2,g3})

p = 3×1

    0.4174
    0.0028
    0.9140

In the ANOVA table, X1, X2, and X3 correspond to the factors g1, g2, and g3, respectively. The p-value 0.4174 indicates that the mean responses for levels 1 and 2 of the factor g1 are not significantly different. Similarly, the p-value 0.914 indicates that the mean responses for levels 'may' and 'june' of the factor g3 are not significantly different. However, the p-value 0.0028 is small enough to conclude that the mean responses are significantly different for the two levels, 'hi' and 'lo', of the factor g2.

By default, anovan computes p-values just for the three main effects. Test the two-factor interactions. This time specify the variable names.

p = anovan(y,{g1 g2 g3},'model','interaction','varnames',{'g1','g2','g3'})

p = 6×1

    0.0347
    0.0048
    0.2578
    0.0158
    0.1444
    0.5000

The interaction terms are represented by g1*g2, g1*g3, and g2*g3 in the ANOVA table. The first three entries of p are the p-values for the main effects. The last three entries are the p-values for the two-way interactions. The p-value of 0.0158 indicates that the interaction between g1 and g2 is significant. The p-values of 0.1444 and 0.5 indicate that the corresponding interactions are not significant.

Two-Way ANOVA for Unbalanced Design

Load the sample data.

load carbig

The data has measurements on 406 cars. The variable org shows where the cars were made and when shows when in the year the cars were manufactured.

Study how the mileage depends on when and where the cars were made. Also include the two-way interactions in the model.

p = anovan(MPG,{org when},'model',2,'varnames',{'origin','mfg date'})

p = 3×1

    0.0000
    0.0000
    0.3059

The 'model',2 name-value pair argument represents the two-way interactions. The p-value for the interaction term, 0.3059, is not small, indicating little evidence that the effect of the time of manufacture (mfg date) depends on where the car was made (origin). The main effects of origin and manufacturing date, however, are significant; both p-values are 0.



Multiple Comparisons for Three-Way ANOVA

Load the sample data.

y = [52.7 57.5 45.9 44.5 53.0 57.0 45.9 44.0]';
g1 = [1 2 1 2 1 2 1 2];
g2 = {'hi';'hi';'lo';'lo';'hi';'hi';'lo';'lo'};
g3 = {'may';'may';'may';'may';'june';'june';'june';'june'};

y is the response vector and g1, g2, and g3 are the grouping variables (factors). Each factor has two levels, and every observation in y is identified by a combination of factor levels. For example, observation y(1) is associated with level 1 of factor g1, level 'hi' of factor g2, and level 'may' of factor g3. Similarly, observation y(6) is associated with level 2 of factor g1, level 'hi' of factor g2, and level 'june' of factor g3.

Test if the response is the same for all factor levels. Also compute the statistics required for multiple comparison tests.

[~,~,stats] = anovan(y,{g1 g2 g3},'model','interaction',...
    'varnames',{'g1','g2','g3'});

The p-value of 0.2578 indicates that the mean responses for levels 'may' and 'june' of factor g3 are not significantly different. The p-value of 0.0347 indicates that the mean responses for levels 1 and 2 of factor g1 are significantly different. Similarly, the p-value of 0.0048 indicates that the mean responses for levels 'hi' and 'lo' of factor g2 are significantly different.

Perform multiple comparison tests to find out which groups of the factors g1 and g2 are significantly different.

results = multcompare(stats,'Dimension',[1 2])



results = 6×6

    1.0000    2.0000   -6.8604   -4.4000   -1.9396    0.0280
    1.0000    3.0000    4.4896    6.9500    9.4104    0.0177
    1.0000    4.0000    6.1396    8.6000   11.0604    0.0143
    2.0000    3.0000    8.8896   11.3500   13.8104    0.0108
    2.0000    4.0000   10.5396   13.0000   15.4604    0.0095
    3.0000    4.0000   -0.8104    1.6500    4.1104    0.0745

multcompare compares the combinations of groups (levels) of the two grouping variables, g1 and g2. In the results matrix, the number 1 corresponds to the combination of level 1 of g1 and level hi of g2, and the number 2 corresponds to the combination of level 2 of g1 and level hi of g2. Similarly, the number 3 corresponds to the combination of level 1 of g1 and level lo of g2, and the number 4 corresponds to the combination of level 2 of g1 and level lo of g2. The last column of the matrix contains the p-values.

For example, the first row of the matrix tests whether the combination of level 1 of g1 and level hi of g2 has the same mean response as the combination of level 2 of g1 and level hi of g2. The p-value corresponding to this test is 0.0280, which indicates that the mean responses are significantly different. You can also see this result in the figure. The blue bar shows the comparison interval for the mean response for the combination of level 1 of g1 and level hi of g2. The red bars are the comparison intervals for the mean responses for the other group combinations. None of the red bars overlap with the blue bar, which means the mean response for the combination of level 1 of g1 and level hi of g2 is significantly different from the mean responses for the other group combinations.
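The group numbers 1 through 4 enumerate the g1-by-g2 level combinations with the first grouping variable varying fastest. That numbering can be reproduced mechanically (the ordering here is inferred from the worked example above, so treat it as illustrative):

```python
from itertools import product

g1_levels = [1, 2]
g2_levels = ['hi', 'lo']

# g1 varies fastest: 1=(1,hi), 2=(2,hi), 3=(1,lo), 4=(2,lo)
groups = {n + 1: (a, b)
          for n, (b, a) in enumerate(product(g2_levels, g1_levels))}
print(groups)  # {1: (1, 'hi'), 2: (2, 'hi'), 3: (1, 'lo'), 4: (2, 'lo')}
```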



You can test the other groups by clicking on the corresponding comparison interval for the group. The bar you click on turns blue. The bars for the groups that are significantly different are red. The bars for the groups that are not significantly different are gray. For example, if you click on the comparison interval for the combination of level 1 of g1 and level lo of g2, the comparison interval for the combination of level 2 of g1 and level lo of g2 overlaps with it, and is therefore gray. Conversely, the other comparison intervals are red, indicating a significant difference.

Input Arguments

y — Sample data
numeric vector

Sample data, specified as a numeric vector.

Data Types: single | double

group — Grouping variables
cell array

Grouping variables, that is, the factors and factor levels of the observations in y, specified as a cell array. Each of the cells in group contains a list of factor levels identifying the observations in y with respect to one of the factors. The list within each cell can be a categorical array, numeric vector, character matrix, string array, or single-column cell array of character vectors, and must have the same number of elements as y.

y  = [ y1,   y2,   y3,   y4,   y5,  ...,  yN ]'
g1 = [  1,    3,    1,    2,    1,  ...,   2 ]
g2 = {'A',  'A',  'C',  'B',  'B', ...,  'D'}
g3 = {'hi','mid','low','mid','hi', ..., 'low'}

By default, anovan treats all grouping variables as fixed effects. For example, if in a study you want to investigate the effects of gender, school, and education method on the academic success of elementary school students, then you can specify the grouping variables as follows.

Example: {'Gender','School','Method'}
Data Types: cell

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'alpha',0.01,'model','interaction','sstype',2 specifies that anovan compute the 99% confidence bounds and the p-values for the main effects and two-way interactions using type II sum of squares.

alpha — Significance level
0.05 (default) | scalar value in the range 0 to 1


Significance level for confidence bounds, specified as the comma-separated pair consisting of 'alpha' and a scalar value in the range 0 to 1. For a value α, the confidence level is 100*(1–α)%.

Example: 'alpha',0.01 corresponds to 99% confidence intervals
Data Types: single | double

continuous — Indicator for continuous predictors
vector of indices

Indicator for continuous predictors, representing which grouping variables should be treated as continuous predictors rather than as categorical predictors, specified as the comma-separated pair consisting of 'continuous' and a vector of indices. For example, if there are three grouping variables and the second one is continuous, then you can specify it as follows.

Example: 'continuous',[2]
Data Types: single | double

display — Indicator to display ANOVA table
'on' (default) | 'off'

Indicator to display the ANOVA table, specified as the comma-separated pair consisting of 'display' and 'on' or 'off'. When 'display' is 'off', anovan only returns the output arguments and does not display the standard ANOVA table as a figure.

Example: 'display','off'

model — Type of the model
'linear' (default) | 'interaction' | 'full' | integer value | terms matrix

Type of the model, specified as the comma-separated pair consisting of 'model' and one of the following:

• 'linear' — The default 'linear' model computes only the p-values for the null hypotheses on the N main effects.
• 'interaction' — The 'interaction' model computes the p-values for the null hypotheses on the N main effects and the N(N–1)/2 two-factor interactions.
• 'full' — The 'full' model computes the p-values for the null hypotheses on the N main effects and interactions at all levels.
• An integer — For an integer value k (k ≤ N), anovan computes all interaction levels through the kth level. For example, the value 3 means main effects plus two- and three-factor interactions. The values k = 1 and k = 2 are equivalent to the 'linear' and 'interaction' specifications, respectively. The value k = N is equivalent to the 'full' specification.
• Terms matrix — A matrix of term definitions having the same form as the input to the x2fx function. All entries must be 0 or 1 (no higher powers). For more precise control over the main and interaction terms that anovan computes, you can specify a matrix containing one row for each main or interaction term to include in the ANOVA model. Each row defines one term using a vector of N zeros and ones. The table below illustrates the coding for a 3-factor ANOVA for factors A, B, and C.

Matrix Row    ANOVA Term
[1 0 0]       Main term A
[0 1 0]       Main term B
[0 0 1]       Main term C
[1 1 0]       Interaction term AB
[1 0 1]       Interaction term AC
[0 1 1]       Interaction term BC
[1 1 1]       Interaction term ABC

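As a hedged sketch of this coding (using made-up data; the variables y, g1, g2, and g3 are illustrative, not from the shipped examples), a terms matrix can be passed directly to anovan:

```matlab
% Hypothetical data: one response and three grouping variables.
rng default
y  = randn(24,1);
g1 = repmat({'a';'b'},12,1);       % factor A
g2 = repmat([1;1;2;2],6,1);        % factor B
g3 = repmat([1;1;1;2;2;2],4,1);    % factor C
% Request main effects B and C plus the BC interaction, that is,
% rows [0 1 0], [0 0 1], and [0 1 1] of the coding table above.
[p,tbl,stats,terms] = anovan(y,{g1,g2,g3}, ...
    'model',[0 1 0;0 0 1;0 1 1],'display','off');
% terms echoes the coding matrix that was fit
```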
For example, if there are three factors A, B, and C, and 'model',[0 1 0;0 0 1;0 1 1], then anovan tests for the main effects B and C, and the interaction effect BC, respectively.

A simple way to generate the terms matrix is to modify the terms output, which codes the terms in the current model using the format described above. If anovan returns [0 1 0;0 0 1;0 1 1] for terms, for example, and there is no significant interaction BC, then you can recompute the ANOVA on just the main effects B and C by specifying [0 1 0;0 0 1] for model.
Example: 'model',[0 1 0;0 0 1;0 1 1]
Example: 'model','interaction'
Data Types: char | string | single | double

nested — Nesting relationships
matrix of 0s and 1s

Nesting relationships among the grouping variables, specified as the comma-separated pair consisting of 'nested' and a matrix M of 0s and 1s, that is, M(i,j) = 1 if variable i is nested in variable j. You cannot specify nesting in a continuous variable. For example, if there are two grouping variables District and School, where School is nested in District, then you can express this relationship as follows.
Example: 'nested',[0 0;1 0]
Data Types: single | double

random — Indicator for random variables
vector of indices

Indicator for random variables, representing which grouping variables are random, specified as the comma-separated pair consisting of 'random' and a vector of indices. By default, anovan treats all grouping variables as fixed. anovan treats an interaction term as random if any of the variables in the interaction term is random.
Example: 'random',[3]
Data Types: single | double

sstype — Type of sum of squares
3 (default) | 1 | 2 | 'h'

32

Functions

Type of sum of squares, specified as the comma-separated pair consisting of 'sstype' and one of the following:
• 1 — Type I sum of squares. The reduction in residual sum of squares obtained by adding that term to a fit that already includes the terms listed before it.
• 2 — Type II sum of squares. The reduction in residual sum of squares obtained by adding that term to a model consisting of all other terms that do not contain the term in question.
• 3 — Type III sum of squares. The reduction in residual sum of squares obtained by adding that term to a model containing all other terms, but with their effects constrained to obey the usual “sigma restrictions” that make models estimable.
• 'h' — Hierarchical model. Similar to type 2, but with continuous as well as categorical factors used to determine the hierarchy of terms.

The sum of squares for any term is determined by comparing two models. For a model containing main effects but no interactions, the value of sstype influences the computations on unbalanced data only.

Suppose you are fitting a model with two factors and their interaction, and the terms appear in the order A, B, AB. Let R(·) represent the residual sum of squares for the model. So, R(A, B, AB) is the residual sum of squares fitting the whole model, R(A) is the residual sum of squares fitting the main effect of A only, and R(1) is the residual sum of squares fitting the mean only. The three sum of squares types are as follows:

Term    Type 1 Sum of Squares    Type 2 Sum of Squares    Type 3 Sum of Squares
A       R(1) – R(A)              R(B) – R(A, B)           R(B, AB) – R(A, B, AB)
B       R(A) – R(A, B)           R(A) – R(A, B)           R(A, AB) – R(A, B, AB)
AB      R(A, B) – R(A, B, AB)    R(A, B) – R(A, B, AB)    R(A, B) – R(A, B, AB)

The models for Type 3 sum of squares have sigma restrictions imposed. This means, for example, that in fitting R(B, AB), the array of AB effects is constrained to sum to 0 over A for each value of B, and over B for each value of A.
Example: 'sstype','h'
Data Types: single | double | char | string

varnames — Names of grouping variables
X1,X2,...,XN (default) | character matrix | string array | cell array of character vectors

Names of grouping variables, specified as the comma-separated pair consisting of 'varnames' and a character matrix, a string array, or a cell array of character vectors.
Example: 'varnames',{'Gender','City'}
Data Types: char | string | cell
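As a hedged illustration of the sstype choices (made-up unbalanced data, not one of the shipped examples), the Type I and Type III tables can be compared side by side:

```matlab
% Hypothetical unbalanced two-factor design: on unbalanced data the
% main-effect sums of squares generally differ between Type I and
% Type III, while the highest-order interaction AB is the same.
rng default
y = randn(12,1);
A = [1;1;1;1;1;2;2;2;2;2;2;2];   % unbalanced factor A
B = [1;2;1;2;1;1;2;2;1;2;2;2];   % factor B
[~,tbl1] = anovan(y,{A,B},'model','interaction', ...
    'sstype',1,'display','off');
[~,tbl3] = anovan(y,{A,B},'model','interaction', ...
    'sstype',3,'display','off');
% compare the SS column of tbl1 and tbl3 for the A and B rows
```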

Output Arguments

p — p-values
vector

p-values, returned as a vector.


Output vector p contains p-values for the null hypotheses on the N main effects and any interaction terms specified. Element p(1) contains the p-value for the null hypothesis that samples at all levels of factor A are drawn from the same population; element p(2) contains the p-value for the null hypothesis that samples at all levels of factor B are drawn from the same population; and so on. For example, if there are three factors A, B, and C, and 'model',[0 1 0;0 0 1;0 1 1], then the output vector p contains the p-values for the null hypotheses on the main effects B and C and the interaction effect BC, respectively.

A sufficiently small p-value corresponding to a factor suggests that at least one group mean is significantly different from the other group means; that is, there is a main effect due to that factor. It is common to declare a result significant if the p-value is less than 0.05 or 0.01.

tbl — ANOVA table
cell array

ANOVA table, returned as a cell array. The ANOVA table has seven columns:

Column name    Definition
source         Source of the variability.
SS             Sum of squares due to each source.
df             Degrees of freedom associated with each source.
MS             Mean squares for each source, which is the ratio SS/df.
Singular?      Indication of whether the term is singular.
F              F-statistic, which is the ratio of the mean squares.
Prob>F         The p-value, which is the probability that the F-statistic can take a value larger than the computed test-statistic value. anovan derives these probabilities from the cdf of the F-distribution.

The ANOVA table also contains the following columns if at least one of the grouping variables is specified as random using the name-value pair argument random:

Column name       Definition
Type              Type of each source: 'fixed' for a fixed effect or 'random' for a random effect.
Expected MS       Text representation of the expected value for the mean square. Q(source) represents a quadratic function of source and V(source) represents the variance of source.
MS denom          Denominator of the F-statistic.
d.f. denom        Degrees of freedom for the denominator of the F-statistic.
Denom. defn.      Text representation of the denominator of the F-statistic. MS(source) represents the mean square of source.
Var. est.         Variance component estimate.
Var. lower bnd    Lower bound of the 95% confidence interval for the variance component estimate.
Var. upper bnd    Upper bound of the 95% confidence interval for the variance component estimate.

stats — Statistics
structure

Statistics to use in a multiple comparison test on page 9-18 using the multcompare function, returned as a structure.

anovan evaluates the hypothesis that the different groups (levels) of a factor (or more generally, a term) have the same effect, against the alternative that they do not all have the same effect. Sometimes it is preferable to perform a test to determine which pairs of levels are significantly different, and which are not. Use the multcompare function to perform such tests by supplying the stats structure as input.

The stats structure contains the fields listed below, in addition to a number of other fields required for doing multiple comparisons using the multcompare function:

Field         Description
coeffs        Estimated coefficients
coeffnames    Name of term for each coefficient
vars          Matrix of grouping variable values for each term
resid         Residuals from the fitted model

The stats structure also contains the following fields if at least one of the grouping variables is specified as random using the name-value pair argument random:

Field      Description
ems        Expected mean squares
denom      Denominator definition
rtnames    Names of random terms
varest     Variance component estimates (one per random term)
varci      Confidence intervals for variance components
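A hedged sketch of this follow-up test (made-up one-factor data, not from the shipped examples):

```matlab
% Hypothetical data with three groups; pass the stats output of
% anovan to multcompare for pairwise group comparisons.
rng default
y = randn(18,1) + repmat([0;0;1],6,1);   % group 3 shifted upward
g = repmat([1;2;3],6,1);
[~,~,stats] = anovan(y,{g},'display','off');
c = multcompare(stats,'Display','off');
% each row of c describes one pair of groups: the group indices,
% a confidence interval for the difference, and related statistics
```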

terms — Main and interaction terms
matrix

Main and interaction terms, returned as a matrix. The terms are encoded in the output matrix terms using the same format described above for input model. When you specify model itself in this format, the matrix returned in terms is identical.


References

[1] Dunn, O. J., and V. A. Clark. Applied Statistics: Analysis of Variance and Regression. New York: Wiley, 1974.

[2] Goodnight, J. H., and F. M. Speed. Computing Expected Mean Squares. Cary, NC: SAS Institute, 1978.

[3] Seber, G. A. F., and A. J. Lee. Linear Regression Analysis. 2nd ed. Hoboken, NJ: Wiley-Interscience, 2003.

See Also
anova1 | anova2 | fitrm | multcompare | ranova

Topics
“Perform N-Way ANOVA” on page 9-27
“ANOVA with Random Effects” on page 9-33
“Multiple Comparisons” on page 9-18
“N-Way ANOVA” on page 9-25

Introduced before R2006a


anova
Class: RepeatedMeasuresModel

Analysis of variance for between-subject effects

Syntax
anovatbl = anova(rm)
anovatbl = anova(rm,'WithinModel',WM)

Description

anovatbl = anova(rm) returns the analysis of variance results for the repeated measures model rm.

anovatbl = anova(rm,'WithinModel',WM) returns the analysis of variance results it performs using the response or responses specified by the within-subject model WM.

Input Arguments

rm — Repeated measures model
RepeatedMeasuresModel object

Repeated measures model, specified as a RepeatedMeasuresModel object. For properties and methods of this object, see RepeatedMeasuresModel.

WM — Within-subject model
'separatemeans' (default) | 'orthogonalcontrasts' | character vector or string scalar defining a model specification | r-by-nc matrix specifying nc contrasts

Within-subject model, specified as one of the following:
• 'separatemeans' — The response is the average of the repeated measures (average across the within-subject model).
• 'orthogonalcontrasts' — This is valid when the within-subject model has a single numeric factor T. Responses are the average, the slope of centered T, and, in general, all orthogonal contrasts for a polynomial up to T^(p – 1), where p is the number of rows in the within-subject model. anova multiplies Y, the response you use in the repeated measures model rm, by the orthogonal contrasts, and uses the columns of the resulting product matrix as the responses. anova computes the orthogonal contrasts for T using the Q factor of a QR factorization on page 32-88 of the Vandermonde matrix on page 32-88.
• A character vector or string scalar that defines a model specification in the within-subject factors. Responses are defined by the terms in that model. anova multiplies Y, the response matrix you use in the repeated measures model rm, by the terms of the model, and uses the columns of the result as the responses.


For example, if there is a Time factor and 'Time' is the model specification, then anova uses two terms, the constant and the uncentered Time term. The default is '1', which performs the analysis on the average response.
• An r-by-nc matrix, C, specifying nc contrasts among the r repeated measures. If Y represents the matrix of repeated measures you use in the repeated measures model rm, then the output table contains a separate analysis of variance for each column of Y*C.

The ANOVA table contains separate univariate analysis of variance results for each response.
Example: 'WithinModel','Time'
Example: 'WithinModel','orthogonalcontrasts'

Output Arguments

anovatbl — Results of analysis of variance
table

Results of analysis of variance for between-subject effects, returned as a table. This includes all terms on the between-subjects model and the following columns.

Column Name    Definition
Within         Within-subject factors
Between        Between-subject factors
SumSq          Sum of squares
DF             Degrees of freedom
MeanSq         Mean squared error
F              F-statistic
pValue         p-value corresponding to the F-statistic

Examples

Analysis of Variance for Average Response

Load the sample data.

load fisheriris

The column vector species consists of iris flowers of three different species: setosa, versicolor, and virginica. The double matrix meas consists of four types of measurements on the flowers: the length and width of sepals and petals in centimeters, respectively.

Store the data in a table array.

t = table(species,meas(:,1),meas(:,2),meas(:,3),meas(:,4),...
    'VariableNames',{'species','meas1','meas2','meas3','meas4'});
Meas = dataset([1 2 3 4]','VarNames',{'Measurements'});

Fit a repeated measures model where the measurements are the responses and the species is the predictor variable.


rm = fitrm(t,'meas1-meas4~species','WithinDesign',Meas);

Perform analysis of variance.

anova(rm)

ans=3×7 table
      Within      Between     SumSq     DF     MeanSq      F          pValue
    ________    ________    ______    ___    _______    ______    ___________
    Constant    constant    7201.7      1     7201.7     19650    2.0735e-158
    Constant    species     309.61      2      154.8    422.39     1.1517e-61
    Constant    Error       53.875    147    0.36649

There are 150 observations and 3 species. The degrees of freedom for species is 3 - 1 = 2, and for error it is 150 - 3 = 147. The small p-value of 1.1517e-61 indicates that the measurements differ significantly according to species.

Panel Data

Load the sample panel data.

load('panelData.mat');

The dataset array, panelData, contains yearly observations on eight cities for six years. The first variable, Growth, measures economic growth (the response variable). The second and third variables are city and year indicators, respectively. The last variable, Employ, measures employment (the predictor variable). This is simulated data.

Store the data in a table array and define city as a nominal variable.

t = table(panelData.Growth,panelData.City,panelData.Year,...
    'VariableNames',{'Growth','City','Year'});

Convert the data to the proper format for repeated measures analysis.

t = unstack(t,'Growth','Year','NewDataVariableNames',...
    {'year1','year2','year3','year4','year5','year6'});

Add the mean employment level over the years as a predictor variable to the table t.

t(:,8) = table(grpstats(panelData.Employ,panelData.City));
t.Properties.VariableNames{'Var8'} = 'meanEmploy';

Define the within-subjects variable.

Year = [1 2 3 4 5 6]';

Fit a repeated measures model, where the growth figures over the six years are the responses and the mean employment is the predictor variable.

rm = fitrm(t,'year1-year6 ~ meanEmploy','WithinDesign',Year);

Perform analysis of variance.


anovatbl = anova(rm,'WithinModel',Year)

anovatbl=3×7 table
     Within       Between         SumSq       DF      MeanSq        F          pValue
    _________    __________    __________    __    __________    ________    _________
    Contrast1    constant          588.17     1        588.17    0.038495      0.85093
    Contrast1    meanEmploy    3.7064e+05     1    3.7064e+05      24.258    0.0026428
    Contrast1    Error              91675     6         15279

Longitudinal Data

Load the sample data.

load('longitudinalData.mat');

The matrix Y contains response data for 16 individuals. The response is the blood level of a drug measured at five time points (time = 0, 2, 4, 6, and 8). Each row of Y corresponds to an individual, and each column corresponds to a time point. The first eight subjects are female, and the second eight subjects are male. This is simulated data.

Define a variable that stores gender information.

Gender = ['F' 'F' 'F' 'F' 'F' 'F' 'F' 'F' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M']';

Store the data in a table array in the proper format for repeated measures analysis.

t = table(Gender,Y(:,1),Y(:,2),Y(:,3),Y(:,4),Y(:,5),...
    'VariableNames',{'Gender','t0','t2','t4','t6','t8'});

Define the within-subjects variable.

Time = [0 2 4 6 8]';

Fit a repeated measures model, where blood levels are the responses and gender is the predictor variable.

rm = fitrm(t,'t0-t8 ~ Gender','WithinDesign',Time);

Perform analysis of variance.

anovatbl = anova(rm)

anovatbl=3×7 table
     Within      Between     SumSq     DF    MeanSq      F         pValue
    ________    ________    ______    __    ______    ______    __________
    Constant    constant     54702     1     54702    1079.2    1.1897e-14
    Constant    Gender      2251.7     1    2251.7    44.425    1.0693e-05
    Constant    Error        709.6    14    50.685

There are 2 genders and 16 observations, so the degrees of freedom for gender is (2 - 1) = 1 and for error it is (16 - 2)*(2 - 1) = 14. The small p-value of 1.0693e-05 indicates that there is a significant effect of gender on blood level.


Repeat analysis of variance using orthogonal contrasts.

anovatbl = anova(rm,'WithinModel','orthogonalcontrasts')

anovatbl=15×7 table
     Within      Between       SumSq       DF      MeanSq          F           pValue
    ________    ________    __________    __    __________    __________    __________
    Constant    constant         54702     1         54702        1079.2    1.1897e-14
    Constant    Gender          2251.7     1        2251.7        44.425    1.0693e-05
    Constant    Error            709.6    14        50.685
    Time        constant        310.83     1        310.83        31.023    6.9065e-05
    Time        Gender          13.341     1        13.341        1.3315       0.26785
    Time        Error           140.27    14        10.019
    Time^2      constant        565.42     1        565.42        98.901    1.0003e-07
    Time^2      Gender          1.4076     1        1.4076       0.24621       0.62746
    Time^2      Error           80.039    14        5.7171
    Time^3      constant        2.6127     1        2.6127        1.4318       0.25134
    Time^3      Gender      7.8853e-06     1    7.8853e-06    4.3214e-06       0.99837
    Time^3      Error           25.546    14        1.8247
    Time^4      constant        2.8404     1        2.8404       0.47924       0.50009
    Time^4      Gender          2.9016     1        2.9016       0.48956       0.49559
    Time^4      Error           82.977    14        5.9269

More About

Vandermonde Matrix

A Vandermonde matrix is a matrix whose columns are powers of a vector a, that is, V(i,j) = a(i)^(n – j), where n is the length of a.

QR Factorization

QR factorization of an m-by-n matrix A is the factorization of that matrix into the product A = Q*R, where R is an m-by-n upper triangular matrix and Q is an m-by-m unitary matrix.

See Also
fitrm | manova | qr | ranova | vander

Topics
“Model Specification for Repeated Measures Models” on page 9-54


ansaribradley
Ansari-Bradley test

Syntax
h = ansaribradley(x,y)
h = ansaribradley(x,y,Name,Value)
[h,p] = ansaribradley( ___ )
[h,p,stats] = ansaribradley( ___ )

Description

h = ansaribradley(x,y) returns a test decision for the null hypothesis that the data in vectors x and y comes from the same distribution, using the Ansari-Bradley test on page 32-92. The alternative hypothesis is that the data in x and y comes from distributions with the same median and shape but different dispersions (e.g., variances). The result h is 1 if the test rejects the null hypothesis at the 5% significance level, or 0 otherwise.

h = ansaribradley(x,y,Name,Value) returns a test decision for the Ansari-Bradley test with additional options specified by one or more name-value pair arguments. For example, you can change the significance level, conduct a one-sided test, or use a normal approximation to calculate the value of the test statistic.

[h,p] = ansaribradley( ___ ) also returns the p-value, p, of the test, using any of the input arguments in the previous syntaxes.

[h,p,stats] = ansaribradley( ___ ) also returns the structure stats containing information about the test statistic.

Examples

Ansari-Bradley Test for Equal Variances

Load the sample data. Create data vectors of miles per gallon (MPG) measurements for the model years 1982 and 1976.

load carsmall
x = MPG(Model_Year==82);
y = MPG(Model_Year==76);

Test the null hypothesis that the miles per gallon measured in cars from 1982 and 1976 have equal variances.

[h,p,stats] = ansaribradley(x,y)

h = 0
p = 0.8426
stats = struct with fields:
        W: 526.9000
    Wstar: 0.1986

The returned value of h = 0 indicates that ansaribradley does not reject the null hypothesis at the default 5% significance level.

Ansari-Bradley One-Sided Hypothesis Test

Load the sample data. Create data vectors of miles per gallon (MPG) measurements for the model years 1982 and 1976.

load carsmall
x = MPG(Model_Year==82);
y = MPG(Model_Year==76);

Test the null hypothesis that the miles per gallon measured in cars from 1982 and 1976 have equal variances, against the alternative hypothesis that the variance of cars from 1982 is greater than that of cars from 1976.

[h,p,stats] = ansaribradley(x,y,'Tail','right')

h = 0
p = 0.5787
stats = struct with fields:
        W: 526.9000
    Wstar: 0.1986

The returned value of h = 0 indicates that ansaribradley does not reject the null hypothesis that the variance in miles per gallon is the same for the two model years, when the alternative is that the variance of cars from 1982 is greater than that of cars from 1976.

Input Arguments

x — Sample data
vector | matrix | multidimensional array

Sample data, specified as a vector, matrix, or multidimensional array.
• If x and y are specified as vectors, they do not need to be the same length.
• If x and y are specified as matrices, they must have the same number of columns. ansaribradley performs separate tests along each column and returns a vector of results.
• If x and y are specified as multidimensional arrays on page 32-93, ansaribradley works along the first nonsingleton dimension on page 32-93. x and y must have the same size along all remaining dimensions.
Data Types: single | double

y — Sample data
vector | matrix | multidimensional array


Sample data, specified as a vector, matrix, or multidimensional array.
• If x and y are specified as vectors, they do not need to be the same length.
• If x and y are specified as matrices, they must have the same number of columns. ansaribradley performs separate tests along each column and returns a vector of results.
• If x and y are specified as multidimensional arrays on page 32-93, ansaribradley works along the first nonsingleton dimension on page 32-93. x and y must have the same size along all remaining dimensions.
Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'Tail','right','Alpha',0.01 specifies a right-tailed hypothesis test at the 1% significance level.

Alpha — Significance level
0.05 (default) | scalar value in the range (0,1)

Significance level of the hypothesis test, specified as the comma-separated pair consisting of 'Alpha' and a scalar value in the range (0,1).
Example: 'Alpha',0.01
Data Types: single | double

Dim — Dimension
first nonsingleton dimension (default) | positive integer value

Dimension of the input matrix along which to test the means, specified as the comma-separated pair consisting of 'Dim' and a positive integer value. For example, specifying 'Dim',1 tests the column means, while 'Dim',2 tests the row means.
Example: 'Dim',2
Data Types: single | double

Tail — Type of alternative hypothesis
'both' (default) | 'left' | 'right'

Type of alternative hypothesis to evaluate, specified as the comma-separated pair consisting of 'Tail' and one of the following.

'both'     Test the alternative hypothesis that the dispersion parameters of x and y are not equal.
'right'    Test the alternative hypothesis that the dispersion parameter of x is greater than that of y.
'left'     Test the alternative hypothesis that the dispersion parameter of x is less than that of y.

Example: 'Tail','right'

Method — Computation method
'exact' | 'approximate'

Computation method for the test statistic, specified as the comma-separated pair consisting of 'Method' and one of the following.

'exact'          Compute p using an exact calculation of the distribution of the test statistic W. This is the default if n, the total number of rows in x and y, is 25 or less. Note that n is computed before any NaN values (representing missing data) are removed.
'approximate'    Compute p using a normal approximation for the statistic W*. This is the default if n, the total number of rows in x and y, is greater than 25.

Example: 'Method','exact'

Output Arguments

h — Hypothesis test result
1 | 0

Hypothesis test result, returned as 1 or 0.
• If h = 1, this indicates the rejection of the null hypothesis at the Alpha significance level.
• If h = 0, this indicates a failure to reject the null hypothesis at the Alpha significance level.

p — p-value
scalar value in the range [0,1]

p-value of the test, returned as a scalar value in the range [0,1]. p is the probability of observing a test statistic as extreme as, or more extreme than, the observed value under the null hypothesis. Small values of p cast doubt on the validity of the null hypothesis.

stats — Test statistics
structure

Test statistics for the Ansari-Bradley test, returned as a structure containing:
• W — Value of the test statistic, which is the sum of the Ansari-Bradley ranks for the x sample.
• Wstar — Approximate normal statistic W*.

More About

Ansari-Bradley Test

The Ansari-Bradley test is a nonparametric alternative to the two-sample F-test of equal variances. It does not require the assumption that x and y come from normal distributions. The dispersion of a distribution is generally measured by its variance or standard deviation, but the Ansari-Bradley test can be used with samples from distributions that do not have finite variances.

This test requires that the samples have equal medians. Under that assumption, and if the distributions of the samples are continuous and identical, the test is independent of the distributions. If the samples do not have the same medians, the results can be misleading. In that case, Ansari and Bradley recommend subtracting the median, but then the distribution of the resulting test under the null hypothesis is no longer independent of the common distribution of x and y. If you want to perform the tests with medians subtracted, you should subtract the medians from x and y before calling ansaribradley.

Multidimensional Array

A multidimensional array has more than two dimensions. For example, if x is a 1-by-3-by-4 array, then x is a three-dimensional array.

First Nonsingleton Dimension

The first nonsingleton dimension is the first dimension of an array whose size is not equal to 1. For example, if x is a 1-by-2-by-3-by-4 array, then the second dimension is the first nonsingleton dimension of x.
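Following that recommendation, a small sketch (reusing the carsmall variables from the examples above) subtracts each sample's own median before testing:

```matlab
% Center each sample at its own median before calling ansaribradley.
load carsmall
x = MPG(Model_Year==82);
y = MPG(Model_Year==76);
x = x - median(x,'omitnan');   % MPG contains NaN values
y = y - median(y,'omitnan');
[h,p] = ansaribradley(x,y);
```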

See Also
ttest2 | vartest2 | vartestn

Introduced before R2006a


aoctool
Interactive analysis of covariance

Syntax
aoctool(x,y,group)
aoctool(x,y,group,alpha)
aoctool(x,y,group,alpha,xname,yname,gname)
aoctool(x,y,group,alpha,xname,yname,gname,displayopt)
aoctool(x,y,group,alpha,xname,yname,gname,displayopt,model)
h = aoctool(...)
[h,atab,ctab] = aoctool(...)
[h,atab,ctab,stats] = aoctool(...)

Description

aoctool(x,y,group) fits a separate line to the column vectors, x and y, for each group defined by the values in the array group. group may be a categorical variable, numeric vector, character array, string array, or cell array of character vectors. These types of models are known as one-way analysis of covariance (ANOCOVA) models. The output consists of three figures:
• An interactive graph of the data and prediction curves
• An ANOVA table
• A table of parameter estimates

You can use the figures to change models and to test different parts of the model. More information about interactive use of the aoctool function appears in “Analysis of Covariance Tool” on page 9-39.

aoctool(x,y,group,alpha) determines the confidence levels of the prediction intervals. The confidence level is 100(1-alpha)%. The default value of alpha is 0.05.

aoctool(x,y,group,alpha,xname,yname,gname) specifies the name to use for the x, y, and g variables in the graph and tables. If you enter simple variable names for the x, y, and g arguments, the aoctool function uses those names. If you enter an expression for one of these arguments, you can specify a name to use in place of that expression by supplying these arguments. For example, if you enter m(:,2) as the x argument, you might choose to enter 'Col 2' as the xname argument.

aoctool(x,y,group,alpha,xname,yname,gname,displayopt) enables the graph and table displays when displayopt is 'on' (default) and suppresses those displays when displayopt is 'off'.

aoctool(x,y,group,alpha,xname,yname,gname,displayopt,model) specifies the initial model to fit. The value of model can be any of the following:
• 'same mean' — Fit a single mean, ignoring grouping
• 'separate means' — Fit a separate mean to each group
• 'same line' — Fit a single line, ignoring grouping
• 'parallel lines' — Fit a separate line to each group, but constrain the lines to be parallel


• 'separate lines' — Fit a separate line to each group, with no constraints

h = aoctool(...) returns a vector of handles to the line objects in the plot.

[h,atab,ctab] = aoctool(...) returns cell arrays containing the entries in the ANOVA table (atab) and the table of coefficient estimates (ctab). (You can copy a text version of either table to the clipboard by using the Copy Text item on the Edit menu.)

[h,atab,ctab,stats] = aoctool(...) returns a stats structure that you can use to perform a follow-up multiple comparison test. The ANOVA table output includes tests of the hypotheses that the slopes or intercepts are all the same, against a general alternative that they are not all the same. Sometimes it is preferable to perform a test to determine which pairs of values are significantly different, and which are not. You can use the multcompare function to perform such tests by supplying the stats structure as input. You can test either the slopes, the intercepts, or population marginal means (the heights of the curves at the mean x value).

Examples

This example illustrates how to fit different models non-interactively. After loading the smaller car data set and fitting a separate-slopes model, you can examine the coefficient estimates.

load carsmall
[h,a,c,s] = aoctool(Weight,MPG,Model_Year,0.05,...
    '','','','off','separate lines');
c(:,1:2)

ans =
    'Term'         'Estimate'
    'Intercept'    [45.97983716833132]
    ' 70'          [-8.58050531454973]
    ' 76'          [-3.89017396094922]
    ' 82'          [12.47067927549897]
    'Slope'        [-0.00780212907455]
    ' 70'          [ 0.00195840368824]
    ' 76'          [ 0.00113831038418]
    ' 82'          [-0.00309671407243]

Roughly speaking, the lines relating MPG to Weight have an intercept close to 45.98 and a slope close to -0.0078. Each group's coefficients are offset from these values somewhat. For instance, the intercept for the cars made in 1970 is 45.98-8.58 = 37.40.

Next, try a fit using parallel lines. (The ANOVA table shows that the parallel-lines fit is significantly worse than the separate-lines fit.)

[h,a,c,s] = aoctool(Weight,MPG,Model_Year,0.05,...
    '','','','off','parallel lines');
c(:,1:2)

ans =
    'Term'         'Estimate'
    'Intercept'    [43.38984085130596]
    ' 70'          [-3.27948192983761]
    ' 76'          [-1.35036234809006]
    ' 82'          [ 4.62984427792768]
    'Slope'        [-0.00664751826198]

Again, there are different intercepts for each group, but this time the slopes are constrained to be the same.

See Also
anova1 | multcompare | polytool

Introduced before R2006a


append
Class: TreeBagger

Append new trees to ensemble

Syntax
B = append(B,other)

Description

B = append(B,other) appends the trees from the other ensemble to those in B. This method checks for consistency of the X and Y properties of the two ensembles, as well as consistency of their compact objects and out-of-bag indices, before appending the trees. The output ensemble B takes training parameters such as FBoot, Prior, Cost, and others from the B input. There is no attempt to check whether these training parameters are consistent between the two objects.
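A hedged sketch of typical use (made-up regression data, not from the shipped documentation examples):

```matlab
% Grow two 50-tree ensembles on the same data, then append the second
% ensemble's trees to the first to obtain a single 100-tree ensemble.
rng default
X = rand(100,4);
Y = X(:,1) + 0.1*randn(100,1);
B1 = TreeBagger(50,X,Y,'Method','regression');
B2 = TreeBagger(50,X,Y,'Method','regression');
B  = append(B1,B2);   % B now contains 100 trees
```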

See Also combine


barttest
Bartlett’s test

Syntax
ndim = barttest(x,alpha)
[ndim,prob,chisquare] = barttest(x,alpha)

Description

ndim = barttest(x,alpha) returns the number of dimensions necessary to explain the nonrandom variation in the data matrix x at the alpha significance level.

[ndim,prob,chisquare] = barttest(x,alpha) also returns the significance values for the hypothesis tests, prob, and the χ2 values associated with the tests, chisquare.

Examples

Determine Dimensions Needed to Explain Nonrandom Data Variation

Generate a 20-by-6 matrix of random numbers in which each pair of columns is drawn from a bivariate normal distribution with mean mu = [0 0] and covariance sigma = [1 0.99; 0.99 1].

rng default  % for reproducibility
mu = [0 0];
sigma = [1 0.99; 0.99 1];
X = mvnrnd(mu,sigma,20);          % columns 1 and 2
X(:,3:4) = mvnrnd(mu,sigma,20);   % columns 3 and 4
X(:,5:6) = mvnrnd(mu,sigma,20);   % columns 5 and 6

Determine the number of dimensions necessary to explain the nonrandom variation in data matrix X. Report the significance values for the hypothesis tests.

[ndim, prob] = barttest(X,0.05)

ndim = 3

prob = 5×1
    0.0000
    0.0000
    0.0000
    0.5148
    0.3370

The returned value of ndim indicates that three dimensions are necessary to explain the nonrandom variation in X.


Input Arguments

x — Input data
matrix of scalar values

Input data, specified as a matrix of scalar values.
Data Types: single | double

alpha — Significance level
0.05 (default) | scalar value in the range (0,1)

Significance level of the hypothesis test, specified as a scalar value in the range (0,1).
Example: 0.1
Data Types: single | double

Output Arguments

ndim — Number of dimensions
positive integer value

Number of dimensions, returned as a positive integer value. The dimension is determined by a series of hypothesis tests. The test for ndim = 1 tests the hypothesis that the variances of the data values along each principal component are equal, the test for ndim = 2 tests the hypothesis that the variances along the second through last components are equal, and so on. The null hypothesis is that the number of dimensions is equal to the number of the largest unequal eigenvalues of the covariance matrix of x.

prob — Significance value
vector of scalar values in the range (0,1)

Significance value for the hypothesis tests, returned as a vector of scalar values in the range (0,1). Each element in prob corresponds to an element of chisquare.

chisquare — Test statistics
vector of scalar values

Test statistics for each dimension’s hypothesis test, returned as a vector of scalar values.

See Also

Introduced before R2006a


BayesianOptimization
Bayesian optimization results

Description
A BayesianOptimization object contains the results of a Bayesian optimization. It is the output of bayesopt or a fit function that accepts the OptimizeHyperparameters name-value pair such as fitcdiscr. In addition, a BayesianOptimization object contains data for each iteration of bayesopt that can be accessed by a plot function or an output function.

Creation
Create a BayesianOptimization object using the bayesopt function or a fit function with the OptimizeHyperparameters name-value pair.

Properties

Problem Definition Properties

ObjectiveFcn — ObjectiveFcn argument used by bayesopt
function handle

This property is read-only.

ObjectiveFcn argument used by bayesopt, returned as a function handle.

• If you call bayesopt directly, ObjectiveFcn is the bayesopt objective function argument.
• If you call a fit function containing the 'OptimizeHyperparameters' name-value pair argument, ObjectiveFcn is a function handle that returns the misclassification rate for classification or returns the logarithm of one plus the cross-validation loss for regression, measured by five-fold cross-validation.

Data Types: function_handle

VariableDescriptions — VariableDescriptions argument that bayesopt used
vector of optimizableVariable objects

This property is read-only.

VariableDescriptions argument that bayesopt used, returned as a vector of optimizableVariable objects.

• If you called bayesopt directly, VariableDescriptions is the bayesopt variable description argument.
• If you called a fit function with the OptimizeHyperparameters name-value pair, VariableDescriptions is the vector of hyperparameters.


Options — Options that bayesopt used
structure

This property is read-only.

Options that bayesopt used, returned as a structure.

• If you called bayesopt directly, Options is the options used in bayesopt, which are the name-value pairs. See bayesopt “Input Arguments” on page 32-124.
• If you called a fit function with the OptimizeHyperparameters name-value pair, Options are the default bayesopt options, modified by the HyperparameterOptimizationOptions name-value pair.

Options is a read-only structure containing the following fields.

AcquisitionFunctionName — Acquisition function name. See “Acquisition Function Types” on page 10-3.
IsObjectiveDeterministic — true means the objective function is deterministic, false otherwise.
ExplorationRatio — Used only when AcquisitionFunctionName is 'expected-improvement-plus' or 'expected-improvement-per-second-plus'. See “Plus” on page 10-5.
MaxObjectiveEvaluations — Objective function evaluation limit.
MaxTime — Time limit.
XConstraintFcn — Deterministic constraints on variables. See “Deterministic Constraints — XConstraintFcn” on page 10-38.
ConditionalVariableFcn — Conditional variable constraints. See “Conditional Constraints — ConditionalVariableFcn” on page 10-39.
NumCoupledConstraints — Number of coupled constraints. See “Coupled Constraints” on page 10-40.
CoupledConstraintTolerances — Coupled constraint tolerances. See “Coupled Constraints” on page 10-40.
AreCoupledConstraintsDeterministic — Logical vector specifying whether each coupled constraint is deterministic.
Verbose — Command-line display level.
OutputFcn — Function called after each iteration. See “Bayesian Optimization Output Functions” on page 10-19.
SaveVariableName — Variable name for the @assignInBase output function.
SaveFileName — File name for the @saveToFile output function.
PlotFcn — Plot function called after each iteration. See “Bayesian Optimization Plot Functions” on page 10-11.
InitialX — Points where bayesopt evaluated the objective function.
InitialObjective — Objective function values at InitialX.
InitialConstraintViolations — Coupled constraint function values at InitialX.
InitialErrorValues — Error values at InitialX.
InitialObjectiveEvaluationTimes — Objective function evaluation times at InitialX.
InitialIterationTimes — Time for each iteration, including objective function evaluation and other computations.
Data Types: struct

Solution Properties

MinObjective — Minimum observed value of objective function
real scalar

This property is read-only.

Minimum observed value of objective function, returned as a real scalar. When there are coupled constraints or evaluation errors, this value is the minimum over all observed points that are feasible according to the final constraint and Error models.
Data Types: double

XAtMinObjective — Observed point with minimum objective function value
1-by-D table

This property is read-only.

Observed point with minimum objective function value, returned as a 1-by-D table, where D is the number of variables.
Data Types: table

MinEstimatedObjective — Minimum estimated value of objective function
real scalar

This property is read-only.

Minimum estimated value of objective function, returned as a real scalar. MinEstimatedObjective uses the final objective model. MinEstimatedObjective is the same as the CriterionValue result of bestPoint with the default criterion.
Data Types: double

XAtMinEstimatedObjective — Point with minimum estimated objective function value
1-by-D table


This property is read-only.

Point with minimum estimated objective function value, returned as a 1-by-D table, where D is the number of variables. XAtMinEstimatedObjective uses the final objective model.
Data Types: table

NumObjectiveEvaluations — Number of objective function evaluations
positive integer

This property is read-only.

Number of objective function evaluations, returned as a positive integer. This includes the initial evaluations to form a posterior model as well as evaluations during the optimization iterations.
Data Types: double

TotalElapsedTime — Total elapsed time of optimization in seconds
positive scalar

This property is read-only.

Total elapsed time of optimization in seconds, returned as a positive scalar.
Data Types: double

NextPoint — Next point to evaluate if optimization continues
1-by-D table

This property is read-only.

Next point to evaluate if optimization continues, returned as a 1-by-D table, where D is the number of variables.
Data Types: table

Trace Properties

XTrace — Points where the objective function was evaluated
T-by-D table

This property is read-only.

Points where the objective function was evaluated, returned as a T-by-D table, where T is the number of evaluation points and D is the number of variables.
Data Types: table

ObjectiveTrace — Objective function values
column vector of length T

This property is read-only.

Objective function values, returned as a column vector of length T, where T is the number of evaluation points. ObjectiveTrace contains the history of objective function evaluations.
Data Types: double


ObjectiveEvaluationTimeTrace — Objective function evaluation times
column vector of length T

This property is read-only.

Objective function evaluation times, returned as a column vector of length T, where T is the number of evaluation points. ObjectiveEvaluationTimeTrace includes the time spent evaluating coupled constraints, because the objective function computes these constraints.
Data Types: double

IterationTimeTrace — Iteration times
column vector of length T

This property is read-only.

Iteration times, returned as a column vector of length T, where T is the number of evaluation points. IterationTimeTrace includes both objective function evaluation time and other overhead.
Data Types: double

ConstraintsTrace — Coupled constraint values
T-by-K array

This property is read-only.

Coupled constraint values, returned as a T-by-K array, where T is the number of evaluation points and K is the number of coupled constraints.
Data Types: double

ErrorTrace — Error indications
column vector of length T of -1 or 1 entries

This property is read-only.

Error indications, returned as a column vector of length T of -1 or 1 entries, where T is the number of evaluation points. Each 1 entry indicates that the objective function errored or returned NaN on the corresponding point in XTrace. Each -1 entry indicates that the objective function value was computed.
Data Types: double

FeasibilityTrace — Feasibility indications
logical column vector of length T

This property is read-only.

Feasibility indications, returned as a logical column vector of length T, where T is the number of evaluation points. Each 1 entry indicates that the final constraint model predicts feasibility at the corresponding point in XTrace.
Data Types: logical

FeasibilityProbabilityTrace — Probability that evaluation point is feasible
column vector of length T

This property is read-only.


Probability that evaluation point is feasible, returned as a column vector of length T, where T is the number of evaluation points. The probabilities come from the final constraint model, including the error constraint model, on the corresponding points in XTrace.
Data Types: double

IndexOfMinimumTrace — Which evaluation gave minimum feasible objective
column vector of integer indices of length T

This property is read-only.

Which evaluation gave the minimum feasible objective, returned as a column vector of integer indices of length T, where T is the number of evaluation points. Feasibility is determined with respect to the constraint models that existed at each iteration, including the error constraint model.
Data Types: double

ObjectiveMinimumTrace — Minimum observed objective
column vector of length T

This property is read-only.

Minimum observed objective, returned as a column vector of length T, where T is the number of evaluation points.
Data Types: double

EstimatedObjectiveMinimumTrace — Minimum estimated objective
column vector of length T

This property is read-only.

Minimum estimated objective, returned as a column vector of length T, where T is the number of evaluation points. The estimated objective at each iteration is determined with respect to the objective model that existed at that iteration.
Data Types: double

UserDataTrace — Auxiliary data from the objective function
cell array of length T

This property is read-only.

Auxiliary data from the objective function, returned as a cell array of length T, where T is the number of evaluation points. Each entry in the cell array is the UserData returned in the third output of the objective function.
Data Types: cell
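Taken together, the trace properties record the full history of a run. As a sketch (assuming results is a BayesianOptimization object returned by a completed bayesopt call), the observed and estimated minimum traces can be compared directly:

```matlab
% Sketch: compare observed and estimated minimum objective per iteration.
% Assumes "results" is a BayesianOptimization object from a completed run.
T = results.NumObjectiveEvaluations;
plot(1:T,results.ObjectiveMinimumTrace,'-o', ...
     1:T,results.EstimatedObjectiveMinimumTrace,'-x')
xlabel('Iteration')
ylabel('Minimum objective')
legend('Observed minimum','Estimated minimum')
```

A widening gap between the two traces suggests the objective model disagrees with the observations, for example when the objective function is noisy.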

Object Functions
bestPoint                        Best point in a Bayesian optimization according to a criterion
plot                             Plot Bayesian optimization results
predictConstraints               Predict coupled constraint violations at a set of points
predictError                     Predict error value at a set of points
predictObjective                 Predict objective function at a set of points
predictObjectiveEvaluationTime   Predict objective function run times at a set of points
resume                           Resume a Bayesian optimization

Examples

Create a BayesianOptimization Object Using bayesopt

This example shows how to create a BayesianOptimization object by using bayesopt to minimize cross-validation loss.

Optimize hyperparameters of a KNN classifier for the ionosphere data, that is, find KNN hyperparameters that minimize the cross-validation loss. Have bayesopt minimize over the following hyperparameters:

• Nearest-neighborhood sizes from 1 to 30
• Distance functions 'chebychev', 'euclidean', and 'minkowski'

For reproducibility, set the random seed, set the partition, and set the AcquisitionFunctionName option to 'expected-improvement-plus'. To suppress iterative display, set 'Verbose' to 0. Pass the partition c and fitting data X and Y to the objective function fun by creating fun as an anonymous function that incorporates this data. See “Parameterizing Functions” (MATLAB).

load ionosphere
rng default
num = optimizableVariable('n',[1,30],'Type','integer');
dst = optimizableVariable('dst',{'chebychev','euclidean','minkowski'},'Type','categorical');
c = cvpartition(351,'Kfold',5);
fun = @(x)kfoldLoss(fitcknn(X,Y,'CVPartition',c,'NumNeighbors',x.n,...
    'Distance',char(x.dst),'NSMethod','exhaustive'));
results = bayesopt(fun,[num,dst],'Verbose',0,...
    'AcquisitionFunctionName','expected-improvement-plus')


results = 
  BayesianOptimization with properties:

                      ObjectiveFcn: [function_handle]
              VariableDescriptions: [1x2 optimizableVariable]
                           Options: [1x1 struct]
                      MinObjective: 0.1197
                   XAtMinObjective: [1x2 table]
             MinEstimatedObjective: 0.1213
          XAtMinEstimatedObjective: [1x2 table]
           NumObjectiveEvaluations: 30
                  TotalElapsedTime: 130.1401
                         NextPoint: [1x2 table]
                            XTrace: [30x2 table]
                    ObjectiveTrace: [30x1 double]
                  ConstraintsTrace: []
                     UserDataTrace: {30x1 cell}
      ObjectiveEvaluationTimeTrace: [30x1 double]
                IterationTimeTrace: [30x1 double]
                        ErrorTrace: [30x1 double]
                  FeasibilityTrace: [30x1 logical]
       FeasibilityProbabilityTrace: [30x1 double]
               IndexOfMinimumTrace: [30x1 double]
             ObjectiveMinimumTrace: [30x1 double]
    EstimatedObjectiveMinimumTrace: [30x1 double]


Create a BayesianOptimization Object Using a Fit Function

This example shows how to minimize the cross-validation loss in the ionosphere data using Bayesian optimization of an SVM classifier.

Load the data.

load ionosphere

Optimize the classification using the 'auto' parameters.

rng default % For reproducibility
Mdl = fitcsvm(X,Y,'OptimizeHyperparameters','auto')

|====================================================================================================|
| Iter | Eval   | Objective | Objective | BestSoFar  | BestSoFar  | BoxConstraint | KernelScale |
|      | result |           | runtime   | (observed) | (estim.)   |               |             |
|====================================================================================================|
|    1 | Best   |   0.21652 |    17.227 |    0.21652 |    0.21652 |        64.836 |       0.001 |
|    2 | Accept |   0.35897 |  0.076634 |    0.21652 |    0.22539 |      0.036335 |          5. |
|    3 | Best   |   0.13105 |     6.789 |    0.13105 |    0.14152 |     0.0022147 |       0.002 |
|    4 | Accept |   0.35897 |  0.076591 |    0.13105 |    0.13108 |        5.1259 |           9 |
|    5 | Best   |   0.12251 |  0.083261 |    0.12251 |    0.12253 |     0.0010264 |        0.04 |
|    6 | Accept |   0.12536 |   0.11182 |    0.12251 |    0.12242 |     0.0010276 |         0.0 |
|    7 | Accept |   0.13105 |    1.3745 |    0.12251 |    0.12363 |       0.04548 |        0.02 |
|    8 | Accept |   0.12821 |  0.092434 |    0.12251 |    0.12489 |      0.001004 |        0.03 |
|    9 | Accept |    0.1339 |  0.078771 |    0.12251 |    0.12489 |     0.0010115 |        0.07 |
|   10 | Accept |   0.12536 |   0.14329 |    0.12251 |    0.12493 |     0.0010018 |        0.01 |
|   11 | Accept |   0.12821 |  0.097635 |    0.12251 |    0.12596 |      0.001089 |        0.02 |
|   12 | Accept |   0.13675 |   0.27617 |    0.12251 |    0.12541 |     0.0010255 |       0.008 |
|   13 | Accept |    0.1339 |  0.088394 |    0.12251 |    0.12704 |     0.0010116 |        0.03 |
|   14 | Accept |   0.35897 |  0.078646 |    0.12251 |    0.12704 |     0.0013423 |          76 |
|   15 | Accept |   0.35897 |  0.075086 |    0.12251 |      0.127 |       0.13735 |          99 |
|   16 | Accept |   0.35897 |  0.075732 |    0.12251 |    0.12698 |        994.11 |          99 |
|   17 | Accept |   0.35897 |  0.093507 |    0.12251 |    0.12763 |     0.0010027 |         0.8 |
|   18 | Accept |   0.12536 |  0.083076 |    0.12251 |    0.12657 |     0.0010101 |        0.04 |
|   19 | Accept |   0.12821 |  0.083382 |    0.12251 |    0.12701 |     0.0010938 |        0.04 |
|   20 | Accept |   0.12536 |   0.11448 |    0.12251 |    0.12556 |     0.0010146 |        0.01 |
|====================================================================================================|
| Iter | Eval   | Objective | Objective | BestSoFar  | BestSoFar  | BoxConstraint | KernelScale |
|      | result |           | runtime   | (observed) | (estim.)   |               |             |
|====================================================================================================|
|   21 | Accept |   0.12251 |   0.11771 |    0.12251 |    0.12461 |     0.0010166 |        0.01 |
|   22 | Accept |   0.12251 |   0.12403 |    0.12251 |    0.12405 |      0.001065 |        0.01 |
|   23 | Accept |   0.35897 |  0.079806 |    0.12251 |    0.12416 |       0.03993 |          21 |
|   24 | Accept |   0.35897 |  0.075101 |    0.12251 |    0.12429 |        2.3338 |          33 |
|   25 | Accept |   0.12536 |   0.17006 |    0.12251 |    0.12423 |        992.83 |           1 |
|   26 | Accept |   0.13675 |   0.30995 |    0.12251 |    0.12421 |        987.13 |          5. |
|   27 | Accept |   0.13105 |   0.19598 |    0.12251 |    0.12442 |     0.0029454 |        0.01 |
|   28 | Accept |   0.12251 |    0.1239 |    0.12251 |    0.12445 |     0.0062605 |        0.05 |
|   29 | Accept |   0.12821 |   0.14179 |    0.12251 |    0.12451 |      0.022617 |        0.06 |
|   30 | Accept |   0.13105 |   0.16587 |    0.12251 |    0.12458 |        995.38 |          10 |

__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 55.6497 seconds.
Total objective function evaluation time: 28.6241

Best observed feasible point:
    BoxConstraint    KernelScale
    _____________    ___________
      0.0010264       0.042908

Observed objective function value = 0.12251
Estimated objective function value = 0.12703


Function evaluation time = 0.083261

Best estimated feasible point (according to models):
    BoxConstraint    KernelScale
    _____________    ___________
      0.0010146       0.017572

Estimated objective function value = 0.12458
Estimated function evaluation time = 0.11721

Mdl = 
  ClassificationSVM
                         ResponseName: 'Y'
                CategoricalPredictors: []
                           ClassNames: {'b'  'g'}
                       ScoreTransform: 'none'
                      NumObservations: 351
    HyperparameterOptimizationResults: [1×1 BayesianOptimization]
                                Alpha: [90×1 double]
                                 Bias: -5.7329
                     KernelParameters: [1×1 struct]
                       BoxConstraints: [351×1 double]
                      ConvergenceInfo: [1×1 struct]
                      IsSupportVector: [351×1 logical]
                               Solver: 'SMO'

  Properties, Methods

The fit achieved about 12% loss for the default 5-fold cross-validation. Examine the BayesianOptimization object that is returned in the HyperparameterOptimizationResults property of the returned model.

disp(Mdl.HyperparameterOptimizationResults)

  BayesianOptimization with properties:

                      ObjectiveFcn: @createObjFcn/inMemoryObjFcn
              VariableDescriptions: [5×1 optimizableVariable]
                           Options: [1×1 struct]
                      MinObjective: 0.1225
                   XAtMinObjective: [1×2 table]
             MinEstimatedObjective: 0.1246
          XAtMinEstimatedObjective: [1×2 table]
           NumObjectiveEvaluations: 30
                  TotalElapsedTime: 55.6497
                         NextPoint: [1×2 table]
                            XTrace: [30×2 table]
                    ObjectiveTrace: [30×1 double]
                  ConstraintsTrace: []
                     UserDataTrace: {30×1 cell}
      ObjectiveEvaluationTimeTrace: [30×1 double]
                IterationTimeTrace: [30×1 double]
                        ErrorTrace: [30×1 double]
                  FeasibilityTrace: [30×1 logical]
       FeasibilityProbabilityTrace: [30×1 double]
               IndexOfMinimumTrace: [30×1 double]
             ObjectiveMinimumTrace: [30×1 double]
    EstimatedObjectiveMinimumTrace: [30×1 double]

See Also
bayesopt | fitcdiscr | fitcecoc | fitcensemble | fitcknn | fitclinear | fitcnb | fitcsvm | fitctree | fitrensemble | fitrgp | fitrlinear | fitrsvm | fitrtree

Topics
“Bayesian Optimization Workflow” on page 10-25

Introduced in R2016b


bayesopt
Select optimal machine learning hyperparameters using Bayesian optimization

Syntax
results = bayesopt(fun,vars)
results = bayesopt(fun,vars,Name,Value)

Description
results = bayesopt(fun,vars) attempts to find values of vars that minimize fun(vars).

Note To include extra parameters in an objective function, see “Parameterizing Functions” (MATLAB).

results = bayesopt(fun,vars,Name,Value) modifies the optimization process according to the Name,Value arguments.

Examples

Create a BayesianOptimization Object Using bayesopt

This example shows how to create a BayesianOptimization object by using bayesopt to minimize cross-validation loss.

Optimize hyperparameters of a KNN classifier for the ionosphere data, that is, find KNN hyperparameters that minimize the cross-validation loss. Have bayesopt minimize over the following hyperparameters:

• Nearest-neighborhood sizes from 1 to 30
• Distance functions 'chebychev', 'euclidean', and 'minkowski'

For reproducibility, set the random seed, set the partition, and set the AcquisitionFunctionName option to 'expected-improvement-plus'. To suppress iterative display, set 'Verbose' to 0. Pass the partition c and fitting data X and Y to the objective function fun by creating fun as an anonymous function that incorporates this data. See “Parameterizing Functions” (MATLAB).

load ionosphere
rng default
num = optimizableVariable('n',[1,30],'Type','integer');
dst = optimizableVariable('dst',{'chebychev','euclidean','minkowski'},'Type','categorical');
c = cvpartition(351,'Kfold',5);
fun = @(x)kfoldLoss(fitcknn(X,Y,'CVPartition',c,'NumNeighbors',x.n,...
    'Distance',char(x.dst),'NSMethod','exhaustive'));
results = bayesopt(fun,[num,dst],'Verbose',0,...
    'AcquisitionFunctionName','expected-improvement-plus')


results = 
  BayesianOptimization with properties:

                      ObjectiveFcn: [function_handle]
              VariableDescriptions: [1x2 optimizableVariable]
                           Options: [1x1 struct]
                      MinObjective: 0.1197
                   XAtMinObjective: [1x2 table]
             MinEstimatedObjective: 0.1213
          XAtMinEstimatedObjective: [1x2 table]
           NumObjectiveEvaluations: 30
                  TotalElapsedTime: 130.1401
                         NextPoint: [1x2 table]
                            XTrace: [30x2 table]
                    ObjectiveTrace: [30x1 double]
                  ConstraintsTrace: []
                     UserDataTrace: {30x1 cell}
      ObjectiveEvaluationTimeTrace: [30x1 double]
                IterationTimeTrace: [30x1 double]
                        ErrorTrace: [30x1 double]
                  FeasibilityTrace: [30x1 logical]
       FeasibilityProbabilityTrace: [30x1 double]
               IndexOfMinimumTrace: [30x1 double]
             ObjectiveMinimumTrace: [30x1 double]
    EstimatedObjectiveMinimumTrace: [30x1 double]


Bayesian Optimization with Coupled Constraints

A coupled constraint is one that can be evaluated only by evaluating the objective function. In this case, the objective function is the cross-validated loss of an SVM model. The coupled constraint is that the number of support vectors is no more than 100. The model details are in “Optimize a Cross-Validated SVM Classifier Using bayesopt” on page 10-45.

Create the data for classification.

rng default
grnpop = mvnrnd([1,0],eye(2),10);
redpop = mvnrnd([0,1],eye(2),10);
redpts = zeros(100,2);
grnpts = redpts;
for i = 1:100
    grnpts(i,:) = mvnrnd(grnpop(randi(10),:),eye(2)*0.02);
    redpts(i,:) = mvnrnd(redpop(randi(10),:),eye(2)*0.02);
end
cdata = [grnpts;redpts];
grp = ones(200,1);
grp(101:200) = -1;
c = cvpartition(200,'KFold',10);
sigma = optimizableVariable('sigma',[1e-5,1e5],'Transform','log');
box = optimizableVariable('box',[1e-5,1e5],'Transform','log');

The objective function is the cross-validation loss of the SVM model for partition c. The coupled constraint is the number of support vectors minus 100.5. This ensures that 100 support vectors give a negative constraint value, but 101 support vectors give a positive value. The model has 200 data points, so the coupled constraint values range from -99.5 (there is always at least one support vector) to 99.5. Positive values mean the constraint is not satisfied.

function [objective,constraint] = mysvmfun(x,cdata,grp,c)
SVMModel = fitcsvm(cdata,grp,'KernelFunction','rbf',...
    'BoxConstraint',x.box,...
    'KernelScale',x.sigma);
cvModel = crossval(SVMModel,'CVPartition',c);
objective = kfoldLoss(cvModel);
constraint = sum(SVMModel.IsSupportVector)-100.5;

Pass the partition c and fitting data cdata and grp to the objective function fun by creating fun as an anonymous function that incorporates this data. See “Parameterizing Functions” (MATLAB).

fun = @(x)mysvmfun(x,cdata,grp,c);

Set the NumCoupledConstraints to 1 so the optimizer knows that there is a coupled constraint. Set options to plot the constraint model.

results = bayesopt(fun,[sigma,box],'IsObjectiveDeterministic',true,...
    'NumCoupledConstraints',1,'PlotFcn',...
    {@plotMinObjective,@plotConstraintModels},...
    'AcquisitionFunctionName','expected-improvement-plus','Verbose',0);


Most points lead to an infeasible number of support vectors.
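The final constraint model can also be queried at new points with predictConstraints. A sketch (the trial values for sigma and box are arbitrary assumptions, and results is the object from the run above):

```matlab
% Sketch: predict the coupled-constraint violation at one trial point.
% The variable values below are arbitrary illustrations.
XTrial = table(0.1,10,'VariableNames',{'sigma','box'});
cviol = predictConstraints(results,XTrial);
if all(cviol <= 0)
    disp('Trial point predicted feasible')
else
    disp('Trial point predicted infeasible')
end
```

Negative predicted values mean the model expects the support-vector count to stay at or below 100 at that point.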

Parallel Bayesian Optimization

Improve the speed of a Bayesian optimization by using parallel objective function evaluation.

Prepare variables and the objective function for Bayesian optimization. The objective function is the cross-validation error rate for the ionosphere data, a binary classification problem. Use fitcsvm as the classifier, with BoxConstraint and KernelScale as the parameters to optimize.

load ionosphere
box = optimizableVariable('box',[1e-4,1e3],'Transform','log');
kern = optimizableVariable('kern',[1e-4,1e3],'Transform','log');
vars = [box,kern];
fun = @(vars)kfoldLoss(fitcsvm(X,Y,'BoxConstraint',vars.box,'KernelScale',vars.kern,...
    'Kfold',5));


Search for the parameters that give the lowest cross-validation error by using parallel Bayesian optimization.

results = bayesopt(fun,vars,'UseParallel',true);

Copying objective function to workers...
Done copying objective function to workers.

|====================================================================================================|
| Iter | Active  | Eval   | Objective | Objective | BestSoFar  | BestSoFar  | box        |
|      | workers | result |           | runtime   | (observed) | (estim.)   |            |
|====================================================================================================|
|    1 |       2 | Accept |    0.2735 |   0.56171 |    0.13105 |    0.13108 |  0.0002608 |
|    2 |       2 | Accept |   0.35897 |    0.4062 |    0.13105 |    0.13108 |     3.6999 |
|    3 |       2 | Accept |   0.13675 |   0.42727 |    0.13105 |    0.13108 |    0.33594 |
|    4 |       2 | Accept |   0.35897 |    0.4453 |    0.13105 |    0.13108 |   0.014127 |
|    5 |       2 | Best   |   0.13105 |   0.45503 |    0.13105 |    0.13108 |    0.29713 |
|    6 |       6 | Accept |   0.35897 |   0.16605 |    0.13105 |    0.13108 |     8.1878 |
|    7 |       5 | Best   |   0.11396 |   0.51146 |    0.11396 |    0.11395 |     8.7331 |
|    8 |       5 | Accept |   0.14245 |   0.24943 |    0.11396 |    0.11395 |  0.0020774 |
|    9 |       6 | Best   |   0.10826 |    4.0711 |    0.10826 |    0.10827 |  0.0015925 |
|   10 |       6 | Accept |   0.25641 |    16.265 |    0.10826 |    0.10829 | 0.00057357 |
|   11 |       6 | Accept |    0.1339 |    15.581 |    0.10826 |    0.10829 |     1.4553 |
|   12 |       6 | Accept |   0.16809 |    19.585 |    0.10826 |    0.10828 |    0.26919 |
|   13 |       6 | Accept |   0.20513 |    18.637 |    0.10826 |    0.10828 |     369.59 |
|   14 |       6 | Accept |   0.12536 |   0.11382 |    0.10826 |    0.10829 |     5.7059 |
|   15 |       6 | Accept |   0.13675 |      2.63 |    0.10826 |    0.10828 |     984.19 |
|   16 |       6 | Accept |   0.12821 |    2.0743 |    0.10826 |    0.11144 |  0.0063411 |
|   17 |       6 | Accept |    0.1339 |    0.1939 |    0.10826 |    0.11302 | 0.00010225 |
|   18 |       6 | Accept |   0.12821 |   0.20933 |    0.10826 |    0.11376 |     7.7447 |
|   19 |       4 | Accept |   0.55556 |    17.564 |    0.10826 |    0.10828 |  0.0087593 |
|   20 |       4 | Accept |    0.1396 |    16.473 |    0.10826 |    0.10828 |   0.054844 |
|====================================================================================================|
| Iter | Active  | Eval   | Objective | Objective | BestSoFar  | BestSoFar  | box        |
|      | workers | result |           | runtime   | (observed) | (estim.)   |            |
|====================================================================================================|
|   21 |       4 | Accept |    0.1339 |   0.17127 |    0.10826 |    0.10828 |     9.2668 |
|   22 |       4 | Accept |   0.12821 |  0.089065 |    0.10826 |    0.10828 |     12.265 |
|   23 |       4 | Accept |   0.12536 |  0.073586 |    0.10826 |    0.10828 |     1.3355 |
|   24 |       4 | Accept |   0.12821 |   0.08038 |    0.10826 |    0.10828 |     131.51 |
|   25 |       3 | Accept |   0.11111 |    10.687 |    0.10826 |    0.10867 |     1.4795 |
|   26 |       3 | Accept |   0.13675 |   0.18626 |    0.10826 |    0.10867 |     2.0513 |
|   27 |       6 | Accept |   0.12821 |  0.078559 |    0.10826 |    0.10868 |     980.04 |
|   28 |       5 | Accept |   0.33048 |  0.089844 |    0.10826 |    0.10843 |    0.41821 |
|   29 |       5 | Accept |   0.16239 |   0.12688 |    0.10826 |    0.10843 |     172.39 |
|   30 |       5 | Accept |   0.11966 |   0.14597 |    0.10826 |    0.10846 |     639.15 |

__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 48.2085 seconds.
Total objective function evaluation time: 128.3472

Best observed feasible point:
       box          kern
    _________    _________
    0.0015925    0.0050225

Observed objective function value = 0.10826
Estimated objective function value = 0.10846
Function evaluation time = 4.0711

Best estimated feasible point (according to models):
       box          kern
    _________    _________
    0.0015925    0.0050225

Estimated objective function value = 0.10846
Estimated function evaluation time = 2.8307


Return the best feasible point in the Bayesian model results by using the bestPoint function. Use the default criterion min-visited-upper-confidence-interval, which determines the best feasible point as the visited point that minimizes an upper confidence interval on the objective function value.

zbest = bestPoint(results)

zbest=1×2 table
       box          kern
    _________    _________
    0.0015925    0.0050225

The table zbest contains the optimal estimated values for the 'BoxConstraint' and 'KernelScale' name-value pair arguments. Use these values to train a new optimized classifier.

Mdl = fitcsvm(X,Y,'BoxConstraint',zbest.box,'KernelScale',zbest.kern);

Observe that the optimal parameters are in Mdl.

Mdl.BoxConstraints(1)

ans = 0.0016

Mdl.KernelParameters.Scale

ans = 0.0050

Input Arguments

fun — Objective function
function handle | parallel.pool.Constant whose Value is a function handle

Objective function, specified as a function handle or, when the UseParallel name-value pair is true, a parallel.pool.Constant whose Value is a function handle. Typically, fun returns a measure of loss (such as a misclassification error) for a machine learning model that has tunable hyperparameters to control its training. fun has these signatures:

objective = fun(x)
% or
[objective,constraints] = fun(x)
% or
[objective,constraints,UserData] = fun(x)

fun accepts x, a 1-by-D table of variable values, and returns objective, a real scalar representing the objective function value fun(x). Optionally, fun also returns:

• constraints, a real vector of coupled constraint violations. For a definition, see “Coupled Constraints” on page 32-132. constraint(j) > 0 means constraint j is violated. constraint(j) < 0 means constraint j is satisfied.
• UserData, an entity of any type (such as a scalar, matrix, structure, or object). For an example of a custom plot function that uses UserData, see “Create a Custom Plot Function” on page 10-12.
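As an illustration of the three-output signature, here is a sketch of an objective function that also returns UserData; the function name, the use of the ionosphere data, and the single variable x.n are assumptions for the example:

```matlab
function [objective,constraints,UserData] = myObjFcn(x)
% Sketch: x is a 1-by-D table of variable values, here assumed to contain x.n.
load ionosphere                          % sample data shipped with the toolbox
cvmdl = fitcknn(X,Y,'NumNeighbors',x.n,'KFold',5);
objective = kfoldLoss(cvmdl);            % scalar loss for bayesopt to minimize
constraints = [];                        % no coupled constraints in this sketch
UserData = cvmdl;                        % stored in results.UserDataTrace
end
```

Returning the fitted model as UserData lets you recover the model from any iteration after the optimization finishes.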


For details about using parallel.pool.Constant with bayesopt, see “Placing the Objective Function on Workers” on page 10-8.
Example: @objfun
Data Types: function_handle

vars — Variable descriptions
vector of optimizableVariable objects defining the hyperparameters to be tuned

Variable descriptions, specified as a vector of optimizableVariable objects defining the hyperparameters to be tuned.
Example: [X1,X2], where X1 and X2 are optimizableVariable objects

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: results = bayesopt(fun,vars,'AcquisitionFunctionName','expected-improvement-plus')

Algorithm Control

AcquisitionFunctionName — Function to choose next evaluation point
'expected-improvement-per-second-plus' (default) | 'expected-improvement' | 'expected-improvement-plus' | 'expected-improvement-per-second' | 'lower-confidence-bound' | 'probability-of-improvement'

Function to choose next evaluation point, specified as one of the listed choices. Acquisition functions whose names include per-second do not yield reproducible results because the optimization depends on the run time of the objective function. Acquisition functions whose names include plus modify their behavior when they are overexploiting an area. For more details, see “Acquisition Function Types” on page 10-3.
Example: 'AcquisitionFunctionName','expected-improvement-per-second'

IsObjectiveDeterministic — Specify deterministic objective function
false (default) | true

Specify whether the objective function is deterministic, as false or true. If fun is stochastic (that is, fun(x) can return different values for the same x), then set IsObjectiveDeterministic to false. In this case, bayesopt estimates a noise level during optimization.
Example: 'IsObjectiveDeterministic',true
Data Types: logical

ExplorationRatio — Propensity to explore
0.5 (default) | positive real

Propensity to explore, specified as a positive real. Applies to the 'expected-improvement-plus' and 'expected-improvement-per-second-plus' acquisition functions. See “Plus” on page 10-5.
Example: 'ExplorationRatio',0.2


Data Types: double

GPActiveSetSize — Fit Gaussian process model to GPActiveSetSize or fewer points
300 (default) | positive integer

Fit Gaussian process model to GPActiveSetSize or fewer points, specified as a positive integer. When bayesopt has visited more than GPActiveSetSize points, subsequent iterations that use a GP model fit the model to GPActiveSetSize points. bayesopt chooses points uniformly at random without replacement among visited points. Using fewer points leads to faster GP model fitting, at the expense of possibly less accurate fitting.
Example: 'GPActiveSetSize',80
Data Types: double

UseParallel — Compute in parallel
false (default) | true

Compute in parallel, specified as false (do not compute in parallel) or true (compute in parallel). bayesopt performs parallel objective function evaluations concurrently on parallel workers. For algorithmic details, see “Parallel Bayesian Optimization” on page 10-7.
Example: 'UseParallel',true
Data Types: logical

ParallelMethod — Imputation method for parallel worker objective function values
'clipped-model-prediction' (default) | 'model-prediction' | 'max-observed' | 'min-observed'

Imputation method for parallel worker objective function values, specified as 'clipped-model-prediction', 'model-prediction', 'max-observed', or 'min-observed'. To generate a new point to evaluate, bayesopt fits a Gaussian process to all points, including the points being evaluated on workers. To fit the process, bayesopt imputes objective function values for the points that are currently on workers. ParallelMethod specifies the method used for imputation.

• 'clipped-model-prediction' — Impute the maximum of these quantities:
  • Mean Gaussian process prediction at the point x
  • Minimum observed objective function among feasible points visited
  • Minimum model prediction among all feasible points
• 'model-prediction' — Impute the mean Gaussian process prediction at the point x.
• 'max-observed' — Impute the maximum observed objective function value among feasible points.
• 'min-observed' — Impute the minimum observed objective function value among feasible points.

Example: 'ParallelMethod','max-observed'

MinWorkerUtilization — Tolerance on number of active parallel workers
floor(0.8*Nworkers) (default) | positive integer

Tolerance on the number of active parallel workers, specified as a positive integer. After bayesopt assigns a point to evaluate, and before it computes a new point to assign, it checks whether fewer than MinWorkerUtilization workers are active. If so, bayesopt assigns random points within


bounds to all available workers. Otherwise, bayesopt calculates the best point for one worker. bayesopt creates random points much faster than fitted points, so this behavior leads to higher utilization of workers, at the cost of possibly poorer points. For details, see “Parallel Bayesian Optimization” on page 10-7.
Example: 'MinWorkerUtilization',3
Data Types: double

Starting and Stopping

MaxObjectiveEvaluations — Objective function evaluation limit
30 (default) | positive integer

Objective function evaluation limit, specified as a positive integer.
Example: 'MaxObjectiveEvaluations',60
Data Types: double

MaxTime — Time limit
Inf (default) | positive real

Time limit, specified as a positive real. The time limit is in seconds, as measured by tic and toc. Run time can exceed MaxTime because bayesopt does not interrupt function evaluations.
Example: 'MaxTime',3600
Data Types: double

NumSeedPoints — Number of initial evaluation points
4 (default) | positive integer

Number of initial evaluation points, specified as a positive integer. bayesopt chooses these points randomly within the variable bounds, according to the Transform setting for each variable (uniform for 'none', logarithmically spaced for 'log').
Example: 'NumSeedPoints',10
Data Types: double

Constraints

XConstraintFcn — Deterministic constraints on variables
[] (default) | function handle

Deterministic constraints on variables, specified as a function handle. For details, see “Deterministic Constraints — XConstraintFcn” on page 10-38.
Example: 'XConstraintFcn',@xconstraint
Data Types: function_handle

ConditionalVariableFcn — Conditional variable constraints
[] (default) | function handle

Conditional variable constraints, specified as a function handle. For details, see “Conditional Constraints — ConditionalVariableFcn” on page 10-39.


Example: 'ConditionalVariableFcn',@condfun
Data Types: function_handle

NumCoupledConstraints — Number of coupled constraints
0 (default) | positive integer

Number of coupled constraints, specified as a positive integer. For details, see “Coupled Constraints” on page 10-40.

Note NumCoupledConstraints is required when you have coupled constraints.

Example: 'NumCoupledConstraints',3
Data Types: double

AreCoupledConstraintsDeterministic — Indication of whether coupled constraints are deterministic
true for all coupled constraints (default) | logical vector

Indication of whether coupled constraints are deterministic, specified as a logical vector of length NumCoupledConstraints. For details, see “Coupled Constraints” on page 10-40.
Example: 'AreCoupledConstraintsDeterministic',[true,false,true]
Data Types: logical

Reports, Plots, and Halting

Verbose — Command-line display level
1 (default) | 0 | 2

Command-line display level, specified as 0, 1, or 2.

• 0 — No command-line display.
• 1 — At each iteration, display the iteration number, result report (see the next paragraph), objective function model, objective function evaluation time, best (lowest) observed objective function value, best (lowest) estimated objective function value, and the observed constraint values (if any). When optimizing in parallel, the display also includes a column showing the number of active workers, counted after assigning a job to the next worker.

The result report for each iteration is one of the following:

• Accept — The objective function returns a finite value, and all constraints are satisfied.
• Best — Constraints are satisfied, and the objective function returns the lowest value among feasible points.
• Error — The objective function returns a value that is not a finite real scalar.
• Infeas — At least one constraint is violated.

• 2 — Same as 1, with added diagnostic information such as the time to select the next point, the model fitting time, an indication that a "plus" acquisition function declares overexploiting, and a notice when parallel workers are assigned to random points because of low parallel utilization.

Example: 'Verbose',2


Data Types: double

OutputFcn — Function called after each iteration
{} (default) | function handle | cell array of function handles

Function called after each iteration, specified as a function handle or cell array of function handles. An output function can halt the solver, and can perform arbitrary calculations, including creating variables or plotting. Specify several output functions using a cell array of function handles.

There are two built-in output functions:

• @assignInBase — Constructs a BayesianOptimization instance at each iteration and assigns it to a variable in the base workspace. Choose a variable name using the SaveVariableName name-value pair.
• @saveToFile — Constructs a BayesianOptimization instance at each iteration and saves it to a file in the current folder. Choose a file name using the SaveFileName name-value pair.

You can write your own output functions. For details, see “Bayesian Optimization Output Functions” on page 10-19.
Example: 'OutputFcn',{@saveToFile @myOutputFunction}
Data Types: cell | function_handle

SaveFileName — File name for the @saveToFile output function
'BayesoptResults.mat' (default) | character vector | string scalar

File name for the @saveToFile output function, specified as a character vector or string scalar. The file name can include a path, such as '../optimizations/September2.mat'.
Example: 'SaveFileName','September2.mat'
Data Types: char | string

SaveVariableName — Variable name for the @assignInBase output function
'BayesoptResults' (default) | character vector | string scalar

Variable name for the @assignInBase output function, specified as a character vector or string scalar.
Example: 'SaveVariableName','September2Results'
Data Types: char | string

PlotFcn — Plot function called after each iteration
{@plotObjectiveModel,@plotMinObjective} (default) | 'all' | function handle | cell array of function handles

Plot function called after each iteration, specified as 'all', a function handle, or a cell array of function handles. A plot function can halt the solver, and can perform arbitrary calculations, including creating variables, in addition to plotting. Specify no plot function as []. 'all' calls all built-in plot functions. Specify several plot functions using a cell array of function handles. The built-in plot functions appear in the following tables.


Model Plots — Apply When D ≤ 2

@plotAcquisitionFunction — Plot the acquisition function surface.

@plotConstraintModels — Plot each constraint model surface. Negative values indicate feasible points. Also plot a P(feasible) surface. Also plot the error model, if it exists, which ranges from –1 to 1. Negative values mean that the model probably does not error, positive values mean that it probably does error. The model is: Plotted error = 2*Probability(error) – 1.

@plotObjectiveEvaluationTimeModel — Plot the objective function evaluation time model surface.

@plotObjectiveModel — Plot the fun model surface, the estimated location of the minimum, and the location of the next proposed point to evaluate. For one-dimensional problems, plot envelopes one credible interval above and below the mean function, and envelopes one noise standard deviation above and below the mean.

Trace Plots — Apply to All D

@plotObjective — Plot each observed function value versus the number of function evaluations.

@plotObjectiveEvaluationTime — Plot each observed function evaluation run time versus the number of function evaluations.

@plotMinObjective — Plot the minimum observed and estimated function values versus the number of function evaluations.

@plotElapsedTime — Plot three curves: the total elapsed time of the optimization, the total function evaluation time, and the total modeling and point selection time, all versus the number of function evaluations.

You can write your own plot functions. For details, see “Bayesian Optimization Plot Functions” on page 10-11.

Note When there are coupled constraints, iterative display and plot functions can give counterintuitive results such as:

• A minimum objective plot can increase.
• The optimization can declare a problem infeasible even when it showed an earlier feasible point.

The reason for this behavior is that the decision about whether a point is feasible can change as the optimization progresses. bayesopt determines feasibility with respect to its constraint model, and this model changes as bayesopt evaluates points. So a “minimum objective” plot can increase when the minimal point is later deemed infeasible, and the iterative display can show a feasible point that is later deemed infeasible.

Example: 'PlotFcn','all'


Data Types: char | string | cell | function_handle

Initialization

InitialX — Initial evaluation points
NumSeedPoints-by-D random initial points within bounds (default) | N-by-D table

Initial evaluation points, specified as an N-by-D table, where N is the number of evaluation points, and D is the number of variables.

Note If only InitialX is provided, it is interpreted as initial points to evaluate. The objective function is evaluated at InitialX. If any other initialization parameters are also provided, InitialX is interpreted as prior function evaluation data. The objective function is not evaluated. Any missing values are set to NaN.

Data Types: table

InitialObjective — Objective values corresponding to InitialX
[] (default) | length-N vector

Objective values corresponding to InitialX, specified as a length-N vector, where N is the number of evaluation points.
Example: 'InitialObjective',[17;-3;-12.5]
Data Types: double

InitialConstraintViolations — Constraint violations of coupled constraints
[] (default) | N-by-K matrix

Constraint violations of coupled constraints, specified as an N-by-K matrix, where N is the number of evaluation points and K is the number of coupled constraints. For details, see “Coupled Constraints” on page 10-40.
Data Types: double

InitialErrorValues — Errors for InitialX
[] (default) | length-N vector with entries -1 or 1

Errors for InitialX, specified as a length-N vector with entries -1 or 1, where N is the number of evaluation points. Specify -1 for no error, and 1 for an error.
Example: 'InitialErrorValues',[-1,-1,-1,-1,1]
Data Types: double

InitialUserData — Initial data corresponding to InitialX
[] (default) | length-N cell vector

Initial data corresponding to InitialX, specified as a length-N cell vector, where N is the number of evaluation points.
Example: 'InitialUserData',{2,3,-1}
Data Types: cell


InitialObjectiveEvaluationTimes — Evaluation times of objective function at InitialX
[] (default) | length-N vector

Evaluation times of objective function at InitialX, specified as a length-N vector, where N is the number of evaluation points. Time is measured in seconds.
Data Types: double

InitialIterationTimes — Times for the first N iterations
{} (default) | length-N vector

Times for the first N iterations, specified as a length-N vector, where N is the number of evaluation points. Time is measured in seconds.
Data Types: double
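For instance, supplying both InitialX and InitialObjective makes bayesopt treat the points as prior evaluation data, so the objective function is not re-evaluated there. The variable name and values below are illustrative assumptions:

```matlab
% Sketch (illustrative): seed the optimization with two prior evaluations.
xvar   = optimizableVariable('x1',[0,10]);
fun    = @(x)(x.x1 - 3).^2;
priorX = table([2; 7],'VariableNames',{'x1'});   % N-by-D table, N = 2
results = bayesopt(fun,xvar, ...
    'InitialX',priorX, ...
    'InitialObjective',[1; 16], ...   % fun already evaluated at x1 = 2 and x1 = 7
    'MaxObjectiveEvaluations',10,'Verbose',0,'PlotFcn',[]);
```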

Output Arguments

results — Bayesian optimization results
BayesianOptimization object

Bayesian optimization results, returned as a BayesianOptimization object.

More About

Coupled Constraints

Coupled constraints are those constraints whose value comes from the objective function calculation. See “Coupled Constraints” on page 10-40.

Tips

• Bayesian optimization is not reproducible if one of these conditions exists:
  • You specify an acquisition function whose name includes per-second, such as 'expected-improvement-per-second'. The per-second modifier indicates that optimization depends on the run time of the objective function. For more details, see “Acquisition Function Types” on page 10-3.
  • You specify to run Bayesian optimization in parallel. Due to the nonreproducibility of parallel timing, parallel Bayesian optimization does not necessarily yield reproducible results. For more details, see “Parallel Bayesian Optimization” on page 10-7.

Extended Capabilities

Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, set the 'UseParallel',true name-value pair argument in the call to this function. For more general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).


See Also
BayesianOptimization | bestPoint | optimizableVariable

Topics
“Optimize a Cross-Validated SVM Classifier Using bayesopt” on page 10-45
“Bayesian Optimization Algorithm” on page 10-2

Introduced in R2016b


bbdesign

Box-Behnken design

Syntax

dBB = bbdesign(n)
[dBB,blocks] = bbdesign(n)
[...] = bbdesign(n,param,val)

Description

dBB = bbdesign(n) generates a Box-Behnken design for n factors. n must be an integer 3 or larger. The output matrix dBB is m-by-n, where m is the number of runs in the design. Each row represents one run, with settings for all factors represented in the columns. Factor values are normalized so that the cube points take values between -1 and 1.

[dBB,blocks] = bbdesign(n) requests a blocked design. The output blocks is an m-by-1 vector of block numbers for each run. Blocks indicate runs that are to be measured under similar conditions to minimize the effect of inter-block differences on the parameter estimates.

[...] = bbdesign(n,param,val) specifies one or more optional parameter/value pairs for the design. The following table lists valid parameter/value pairs.

Parameter      Description                           Values
'center'       Number of center points.              Integer. The default depends on n.
'blocksize'    Maximum number of points per block.   Integer. The default is Inf.

Examples

The following creates a 3-factor Box-Behnken design:

dBB = bbdesign(3)
dBB =
    -1    -1     0
    -1     1     0
     1    -1     0
     1     1     0
    -1     0    -1
    -1     0     1
     1     0    -1
     1     0     1
     0    -1    -1
     0    -1     1
     0     1    -1
     0     1     1
     0     0     0
     0     0     0
     0     0     0



The center point is run 3 times to allow for a more uniform estimate of the prediction variance over the entire design space.

Visualize the design as follows:

plot3(dBB(:,1),dBB(:,2),dBB(:,3),'ro',...
    'MarkerFaceColor','b')
X = [1 -1 -1 -1 1 -1 -1 -1 1 1 -1 -1; ...
    1 1 1 -1 1 1 1 -1 1 1 -1 -1];
Y = [-1 -1 1 -1 -1 -1 1 -1 1 -1 1 -1; ...
    1 -1 1 1 1 -1 1 1 1 -1 1 -1];
Z = [1 1 1 1 -1 -1 -1 -1 -1 -1 -1 -1; ...
    1 1 1 1 -1 -1 -1 -1 1 1 1 1];
line(X,Y,Z,'Color','b')
axis square equal
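A sketch of the blocked syntax follows; the factor count and 'blocksize' value are illustrative choices, not from a shipped example:

```matlab
% Sketch (illustrative): a 4-factor design split into blocks of at most 9 runs.
[dBB,blocks] = bbdesign(4,'blocksize',9);
size(dBB,1)            % total number of runs m
accumarray(blocks,1)   % number of runs in each block (each at most 9)
```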

See Also
ccdesign

Introduced before R2006a


bestPoint

Best point in a Bayesian optimization according to a criterion

Syntax

x = bestPoint(results)
x = bestPoint(results,Name,Value)
[x,CriterionValue] = bestPoint( ___ )
[x,CriterionValue,iteration] = bestPoint( ___ )

Description

x = bestPoint(results) returns the best feasible point in the Bayesian model results according to the default criterion 'min-visited-upper-confidence-interval'.

x = bestPoint(results,Name,Value) modifies the best point using name-value pairs.

[x,CriterionValue] = bestPoint( ___ ), for any previous syntax, also returns the value of the criterion at x.

[x,CriterionValue,iteration] = bestPoint( ___ ) also returns the iteration number at which the best point was returned. Applies when the Criterion name-value pair is 'min-observed', 'min-visited-mean', or the default 'min-visited-upper-confidence-interval'.

Examples

Best Point of an Optimized KNN Classifier

This example shows how to obtain the best point of an optimized classifier. Optimize a KNN classifier for the ionosphere data, meaning find parameters that minimize the cross-validation loss. Minimize over nearest-neighborhood sizes from 1 to 30, and over the distance functions 'chebychev', 'euclidean', and 'minkowski'. For reproducibility, set the random seed, and set the AcquisitionFunctionName option to 'expected-improvement-plus'.

load ionosphere
rng(11)
num = optimizableVariable('n',[1,30],'Type','integer');
dst = optimizableVariable('dst',{'chebychev','euclidean','minkowski'},'Type','categorical');
c = cvpartition(351,'Kfold',5);
fun = @(x)kfoldLoss(fitcknn(X,Y,'CVPartition',c,'NumNeighbors',x.n,...
    'Distance',char(x.dst),'NSMethod','exhaustive'));
results = bayesopt(fun,[num,dst],'Verbose',0,...
    'AcquisitionFunctionName','expected-improvement-plus');


Obtain the best point according to the default 'min-visited-upper-confidence-interval' criterion.

x = bestPoint(results)

x=1×2 table
    n       dst
    _    _________
    1    chebychev

The lowest estimated cross-validation loss occurs for one nearest neighbor and 'chebychev' distance. Careful examination of the objective function model plot shows a point with two nearest neighbors and 'chebychev' distance that has a lower objective function value. Find this point using a different criterion.

x = bestPoint(results,'Criterion','min-observed')

x=1×2 table
    n       dst
    _    _________
    2    chebychev

Also find the minimum observed objective function value, and the iteration number at which it was observed.

[x,CriterionValue,iteration] = bestPoint(results,'Criterion','min-observed')

x=1×2 table
    n       dst
    _    _________
    2    chebychev

CriterionValue = 0.1054
iteration = 21

Input Arguments

results — Bayesian optimization results
BayesianOptimization object

Bayesian optimization results, specified as a BayesianOptimization object.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: x = bestPoint(results,'Criterion','min-observed')

Criterion — Best point criterion
'min-visited-upper-confidence-interval' (default) | 'min-observed' | 'min-mean' | 'min-upper-confidence-interval' | 'min-visited-mean'

Best point criterion, specified as the comma-separated pair consisting of 'Criterion' and a criterion name. The names are case-insensitive, do not require - characters, and require only enough characters to make the name uniquely distinguishable.

'min-observed' — x is the feasible point with minimum observed objective.

'min-mean' — x is the feasible point where the objective model mean is minimized.

'min-upper-confidence-interval' — x is the feasible point minimizing an upper confidence interval of the objective model. See alpha.

'min-visited-mean' — x is the feasible point where the objective model mean is minimized among the visited points.

'min-visited-upper-confidence-interval' — x is the feasible point minimizing an upper confidence interval of the objective model among the visited points. See alpha.

Example: 'Criterion','min-visited-mean'


alpha — Probability that modeled objective mean exceeds CriterionValue
0.01 (default) | scalar between 0 and 1

Probability that the modeled objective mean exceeds CriterionValue, specified as the comma-separated pair consisting of 'alpha' and a scalar between 0 and 1. alpha relates to the 'min-upper-confidence-interval' and 'min-visited-upper-confidence-interval' Criterion values. The definition for the upper confidence interval is the value Y where

P(meanQ(fun(x)) > Y) = alpha,

where fun is the objective function, and the mean is calculated with respect to the posterior distribution Q.
Example: 'alpha',0.05
Data Types: double

Output Arguments

x — Best point
1-by-D table

Best point, returned as a 1-by-D table, where D is the number of variables. The meaning of “best” is with respect to Criterion.

CriterionValue — Value of criterion
real scalar

Value of criterion, returned as a real scalar. The value depends on the setting of the Criterion name-value pair, which has a default value of 'min-visited-upper-confidence-interval'.

'min-observed' — Minimum observed objective.

'min-mean' — Minimum of model mean.

'min-upper-confidence-interval' — Value Y satisfying the equation P(meanQ(fun(x)) > Y) = alpha.

'min-visited-mean' — Minimum of observed model mean.

'min-visited-upper-confidence-interval' — Value Y satisfying the equation P(meanQ(fun(x)) > Y) = alpha among observed points.

iteration — Iteration number at which best point was observed
positive integer

Iteration number at which best point was observed, returned as a positive integer. The best point is defined by CriterionValue.

See Also
BayesianOptimization | bayesopt

Introduced in R2016b


betacdf

Beta cumulative distribution function

Syntax

p = betacdf(x,a,b)
p = betacdf(x,a,b,'upper')

Description

p = betacdf(x,a,b) returns the beta cdf at each of the values in x using the corresponding parameters in a and b. x, a, and b can be vectors, matrices, or multidimensional arrays that all have the same size. A scalar input is expanded to a constant array with the same dimensions as the other inputs. The parameters in a and b must all be positive, and the values in x must lie on the interval [0,1].

p = betacdf(x,a,b,'upper') returns the complement of the beta cdf at each of the values in x, using an algorithm that more accurately computes the extreme upper tail probabilities.

The beta cdf for a given value x and given pair of parameters a and b is

    p = F(x|a,b) = 1/B(a,b) ∫₀ˣ t^(a−1) (1 − t)^(b−1) dt

where B( · ) is the Beta function.

Examples

Compute Beta Distribution CDF

Compute the cdf for a beta distribution with parameters a = 2 and b = 2.

x = 0.1:0.2:0.9;
a = 2;
b = 2;
p = betacdf(x,a,b)

p = 1×5

    0.0280    0.2160    0.5000    0.7840    0.9720

The cdf of a symmetric beta distribution (a = b) evaluated at x = 0.5 is always 0.5:

a = [1 2 3];
p = betacdf(0.5,a,a)

p = 1×3

    0.5000    0.5000    0.5000


Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also
betafit | betainv | betalike | betapdf | betarnd | betastat | cdf

Topics
“Beta Distribution” on page B-6

Introduced before R2006a


betafit

Beta parameter estimates

Syntax

phat = betafit(data)
[phat,pci] = betafit(data,alpha)

Description

phat = betafit(data) computes the maximum likelihood estimates of the beta distribution parameters a and b from the data in the vector data and returns a column vector containing the a and b estimates, where the beta cdf is given by

    F(x|a,b) = 1/B(a,b) ∫₀ˣ t^(a−1) (1 − t)^(b−1) dt

and B( · ) is the Beta function.

The elements of data must lie in the open interval (0, 1), where the beta distribution is defined. However, it is sometimes also necessary to fit a beta distribution to data that include exact zeros or ones. For such data, the beta likelihood function is unbounded, and standard maximum likelihood estimation is not possible. In that case, betafit maximizes a modified likelihood that incorporates the zeros or ones by treating them as if they were values that have been left-censored at sqrt(realmin) or right-censored at 1-eps/2, respectively.

[phat,pci] = betafit(data,alpha) returns confidence intervals on the a and b parameters in the 2-by-2 matrix pci. The first column of the matrix contains the lower and upper confidence bounds for parameter a, and the second column contains the confidence bounds for parameter b. The optional input argument alpha is a value in the range [0, 1] specifying the width of the confidence intervals. By default, alpha is 0.05, which corresponds to 95% confidence intervals. The confidence intervals are based on a normal approximation for the distribution of the logs of the parameter estimates.
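Because of the modified likelihood described above, betafit still returns estimates when the data contain exact zeros or ones. A sketch with made-up data:

```matlab
% Sketch (illustrative): data including an exact 0 and an exact 1.
% betafit treats them as censored at sqrt(realmin) and 1-eps/2, respectively.
data = [0; 0.2; 0.35; 0.5; 0.6; 0.75; 0.9; 1];
phat = betafit(data)   % estimates of a and b despite the boundary values
```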

Examples

This example generates 100 beta distributed observations. The true a and b parameters are 4 and 3, respectively. Compare these to the values returned in p by the beta fit. Note that the columns of ci both bracket the true parameters.

data = betarnd(4,3,100,1);
[p,ci] = betafit(data,0.01)

p =
    5.5328    3.8097

ci =
    3.6538    2.6197
    8.3781    5.5402


References

[1] Hahn, Gerald J., and S. S. Shapiro. Statistical Models in Engineering. Hoboken, NJ: John Wiley & Sons, Inc., 1994, p. 95.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

See Also
betacdf | betainv | betalike | betapdf | betarnd | betastat | mle

Topics
“Beta Distribution” on page B-6

Introduced before R2006a


betainv

Beta inverse cumulative distribution function

Syntax

X = betainv(P,A,B)

Description

X = betainv(P,A,B) computes the inverse of the beta cdf with parameters specified by A and B for the corresponding probabilities in P. P, A, and B can be vectors, matrices, or multidimensional arrays that are all the same size. A scalar input is expanded to a constant array with the same dimensions as the other inputs. The parameters in A and B must all be positive, and the values in P must lie on the interval [0, 1].

The inverse beta cdf for a given probability p and a given pair of parameters a and b is

    x = F⁻¹(p|a,b) = {x : F(x|a,b) = p}

where

    p = F(x|a,b) = 1/B(a,b) ∫₀ˣ t^(a−1) (1 − t)^(b−1) dt

and B( · ) is the Beta function. Each element of output X is the value whose cumulative probability under the beta cdf defined by the corresponding parameters in A and B is specified by the corresponding value in P.

Examples

p = [0.01 0.5 0.99];
x = betainv(p,10,5)

x =
    0.3726    0.6742    0.8981

According to this result, for a beta cdf with a = 10 and b = 5, a value less than or equal to 0.3726 occurs with probability 0.01. Similarly, values less than or equal to 0.6742 and 0.8981 occur with respective probabilities 0.5 and 0.99.
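Because betainv inverts betacdf, composing the two functions recovers the original probabilities up to the accuracy of the iterative inversion:

```matlab
p = [0.01 0.5 0.99];
x = betainv(p,10,5);
pback = betacdf(x,10,5);    % approximately equal to p
err = max(abs(pback - p))   % small numerical residual
```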

Algorithms

The betainv function uses Newton's method with modifications to constrain steps to the allowable range for x, i.e., [0 1].

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.


GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also
betacdf | betafit | betalike | betapdf | betarnd | betastat | icdf

Topics
“Beta Distribution” on page B-6

Introduced before R2006a


betalike

Beta negative log-likelihood

Syntax

nlogL = betalike(params,data)
[nlogL,AVAR] = betalike(params,data)

Description

nlogL = betalike(params,data) returns the negative of the beta log-likelihood function for the beta parameters a and b specified in vector params and the observations specified in the column vector data.

The elements of data must lie in the open interval (0, 1), where the beta distribution is defined. However, it is sometimes also necessary to fit a beta distribution to data that include exact zeros or ones. For such data, the beta likelihood function is unbounded, and standard maximum likelihood estimation is not possible. In that case, betalike computes a modified likelihood that incorporates the zeros or ones by treating them as if they were values that have been left-censored at sqrt(realmin) or right-censored at 1-eps/2, respectively.

[nlogL,AVAR] = betalike(params,data) also returns AVAR, which is the asymptotic variance-covariance matrix of the parameter estimates if the values in params are the maximum likelihood estimates. AVAR is the inverse of Fisher's information matrix. The diagonal elements of AVAR are the asymptotic variances of their respective parameters.

betalike is a utility function for maximum likelihood estimation of the beta distribution. The likelihood assumes that all the elements in the data sample are mutually independent. Since betalike returns the negative beta log-likelihood function, minimizing betalike using fminsearch is the same as maximizing the likelihood.
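As the last paragraph notes, you can obtain maximum likelihood estimates by minimizing betalike with fminsearch. A sketch; the starting point [1 1] is an illustrative choice:

```matlab
% Sketch (illustrative): ML estimation via direct minimization.
data = betarnd(4,3,100,1);
phat = fminsearch(@(params) betalike(params,data),[1 1])
% Compare with betafit(data), which performs the same maximization.
```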

Examples This example continues the betafit example, which calculates estimates of the beta parameters for some randomly generated beta distributed data.
r = betarnd(4,3,100,1);
[nlogl,AVAR] = betalike(betafit(r),r)
nlogl =
  -27.5996
AVAR =
    0.2783    0.1316
    0.1316    0.0867
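The same minimization can be sketched outside MATLAB. The following Python/SciPy snippet is an illustrative analogue only (it is not part of the toolbox, and the starting values are an assumption of this sketch, not betafit's internals): it builds the negative beta log-likelihood and minimizes it with Nelder-Mead, the simplex method that fminsearch also uses.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
data = stats.beta.rvs(4, 3, size=100, random_state=rng)  # like betarnd(4,3,100,1)

def negloglik(params, x):
    """Negative beta log-likelihood, the quantity betalike returns."""
    a, b = params
    if a <= 0 or b <= 0:
        return np.inf
    return -np.sum(stats.beta.logpdf(x, a, b))

# Method-of-moments starting values (a common choice, assumed here)
m, v = data.mean(), data.var()
c = m * (1 - m) / v - 1
x0 = [m * c, (1 - m) * c]

# Nelder-Mead plays the role of fminsearch: minimizing the negative
# log-likelihood is the same as maximizing the likelihood
res = optimize.minimize(negloglik, x0, args=(data,), method="Nelder-Mead")
a_hat, b_hat = res.x
```

On this seeded sample the fitted shape parameters land near the true values (4, 3), and the minimized negative log-likelihood is no larger than the value at the true parameters, as a maximum likelihood fit must be.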


Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also betacdf | betafit | betainv | betapdf | betarnd | betastat Topics “Beta Distribution” on page B-6 Introduced before R2006a


betapdf Beta probability density function

Syntax Y = betapdf(X,A,B)

Description Y = betapdf(X,A,B) computes the beta pdf at each of the values in X using the corresponding parameters in A and B. X, A, and B can be vectors, matrices, or multidimensional arrays that all have the same size. A scalar input is expanded to a constant array with the same dimensions as the other inputs. The parameters in A and B must all be positive, and the values in X must lie on the interval [0, 1].

The beta probability density function for a given value x and a given pair of parameters a and b is

y = f(x|a,b) = (1/B(a,b)) * x^(a−1) * (1−x)^(b−1) * I_[0,1](x)

where B(·) is the Beta function. The uniform distribution on (0, 1) is a degenerate case of the beta pdf, where a = 1 and b = 1.

A likelihood function is the pdf viewed as a function of the parameters. Maximum likelihood estimators (MLEs) are the values of the parameters that maximize the likelihood function for a fixed value of x.

Examples
a = [0.5 1; 2 4]
a =
    0.5000    1.0000
    2.0000    4.0000
y = betapdf(0.5,a,a)
y =
    0.6366    1.0000
    1.5000    2.1875
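As a quick cross-check of the density formula outside MATLAB (Python/SciPy shown only for illustration; it is not part of the toolbox), evaluating the beta density at x = 0.5 for a = b reproduces the betapdf values in the example above.

```python
import numpy as np
from scipy import stats
from scipy.special import beta as beta_fn

a = np.array([0.5, 1.0, 2.0, 4.0])
# Apply the density formula directly: y = x^(a-1) * (1-x)^(b-1) / B(a,b)
y_formula = 0.5 ** (a - 1) * (1 - 0.5) ** (a - 1) / beta_fn(a, a)
# scipy's beta pdf agrees with the direct formula
y_scipy = stats.beta.pdf(0.5, a, a)
```

Both computations give approximately 0.6366, 1.0, 1.5, and 2.1875, matching betapdf(0.5,a,a).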

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).


See Also betacdf | betafit | betainv | betalike | betarnd | betastat | pdf Topics “Beta Distribution” on page B-6 Introduced before R2006a


betarnd Beta random numbers

Syntax R = betarnd(A,B) R = betarnd(A,B,m,n,...) R = betarnd(A,B,[m,n,...])

Description R = betarnd(A,B) generates random numbers from the beta distribution with parameters specified by A and B. A and B can be vectors, matrices, or multidimensional arrays that have the same size, which is also the size of R. A scalar input for A or B is expanded to a constant array with the same dimensions as the other input. R = betarnd(A,B,m,n,...) or R = betarnd(A,B,[m,n,...]) generates an m-by-n-by-... array containing random numbers from the beta distribution with parameters A and B. A and B can each be scalars or arrays of the same size as R.

Examples
a = [1 1;2 2];
b = [1 2;1 2];
r = betarnd(a,b)
r =
    0.6987    0.6139
    0.9102    0.8067

r = betarnd(10,10,[1 5])
r =
    0.5974    0.4777    0.5538    0.5465    0.6327

r = betarnd(4,2,2,3)
r =
    0.3943    0.6101    0.5768
    0.5990    0.2760    0.5474

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: The generated code can return a different sequence of numbers than MATLAB if either of the following is true:
• The output is nonscalar.
• An input parameter is invalid for the distribution.
For more information on code generation, see “Introduction to Code Generation” on page 31-2 and “General Code Generation Workflow” on page 31-5. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also betacdf | betafit | betainv | betalike | betapdf | betastat | random Topics “Beta Distribution” on page B-6 Introduced before R2006a


betastat Beta mean and variance

Syntax [M,V] = betastat(A,B)

Description [M,V] = betastat(A,B), with A>0 and B>0, returns the mean and variance of the beta distribution with parameters specified by A and B. A and B can be vectors, matrices, or multidimensional arrays that have the same size, which is also the size of M and V. A scalar input for A or B is expanded to a constant array with the same dimensions as the other input.

The mean of the beta distribution with parameters a and b is a/(a + b), and the variance is

ab / ((a + b)^2 (a + b + 1))

Examples
If parameters a and b are equal, the mean is 1/2.
a = 1:6;
[m,v] = betastat(a,a)
m =
    0.5000    0.5000    0.5000    0.5000    0.5000    0.5000
v =
    0.0833    0.0500    0.0357    0.0278    0.0227    0.0192
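The closed-form moments can be checked numerically. This Python/SciPy fragment (illustrative only; not part of the toolbox) evaluates the two formulas for a = b = 1:6 and compares them with SciPy's independent moment computation.

```python
import numpy as np
from scipy import stats

a = np.arange(1, 7, dtype=float)
b = a.copy()
m = a / (a + b)                            # mean: a/(a+b)
v = a * b / ((a + b) ** 2 * (a + b + 1))   # variance: ab/((a+b)^2 (a+b+1))

# SciPy computes the same moments internally
m_scipy, v_scipy = stats.beta.stats(a, b, moments="mv")
```

For a = b the mean is exactly 1/2 and the variance reduces to 1/(4(2a+1)), which reproduces the 0.0833, 0.0500, ..., 0.0192 sequence above.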

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also betacdf | betafit | betainv | betalike | betapdf | betarnd Topics “Beta Distribution” on page B-6 Introduced before R2006a


binocdf Binomial cumulative distribution function

Syntax y = binocdf(x,n,p) y = binocdf(x,n,p,'upper')

Description y = binocdf(x,n,p) computes a binomial cumulative distribution function at each of the values in x using the corresponding number of trials in n and the probability of success for each trial in p. x, n, and p can be vectors, matrices, or multidimensional arrays of the same size. Alternatively, one or more arguments can be scalars. The binocdf function expands scalar inputs to constant arrays with the same dimensions as the other inputs. y = binocdf(x,n,p,'upper') returns the complement of the binomial cumulative distribution function at each value in x, using an algorithm that computes the extreme upper tail probabilities more accurately than the default algorithm.

Examples
Compute and Plot Binomial Cumulative Distribution Function
Compute and plot the binomial cumulative distribution function for the specified range of integer values, number of trials, and probability of success for each trial.
A baseball team plays 100 games in a season and has a 50-50 chance of winning each game. Find the probability of the team winning more than 55 games in a season.
format long
1 - binocdf(55,100,0.5)
ans = 0.135626512036917

Find the probability of the team winning between 50 and 55 games in a season.
binocdf(55,100,0.5) - binocdf(49,100,0.5)
ans = 0.404168106656672

Compute the probabilities of the team winning more than 55 games in a season if the chance of winning each game ranges from 10% to 90%.
chance = 0.1:0.05:0.9;
y = 1 - binocdf(55,100,chance);


Plot the results. scatter(chance,y) grid on

Compute Extreme Upper Tail Probabilities
Compute the complement of the binomial cumulative distribution function with more accurate upper tail probabilities.
A baseball team plays 100 games in a season and has a 50-50 chance of winning each game. Find the probability of the team winning more than 95 games in a season.
format long
1 - binocdf(95,100,0.5)
ans = 0

This result shows that the probability is so close to 1 (within eps) that subtracting it from 1 gives 0. To approximate the extreme upper tail probabilities better, compute the complement of the binomial cumulative distribution function directly instead of computing the difference.
binocdf(95,100,0.5,'upper')


ans = 3.224844447881779e-24

Alternatively, use the binopdf function to find the probabilities of the team winning 96, 97, 98, 99, and 100 games in a season. Find the sum of these probabilities by using the sum function.
sum(binopdf(96:100,100,0.5),'all')
ans = 3.224844447881779e-24
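The effect of the 'upper' algorithm can be reproduced in any environment that exposes a survival function. In Python/SciPy (an illustrative analogue, not part of the toolbox), binom.sf plays the same role, avoiding the catastrophic cancellation in 1 - cdf for extreme upper tails.

```python
from scipy import stats

# More than 55 wins out of 100 at p = 0.5
p55 = stats.binom.sf(55, 100, 0.5)           # about 0.135626512036917

# More than 95 wins: 1 - cdf underflows to 0 in double precision...
naive = 1 - stats.binom.cdf(95, 100, 0.5)    # 0.0
# ...while the survival function keeps full precision
exact = stats.binom.sf(95, 100, 0.5)         # about 3.2248e-24
```

This mirrors the two MATLAB computations above: the subtraction loses the tail entirely, while the direct complement recovers it.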

Input Arguments x — Values at which to evaluate binomial cdf integer from interval [0 n] | array of integers from interval [0 n] Values at which to evaluate the binomial cdf, specified as an integer or an array of integers. All values of x must belong to the interval [0 n], where n is the number of trials. Example: [0 1 3 4] Data Types: single | double n — Number of trials positive integer | array of positive integers Number of trials, specified as a positive integer or an array of positive integers. Example: [10 20 50 100] Data Types: single | double p — Probability of success for each trial scalar value from interval [0 1] | array of scalar values from interval [0 1] Probability of success for each trial, specified as a scalar value or an array of scalar values. All values of p must belong to the interval [0 1]. Example: [0.01 0.1 0.5 0.7] Data Types: single | double

Output Arguments y — Binomial cdf values scalar value | array of scalar values Binomial cdf values, returned as a scalar value or an array of scalar values. Each element in y is the binomial cdf value of the distribution evaluated at the corresponding element in x. Data Types: single | double


More About Binomial Cumulative Distribution Function The binomial cumulative distribution function lets you obtain the probability of observing less than or equal to x successes in n trials, with the probability p of success on a single trial. The binomial cumulative distribution function for a given value x and a given pair of parameters n and p is

y = F(x|n,p) = Σ_{i=0}^{x} (n choose i) p^i (1 − p)^(n−i) I_(0,1,...,n)(i)

The resulting value y is the probability of observing up to x successes in n independent trials, where the probability of success in any given trial is p. The indicator function I_(0,1,...,n)(i) ensures that x only adopts values of 0, 1, ..., n.

Alternative Functionality • binocdf is a function specific to binomial distribution. Statistics and Machine Learning Toolbox also offers the generic function cdf, which supports various probability distributions. To use cdf, specify the probability distribution name and its parameters. Alternatively, create a BinomialDistribution probability distribution object and pass the object as an input argument. Note that the distribution-specific function binocdf is faster than the generic function cdf. • Use the Probability Distribution Function app to create an interactive plot of the cumulative distribution function (cdf) or probability density function (pdf) for a probability distribution.

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also BinomialDistribution | binofit | binoinv | binopdf | binornd | binostat | cdf Topics “Binomial Distribution” on page B-10 Introduced before R2006a


binofit Binomial parameter estimates

Syntax phat = binofit(x,n) [phat,pci] = binofit(x,n) [phat,pci] = binofit(x,n,alpha)

Description phat = binofit(x,n) returns a maximum likelihood estimate of the probability of success in a given binomial trial based on the number of successes, x, observed in n independent trials. If x = (x(1), x(2), ... x(k)) is a vector, binofit returns a vector of the same size as x whose ith entry is the parameter estimate for x(i). All k estimates are independent of each other. If n = (n(1), n(2), ..., n(k)) is a vector of the same size as x, binofit returns a vector whose ith entry is the parameter estimate based on the number of successes x(i) in n(i) independent trials. A scalar value for x or n is expanded to the same size as the other input.

[phat,pci] = binofit(x,n) returns the probability estimate, phat, and the 95% confidence intervals, pci. binofit uses the Clopper-Pearson method to calculate confidence intervals.

[phat,pci] = binofit(x,n,alpha) returns the 100(1 - alpha)% confidence intervals. For example, alpha = 0.01 yields 99% confidence intervals.

Note binofit behaves differently than other Statistics and Machine Learning Toolbox functions that compute parameter estimates, in that it returns independent estimates for each entry of x. By comparison, expfit returns a single parameter estimate based on all the entries of x. Unlike most other distribution fitting functions, the binofit function treats its input x vector as a collection of measurements from separate samples. If you want to treat x as a single sample and compute a single parameter estimate for it, you can use binofit(sum(x),sum(n)) when n is a vector, and binofit(sum(x),n*length(x)) when n is a scalar.

Examples This example generates a binomial sample of 100 elements, where the probability of success in a given trial is 0.6, and then estimates this probability from the outcomes in the sample.
r = binornd(100,0.6);
[phat,pci] = binofit(r,100)
phat =
    0.5800
pci =
    0.4771    0.6780

The 95% confidence interval, pci, contains the true value, 0.6.
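The Clopper-Pearson interval itself is built from beta-distribution quantiles. The Python/SciPy sketch below is illustrative only (it is not binofit's code; the sample count 58 is taken from the example output above, where phat = 0.5800 for n = 100):

```python
from scipy import stats

def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion."""
    lo = stats.beta.ppf(alpha / 2, x, n - x + 1) if x > 0 else 0.0
    hi = stats.beta.ppf(1 - alpha / 2, x + 1, n - x) if x < n else 1.0
    return lo, hi

phat = 58 / 100                      # point estimate, as in the example
lo, hi = clopper_pearson(58, 100)    # roughly (0.4771, 0.6780)
```

The endpoints agree with the pci values shown above, and the interval is slightly wider than the normal-approximation interval, which is the price of its exact coverage guarantee.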


References [1] Johnson, N. L., S. Kotz, and A. W. Kemp. Univariate Discrete Distributions. Hoboken, NJ: Wiley-Interscience, 1993.

Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also binocdf | binoinv | binopdf | binornd | binostat | mle Topics “Binomial Distribution” on page B-10 Introduced before R2006a


binoinv Binomial inverse cumulative distribution function

Syntax X = binoinv(Y,N,P)

Description X = binoinv(Y,N,P) returns the smallest integer X such that the binomial cdf evaluated at X is equal to or exceeds Y. You can think of Y as the probability of observing X successes in N independent trials where P is the probability of success in each trial. Each X is a positive integer less than or equal to N. Y, N, and P can be vectors, matrices, or multidimensional arrays that all have the same size. A scalar input is expanded to a constant array with the same dimensions as the other inputs. The parameters in N must be positive integers, and the values in both P and Y must lie on the interval [0 1].

Examples
If a baseball team has a 50-50 chance of winning any game, what is a reasonable range of games this team might win over a season of 162 games?
binoinv([0.05 0.95],162,0.5)
ans =
    71    91

This result means that in 90% of baseball seasons, a .500 team should win between 71 and 91 games.
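The same quantile definition, the smallest integer whose cdf reaches the target probability, is available elsewhere. In Python/SciPy (an illustrative analogue, not part of the toolbox), binom.ppf reproduces the season range above:

```python
from scipy import stats

# Middle 90% of season win totals for a .500 team over 162 games
lo, hi = stats.binom.ppf([0.05, 0.95], 162, 0.5)
```

Because both functions use the same definition of the discrete inverse cdf, both return 71 and 91 games.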

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also binocdf | binofit | binopdf | binornd | binostat | icdf Topics “Binomial Distribution” on page B-10 Introduced before R2006a

binopdf Binomial probability density function

Syntax y = binopdf(x,n,p)

Description y = binopdf(x,n,p) computes the binomial probability density function at each of the values in x using the corresponding number of trials in n and probability of success for each trial in p. x, n, and p can be vectors, matrices, or multidimensional arrays of the same size. Alternatively, one or more arguments can be scalars. The binopdf function expands scalar inputs to constant arrays with the same dimensions as the other inputs.

Examples
Compute and Plot Binomial Probability Density Function
Compute and plot the binomial probability density function for the specified range of integer values, number of trials, and probability of success for each trial.
In one day, a quality assurance inspector tests 200 circuit boards. 2% of the boards have defects. Compute the probability that the inspector will find no defective boards on any given day.
binopdf(0,200,0.02)
ans = 0.0176

Compute the binomial probability density function values at each value from 0 to 200. These values correspond to the probabilities that the inspector will find 0, 1, 2, ..., 200 defective boards on any given day.
defects = 0:200;
y = binopdf(defects,200,.02);

Plot the resulting binomial probability values.
plot(defects,y)


Compute the most likely number of defective boards that the inspector finds in a day.
[x,i] = max(y);
defects(i)
ans = 4
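The inspector example can be cross-checked in Python/SciPy (illustrative only; not part of the toolbox), where binom.pmf is the analogue of binopdf:

```python
import numpy as np
from scipy import stats

# P(no defective boards in 200 tests at 2% defect rate) = 0.98^200
p_none = stats.binom.pmf(0, 200, 0.02)       # about 0.0176

# Mode of the distribution, like [x,i] = max(y); defects(i)
defects = np.arange(201)
y = stats.binom.pmf(defects, 200, 0.02)
most_likely = defects[np.argmax(y)]
```

The mode also follows from the closed form floor((n+1)p) = floor(201 * 0.02) = 4, matching the MATLAB result above.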

Input Arguments x — Values at which to evaluate binomial pdf integer from interval [0 n] | array of integers from interval [0 n] Values at which to evaluate the binomial pdf, specified as an integer or an array of integers. All values of x must belong to the interval [0 n], where n is the number of trials. Example: [0,1,3,4] Data Types: single | double n — Number of trials positive integer | array of positive integers Number of trials, specified as a positive integer or an array of positive integers. Example: [10,20,50,100] Data Types: single | double


p — Probability of success for each trial scalar value from interval [0 1] | array of scalar values from interval [0 1] Probability of success for each trial, specified as a scalar value or an array of scalar values. All values of p must belong to the interval [0 1]. Example: [0.01,0.1,0.5,0.7] Data Types: single | double

Output Arguments y — Binomial pdf values scalar value | array of scalar values Binomial pdf values, returned as a scalar value or array of scalar values. Each element in y is the binomial pdf value of the distribution evaluated at the corresponding element in x. Data Types: single | double

More About Binomial Probability Density Function The binomial probability density function lets you obtain the probability of observing exactly x successes in n trials, with the probability p of success on a single trial. The binomial probability density function for a given value x and a given pair of parameters n and p is

y = f(x|n,p) = (n choose x) p^x q^(n−x) I_(0,1,...,n)(x)

where q = 1 – p. The resulting value y is the probability of observing exactly x successes in n independent trials, where the probability of success in any given trial is p. The indicator function I_(0,1,...,n)(x) ensures that x only adopts values of 0, 1, ..., n.

Alternative Functionality • binopdf is a function specific to binomial distribution. Statistics and Machine Learning Toolbox also offers the generic function pdf, which supports various probability distributions. To use pdf, specify the probability distribution name and its parameters. Alternatively, create a BinomialDistribution probability distribution object and pass the object as an input argument. Note that the distribution-specific function binopdf is faster than the generic function pdf. • Use the Probability Distribution Function app to create an interactive plot of the cumulative distribution function (cdf) or probability density function (pdf) for a probability distribution.

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™.

GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also BinomialDistribution | binocdf | binofit | binoinv | binornd | binostat | pdf Topics “Binomial Distribution” on page B-10 Introduced before R2006a


binornd Random numbers from binomial distribution

Syntax r = binornd(n,p) r = binornd(n,p,sz1,...,szN) r = binornd(n,p,sz)

Description r = binornd(n,p) generates random numbers from the binomial distribution specified by the number of trials n and the probability of success for each trial p. n and p can be vectors, matrices, or multidimensional arrays of the same size. Alternatively, one or more arguments can be scalars. The binornd function expands scalar inputs to constant arrays with the same dimensions as the other inputs. The function returns a vector, matrix, or multidimensional array r of the same size as n and p. r = binornd(n,p,sz1,...,szN) generates an array of random numbers from the binomial distribution with the scalar parameters n and p, where sz1,...,szN indicates the size of each dimension. r = binornd(n,p,sz) generates an array of random numbers from the binomial distribution with the scalar parameters n and p, where vector sz specifies size(r).

Examples
Array of Random Numbers from Several Binomial Distributions
Generate an array of random numbers from the binomial distributions. For each distribution, you specify the number of trials and the probability of success for each trial.
Specify the numbers of trials.
n = 10:10:60
n = 1×6
    10    20    30    40    50    60

Specify the probabilities of success for each trial.
p = 1./n
p = 1×6
    0.1000    0.0500    0.0333    0.0250    0.0200    0.0167

Generate random numbers from the binomial distributions.
r = binornd(n,p)
r = 1×6
     0     1     1     0     1     1

Array of Random Numbers from One Binomial Distribution
Generate an array of random numbers from one binomial distribution. Here, the distribution parameters n and p are scalars.
Use the binornd function to generate random numbers from the binomial distribution with 100 trials, where the probability of success in each trial is 0.2. The function returns one number.
r_scalar = binornd(100,0.2)
r_scalar = 20

Generate a 2-by-3 array of random numbers from the same distribution by specifying the required array dimensions.
r_array = binornd(100,0.2,2,3)
r_array = 2×3
    18    23    20
    18    24    23

Alternatively, specify the required array dimensions as a vector.
r_array = binornd(100,0.2,[2 3])
r_array = 2×3
    21    21    20
    26    18    23
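NumPy's random generator offers the same scalar expansion and size semantics. The sketch below (illustrative Python, not part of the toolbox) mirrors the two examples above; since the draws are random, only their shape and range are predictable:

```python
import numpy as np

rng = np.random.default_rng(42)

# Like n = 10:10:60 and p = 1./n: one draw per (n, p) pair
n = np.arange(10, 70, 10)
p = 1.0 / n
r = rng.binomial(n, p)

# Like binornd(100,0.2,2,3): scalar parameters, explicit output size
r_array = rng.binomial(100, 0.2, size=(2, 3))
```

As with binornd, each element of r lies between 0 and its corresponding n, and r_array has the requested 2-by-3 shape.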

Input Arguments n — Number of trials positive integer | array of positive integers Number of trials, specified as a positive integer or an array of positive integers. Example: [10 20 50 100] Data Types: single | double p — Probability of success for each trial scalar value | array of scalar values


Probability of success for each trial, specified as a scalar value or an array of scalar values. All values of p must belong to the interval [0 1]. Example: [0.01 0.1 0.5 0.7] Data Types: single | double sz1,...,szN — Size of each dimension (as separate arguments) integers Size of each dimension, specified as separate arguments of integers. For example, specifying 5,3,2 generates a 5-by-3-by-2 array of random numbers from the binomial probability distribution. If either n or p is an array, then the specified dimensions sz1,...,szN must match the common dimensions of n and p after any necessary scalar expansion. The default values of sz1,...,szN are the common dimensions. • If you specify a single value sz1, then r is a square matrix of size sz1-by-sz1. • If the size of any dimension is 0 or negative, then r is an empty array. • Beyond the second dimension, binornd ignores trailing dimensions with a size of 1. For example, binornd(n,p,3,1,1,1) produces a 3-by-1 vector of random numbers. Example: 5,3,2 Data Types: single | double sz — Size of each dimension (as a row vector) row vector of integers Size of each dimension, specified as a row vector of integers. For example, specifying [5 3 2] generates a 5-by-3-by-2 array of random numbers from the binomial probability distribution. If either n or p is an array, then the specified dimensions sz must match the common dimensions of n and p after any necessary scalar expansion. The default values of sz are the common dimensions. • If you specify a single value [sz1], then r is a square matrix of size sz1-by-sz1. • If the size of any dimension is 0 or negative, then r is an empty array. • Beyond the second dimension, binornd ignores trailing dimensions with a size of 1. For example, binornd(n,p,[3 1 1 1]) produces a 3-by-1 vector of random numbers. Example: [5 3 2] Data Types: single | double

Output Arguments r — Random numbers from binomial distribution scalar value | array of scalar values Random numbers from the binomial distribution, returned as a scalar value or an array of scalar values. Data Types: single | double


Alternative Functionality • binornd is a function specific to binomial distribution. Statistics and Machine Learning Toolbox also offers the generic function random, which supports various probability distributions. To use random, specify the probability distribution name and its parameters. Alternatively, create a BinomialDistribution probability distribution object and pass the object as an input argument. Note that the distribution-specific function binornd is faster than the generic function random. • To generate random numbers interactively, use randtool, a user interface for random number generation.

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: The generated code can return a different sequence of numbers than MATLAB in these two cases: • The output is nonscalar. • An input parameter is invalid for the distribution. For more information on code generation, see “Introduction to Code Generation” on page 31-2 and “General Code Generation Workflow” on page 31-5. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox). Distributed Arrays Partition large arrays across the combined memory of your cluster using Parallel Computing Toolbox™. This function fully supports distributed arrays. For more information, see “Run MATLAB Functions with Distributed Arrays” (Parallel Computing Toolbox).

See Also BinomialDistribution | binocdf | binofit | binoinv | binopdf | binostat | random Topics “Binomial Distribution” on page B-10 Introduced before R2006a


binostat Binomial mean and variance

Syntax [M,V] = binostat(N,P)

Description [M,V] = binostat(N,P) returns the mean and variance of the binomial distribution with parameters specified by the number of trials, N, and the probability of success for each trial, P. N and P can be vectors, matrices, or multidimensional arrays that have the same size, which is also the size of M and V. A scalar input for N or P is expanded to a constant array with the same dimensions as the other input. The mean of the binomial distribution with parameters n and p is np. The variance is npq, where q = 1 – p.

Examples
n = logspace(1,5,5)
n =
          10         100        1000       10000      100000
[m,v] = binostat(n,1./n)
m =
     1     1     1     1     1
v =
    0.9000    0.9900    0.9990    0.9999    1.0000
[m,v] = binostat(n,1/2)
m =
           5          50         500        5000       50000
v =
   1.0e+04 *
    0.0003    0.0025    0.0250    0.2500    2.5000
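The example values follow directly from the formulas np and npq. This Python/NumPy fragment (illustrative only; not part of the toolbox) reproduces both rows of the example:

```python
import numpy as np

n = np.logspace(1, 5, 5)     # 10, 100, 1000, 10000, 100000

# p = 1./n: mean np = 1 everywhere, variance npq = 1 - 1/n
p = 1.0 / n
m = n * p
v = n * p * (1 - p)

# p = 1/2: mean n/2, variance n/4
m2 = n * 0.5
v2 = n * 0.25
```

With p = 1/n the variance approaches 1 as n grows, and with p = 1/2 the variance n/4 produces the 1.0e+04-scaled row shown above.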

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also binocdf | binofit | binoinv | binopdf | binornd


Topics “Binomial Distribution” on page B-10 Introduced before R2006a


binScatterPlot Scatter plot of bins for tall arrays

Syntax binScatterPlot(X,Y) binScatterPlot(X,Y,nbins) binScatterPlot(X,Y,Xedges,Yedges) binScatterPlot(X,Y,Name,Value) h = binScatterPlot( ___ )

Description binScatterPlot(X,Y) creates a binned scatter plot of the data in X and Y. The binScatterPlot function uses an automatic binning algorithm that returns bins with a uniform area, chosen to cover the range of elements in X and Y and reveal the underlying shape of the distribution. binScatterPlot(X,Y,nbins) specifies the number of bins to use in each dimension. binScatterPlot(X,Y,Xedges,Yedges) specifies the edges of the bins in each dimension using the vectors Xedges and Yedges. binScatterPlot(X,Y,Name,Value) specifies additional options with one or more name-value pair arguments using any of the previous syntaxes. For example, you can specify 'Color' and a valid color option to change the color theme of the plot, or 'Gamma' with a positive scalar to adjust the level of detail. h = binScatterPlot( ___ ) returns a Histogram2 object. Use this object to inspect properties of the plot.

Examples Binned Scatter Plot of Normally-Distributed Random Data Create two tall vectors of random data. Create a binned scatter plot for the data. X = tall(randn(1e5,1)); Starting parallel pool (parpool) using the 'local' profile ... Connected to the parallel pool (number of workers: 12). Y = tall(randn(1e5,1));

When you perform calculations on tall arrays, MATLAB® uses either a parallel pool (default if you have Parallel Computing Toolbox™) or the local MATLAB session. If you want to run the example using the local MATLAB session when you have Parallel Computing Toolbox, you can change the global execution environment by using the mapreducer function. binScatterPlot(X,Y)


Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 12 sec Evaluation completed in 20 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 2.9 sec Evaluation completed in 3.6 sec

The resulting figure contains a slider to adjust the level of detail in the image.

Specify Number of Scatter Plot Bins Specify a scalar value as the third input argument to use the same number of bins in each dimension, or a two-element vector to use a different number of bins in each dimension. Plot a binned scatter plot of random data sorted into 100 bins in each dimension. X = tall(randn(1e5,1)); Starting parallel pool (parpool) using the 'local' profile ... Connected to the parallel pool (number of workers: 12). Y = tall(randn(1e5,1));

When you perform calculations on tall arrays, MATLAB® uses either a parallel pool (default if you have Parallel Computing Toolbox™) or the local MATLAB session. If you want to run the example using the local MATLAB session when you have Parallel Computing Toolbox, you can change the global execution environment by using the mapreducer function.

binScatterPlot(X,Y,100)

Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 12 sec
Evaluation completed in 16 sec
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 3.4 sec
Evaluation completed in 4.4 sec

Use 20 bins in the x-dimension and continue to use 100 bins in the y-dimension. binScatterPlot(X,Y,[20 100]) Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 3.1 sec Evaluation completed in 5.4 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 3.8 sec Evaluation completed in 4.5 sec


Specify Bin Edges for Scatter Plot Plot a binned scatter plot of random data with specific bin edges. Use bin edges of Inf and -Inf to capture outliers. Create a binned scatter plot with 100 bin edges between [-2 2] in each dimension. The data outside the specified bin edges is not included in the plot. X = tall(randn(1e5,1)); Starting parallel pool (parpool) using the 'local' profile ... Connected to the parallel pool (number of workers: 12). Y = tall(randn(1e5,1));

When you perform calculations on tall arrays, MATLAB® uses either a parallel pool (default if you have Parallel Computing Toolbox™) or the local MATLAB session. If you want to run the example using the local MATLAB session when you have Parallel Computing Toolbox, you can change the global execution environment by using the mapreducer function. Xedges = linspace(-2,2); Yedges = linspace(-2,2); binScatterPlot(X,Y,Xedges,Yedges)


Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 14 sec
Evaluation completed in 19 sec

Use coarse bins extending to infinity on the edges of the plot to capture outliers.

Xedges = [-Inf linspace(-2,2) Inf];
Yedges = [-Inf linspace(-2,2) Inf];
binScatterPlot(X,Y,Xedges,Yedges)

Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 2.6 sec
Evaluation completed in 3.5 sec
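The bin-edge convention used here is that each bin includes its left edge and excludes its right edge, except the last bin in each dimension, which also includes its outer edge. For readers checking this behavior outside MATLAB, NumPy's histogram2d follows the same rule; the sketch below is an illustration only and is not part of the toolbox.

```python
import numpy as np

# Bins are half-open [edge_i, edge_(i+1)), except the last bin in
# each dimension, which also includes its outer edge.
xedges = np.array([0.0, 1.0, 2.0])
yedges = np.array([0.0, 1.0, 2.0])

pts = np.array([[0.5, 0.5],    # interior of the first bin in x and y
                [1.0, 0.5],    # x on an interior edge -> second x-bin
                [2.0, 2.0]])   # on the outer edges -> last bin in x and y

counts, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=[xedges, yedges])
print(counts)
# [[1. 0.]
#  [1. 1.]]
```

The point on the outermost edges still lands in a bin, which is why padding the edge vectors with -Inf and Inf, as above, captures every outlier.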


Adjust Plot Color Theme

Plot a binned scatter plot of random data, specifying 'Color' as 'c'.

X = tall(randn(1e5,1));

Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 12).

Y = tall(randn(1e5,1));

When you perform calculations on tall arrays, MATLAB® uses either a parallel pool (default if you have Parallel Computing Toolbox™) or the local MATLAB session. If you want to run the example using the local MATLAB session when you have Parallel Computing Toolbox, you can change the global execution environment by using the mapreducer function.

binScatterPlot(X,Y,'Color','c')

Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 7.7 sec
Evaluation completed in 12 sec
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 1.5 sec
Evaluation completed in 1.9 sec


Input Arguments

X,Y — Data to distribute among bins (as separate arguments)
tall vectors | tall matrices | tall multidimensional arrays

Data to distribute among bins, specified as separate arguments of tall vectors, matrices, or multidimensional arrays. X and Y must be the same size. If X and Y are not vectors, then binScatterPlot treats them as single column vectors, X(:) and Y(:).

Corresponding elements in X and Y specify the x and y coordinates of 2-D data points, [X(k),Y(k)]. The underlying data types of X and Y can be different, but binScatterPlot concatenates these inputs into a single N-by-2 tall matrix of the dominant underlying data type.

binScatterPlot ignores all NaN values. Similarly, binScatterPlot ignores Inf and -Inf values, unless the bin edges explicitly specify Inf or -Inf as a bin edge.

Note If X or Y contain integers of type int64 or uint64 that are larger than flintmax, then it is recommended that you explicitly specify the bin edges. binScatterPlot automatically bins the input data using double precision, which lacks integer precision for numbers greater than flintmax.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical


nbins — Number of bins in each dimension
scalar | vector

Number of bins in each dimension, specified as a positive scalar integer or a two-element vector of positive integers. If you do not specify nbins, then binScatterPlot automatically calculates how many bins to use based on the values in X and Y.

• If nbins is a scalar, then binScatterPlot uses that many bins in each dimension.
• If nbins is a vector, then nbins(1) specifies the number of bins in the x-dimension and nbins(2) specifies the number of bins in the y-dimension.

Example: binScatterPlot(X,Y,20) uses 20 bins in each dimension.
Example: binScatterPlot(X,Y,[10 20]) uses 10 bins in the x-dimension and 20 bins in the y-dimension.

Xedges — Bin edges in x-dimension
vector

Bin edges in the x-dimension, specified as a vector. Xedges(1) is the first edge of the first bin in the x-dimension, and Xedges(end) is the outer edge of the last bin.

The value [X(k),Y(k)] is in the (i,j)th bin if Xedges(i) ≤ X(k) < Xedges(i+1) and Yedges(j) ≤ Y(k) < Yedges(j+1). The last bins in each dimension also include the last (outer) edge. For example, [X(k),Y(k)] falls into the ith bin in the last row if Xedges(end-1) ≤ X(k) ≤ Xedges(end) and Yedges(i) ≤ Y(k) < Yedges(i+1).

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical

Yedges — Bin edges in y-dimension
vector

Bin edges in the y-dimension, specified as a vector. Yedges(1) is the first edge of the first bin in the y-dimension, and Yedges(end) is the outer edge of the last bin.

The value [X(k),Y(k)] is in the (i,j)th bin if Xedges(i) ≤ X(k) < Xedges(i+1) and Yedges(j) ≤ Y(k) < Yedges(j+1). The last bins in each dimension also include the last (outer) edge.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: binScatterPlot(X,Y,'BinWidth',[5 10])

BinMethod — Binning algorithm
'auto' (default) | 'scott' | 'integers'

Binning algorithm, specified as the comma-separated pair consisting of 'BinMethod' and one of these values.


'auto' — The default 'auto' algorithm uses a maximum of 100 bins and chooses a bin width to cover the data range and reveal the shape of the underlying distribution.

'scott' — Scott's rule is optimal if the data is close to being jointly normally distributed. This rule is appropriate for most other distributions, as well. It uses a bin size of [3.5*std(X)*numel(X)^(-1/4), 3.5*std(Y)*numel(Y)^(-1/4)].

'integers' — The integer rule is useful with integer data, as it creates a bin for each integer. It uses a bin width of 1 and places bin edges halfway between integers. To avoid accidentally creating too many bins, this rule is limited to a maximum of 65536 bins (2^16). If the data range is greater than 65536, then the integer rule uses wider bins instead.
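The 'scott' bin size is easy to reproduce by hand. The sketch below uses NumPy rather than MATLAB as an illustration; std with ddof=1 is chosen to match MATLAB's default sample standard deviation.

```python
import numpy as np

def scott_bin_width(v):
    """Scott's rule as given above: 3.5 * std(v) * numel(v)^(-1/4)."""
    v = np.asarray(v, dtype=float)
    return 3.5 * v.std(ddof=1) * v.size ** (-0.25)

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
# For roughly N(0,1) data, 3.5 * 1 * 100000^(-1/4) is about 0.197
print(scott_bin_width(x))
```

Because the width shrinks only as numel(X)^(-1/4), the number of bins grows slowly with the amount of data, which is what makes the rule usable on tall arrays.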

Note The BinMethod property of the resulting Histogram2 object always has a value of 'manual'.

BinWidth — Width of bins in each dimension
scalar | vector

Width of bins in each dimension, specified as the comma-separated pair consisting of 'BinWidth' and a scalar or two-element vector of positive integers, [xWidth yWidth]. A scalar value indicates the same bin width for each dimension.

If you specify BinWidth, then binScatterPlot can use a maximum of 1024 bins (2^10) along each dimension. If instead the specified bin width requires more bins, then binScatterPlot uses a larger bin width corresponding to the maximum number of bins.

Example: binScatterPlot(X,Y,'BinWidth',[5 10]) uses bins with size 5 in the x-dimension and size 10 in the y-dimension.

Color — Plot color theme
'b' (default) | 'y' | 'm' | 'c' | 'r' | 'g' | 'k'

Plot color theme, specified as the comma-separated pair consisting of 'Color' and one of these options.

'b' — Blue
'm' — Magenta
'c' — Cyan
'r' — Red
'g' — Green
'y' — Yellow
'k' — Black

Gamma — Gamma correction
1 (default) | positive scalar

Gamma correction, specified as the comma-separated pair consisting of 'Gamma' and a positive scalar. Use this option to adjust the brightness and color intensity to affect the amount of detail in the image.

• gamma < 1 — As gamma decreases, the shading of bins with smaller bin counts becomes progressively darker, including more detail in the image.
• gamma > 1 — As gamma increases, the shading of bins with smaller bin counts becomes progressively lighter, removing detail from the image.
• The default value of 1 does not apply any correction to the display.

XBinLimits — Bin limits in x-dimension
vector

Bin limits in x-dimension, specified as the comma-separated pair consisting of 'XBinLimits' and a two-element vector, [xbmin,xbmax]. The vector indicates the first and last bin edges in the x-dimension. binScatterPlot only plots data that falls within the bin limits inclusively, Data(Data(:,1)>=xbmin & Data(:,1)<=xbmax,:).

YBinLimits — Bin limits in y-dimension
vector

Bin limits in y-dimension, specified as the comma-separated pair consisting of 'YBinLimits' and a two-element vector, [ybmin,ybmax]. The vector indicates the first and last bin edges in the y-dimension. binScatterPlot only plots data that falls within the bin limits inclusively, Data(Data(:,2)>=ybmin & Data(:,2)<=ybmax,:).

canoncorr

Compute Sample Canonical Correlation

Load the sample data.

load carbig
data = [Displacement Horsepower Weight Acceleration MPG];
nans = sum(isnan(data),2) > 0;
X = data(~nans,1:3);
Y = data(~nans,4:5);

Compute the sample canonical correlation.

[A,B,r,U,V] = canoncorr(X,Y);

View the output of A to determine the linear combinations of displacement, horsepower, and weight that make up the canonical variables of X.

A


A = 3×2

    0.0025    0.0048
    0.0202    0.0409
   -0.0000   -0.0027

A(3,1) is displayed as -0.0000 because it is very small. Display A(3,1) separately.

A(3,1)

ans = -2.4737e-05

The first canonical variable of X is u1 = 0.0025*Disp + 0.0202*HP - 0.000025*Wgt. The second canonical variable of X is u2 = 0.0048*Disp + 0.0409*HP - 0.0027*Wgt.

View the output of B to determine the linear combinations of acceleration and MPG that make up the canonical variables of Y.

B

B = 2×2

   -0.1666   -0.3637
   -0.0916    0.1078

The first canonical variable of Y is v1 = -0.1666*Accel - 0.0916*MPG. The second canonical variable of Y is v2 = -0.3637*Accel + 0.1078*MPG.

The first canonical variable of Y is v1 = —0.1666*Accel — 0.0916*MPG. The second canonical variable of Y is v2 = —0.3637*Accel + 0.1078*MPG. Plot the scores of the canonical variables of X and Y against each other. t = tiledlayout(2,2); title(t,'Canonical Scores of X vs Canonical Scores of Y') xlabel(t,'Canonical Variables of X') ylabel(t,'Canonical Variables of Y') t.TileSpacing = 'compact'; nexttile plot(U(:,1),V(:,1),'.') xlabel('u1') ylabel('v1') nexttile plot(U(:,2),V(:,1),'.') xlabel('u2') ylabel('v1') nexttile plot(U(:,1),V(:,2),'.') xlabel('u1') ylabel('v2') nexttile plot(U(:,2),V(:,2),'.') xlabel('u2') ylabel('v2')


The pairs of canonical variables {ui, vi} are ordered from the strongest to weakest correlation, with all other pairs independent.

Return the correlation coefficient of the variables u1 and v1.

r(1)

ans = 0.8782

Input Arguments

X — Input matrix
matrix

Input matrix, specified as an n-by-d1 matrix. The rows of X correspond to observations, and the columns correspond to variables.

Data Types: single | double

Y — Input matrix
matrix

Input matrix, specified as an n-by-d2 matrix, where X is an n-by-d1 matrix. The rows of Y correspond to observations, and the columns correspond to variables.

Data Types: single | double


Output Arguments

A — Sample canonical coefficients for X variables
matrix

Sample canonical coefficients for the variables in X, returned as a d1-by-d matrix, where d = min(rank(X),rank(Y)). The jth column of A contains the linear combination of variables that makes up the jth canonical variable for X. If X is less than full rank, canoncorr gives a warning and returns zeros in the rows of A corresponding to dependent columns of X.

B — Sample canonical coefficients for Y variables
matrix

Sample canonical coefficients for the variables in Y, returned as a d2-by-d matrix, where d = min(rank(X),rank(Y)). The jth column of B contains the linear combination of variables that makes up the jth canonical variable for Y. If Y is less than full rank, canoncorr gives a warning and returns zeros in the rows of B corresponding to dependent columns of Y.

r — Sample canonical correlations
vector

Sample canonical correlations, returned as a 1-by-d vector, where d = min(rank(X),rank(Y)). The jth element of r is the correlation between the jth columns of U and V.

U — Canonical scores for the X variables
matrix

Canonical scores for the variables in X, returned as an n-by-d matrix, where X is an n-by-d1 matrix and d = min(rank(X),rank(Y)).

V — Canonical scores for the Y variables
matrix

Canonical scores for the variables in Y, returned as an n-by-d matrix, where Y is an n-by-d2 matrix and d = min(rank(X),rank(Y)).

stats — Hypothesis test information
structure

Hypothesis test information, returned as a structure. This information relates to the sequence of d null hypotheses H0^(k) that the (k+1)st through dth correlations are all zero, for k=0,…,d-1, where d = min(rank(X),rank(Y)). The fields of stats are 1-by-d vectors with elements corresponding to the values of k.


Wilks — Wilks' lambda (likelihood ratio) statistic
df1 — Degrees of freedom for the chi-squared statistic, and the numerator degrees of freedom for the F statistic
df2 — Denominator degrees of freedom for the F statistic
F — Rao's approximate F statistic for H0^(k)
pF — Right-tail significance level for F
chisq — Bartlett's approximate chi-squared statistic for H0^(k) with Lawley's modification
pChisq — Right-tail significance level for chisq

stats has two other fields (dfe and p), which are equal to df1 and pChisq, respectively, and exist for historical reasons.

Data Types: struct
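For intuition, the Wilks statistic for each H0^(k) can be written directly in terms of the canonical correlations as the product of (1 - r_j^2) over the correlations being tested. The NumPy sketch below shows only the lambda values; the F and chi-squared approximations that canoncorr also returns are omitted, so treat it as an illustration of the definition rather than a reimplementation.

```python
import numpy as np

def wilks_lambda(r):
    # Element k (0-based) corresponds to the hypothesis that
    # correlations k+1 through d are all zero.
    one_minus_r2 = 1.0 - np.asarray(r, dtype=float) ** 2
    return np.array([one_minus_r2[k:].prod() for k in range(len(r))])

print(wilks_lambda([0.9, 0.5, 0.1]))  # ≈ [0.1411, 0.7425, 0.9900]
```

A value near 1 means the remaining correlations carry almost no association, which is why small lambda values (large chi-squared or F statistics) lead to rejecting H0^(k).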

More About

Canonical Correlation Analysis

The canonical scores of the data matrices X and Y are defined as

Ui = X*ai
Vi = Y*bi

where ai and bi maximize the Pearson correlation coefficient ρ(Ui,Vi) subject to being uncorrelated to all previous canonical scores and scaled so that Ui and Vi have zero mean and unit variance.

The canonical coefficients of X and Y are the matrices A and B with columns ai and bi, respectively.

The canonical variables of X and Y are the linear combinations of the columns of X and Y given by the canonical coefficients in A and B respectively.

The canonical correlations are the values ρ(Ui,Vi) measuring the correlation of each pair of canonical variables of X and Y.

Algorithms

canoncorr computes A, B, and r using qr and svd. canoncorr computes U and V as U = (X - mean(X))*A and V = (Y - mean(Y))*B.
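The same QR-plus-SVD recipe can be sketched in NumPy. This is an illustration under simplifying assumptions (full-rank, centered inputs, no pivoting or rank handling), not MATLAB's implementation; the sqrt(n-1) factor scales the coefficients so the canonical scores have unit variance, matching the definition above.

```python
import numpy as np

def canoncorr(X, Y):
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xc)
    Qy, Ry = np.linalg.qr(Yc)
    # Singular values of Qx'Qy are the sample canonical correlations.
    U_, s, Vt = np.linalg.svd(Qx.T @ Qy)
    d = min(X.shape[1], Y.shape[1])
    # Scale so the canonical scores (Xc @ A, Yc @ B) have unit variance.
    A = np.linalg.solve(Rx, U_[:, :d]) * np.sqrt(n - 1)
    B = np.linalg.solve(Ry, Vt.T[:, :d]) * np.sqrt(n - 1)
    return A, B, s[:d]

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3))
Y = X[:, :2] + 0.5 * rng.standard_normal((500, 2))
A, B, r = canoncorr(X, Y)
print(r)  # d = 2 correlations in [0, 1], ordered strongest first
```

Because the correlations come out as singular values of an orthonormal-basis product, they are automatically sorted from strongest to weakest, which is the ordering the r output documents above.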

References

[1] Krzanowski, W. J. Principles of Multivariate Analysis: A User's Perspective. New York: Oxford University Press, 1988.

[2] Seber, G. A. F. Multivariate Observations. Hoboken, NJ: John Wiley & Sons, Inc., 1984.


See Also
manova1 | pca

Introduced before R2006a


capability

Process capability indices

Syntax

S = capability(data,specs)

Description

S = capability(data,specs) estimates capability indices for measurements in data given the specifications in specs.

data can be either a vector or a matrix of measurements. If data is a matrix, indices are computed for the columns.

specs can be either a two-element vector of the form [L,U] containing lower and upper specification limits, or (if data is a matrix) a two-row matrix with the same number of columns as data. If there is no lower bound, use -Inf as the first element of specs. If there is no upper bound, use Inf as the second element of specs.

The output S is a structure with the following fields:

• mu — Sample mean
• sigma — Sample standard deviation
• P — Estimated probability of being within limits
• Pl — Estimated probability of being below L
• Pu — Estimated probability of being above U
• Cp — (U-L)/(6*sigma)
• Cpl — (mu-L)./(3.*sigma)
• Cpu — (U-mu)./(3.*sigma)
• Cpk — min(Cpl,Cpu)

Indices are computed under the assumption that data values are independent samples from a normal population with constant mean and variance. Indices divide a "specification width" (between specification limits) by a "process width" (between control limits). Higher ratios indicate a process with fewer measurements outside of specification.
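The field definitions above translate almost line-for-line into code. The sketch below uses Python with SciPy rather than MATLAB and makes the same normality assumption stated above; the dictionary keys mirror the fields of S.

```python
import numpy as np
from scipy.stats import norm

def capability(data, L, U):
    """Capability indices for one column of measurements (normal model)."""
    data = np.asarray(data, dtype=float)
    mu, sigma = data.mean(), data.std(ddof=1)
    Pl = norm.cdf(L, mu, sigma)    # estimated P(value < L); 0 when L = -inf
    Pu = norm.sf(U, mu, sigma)     # estimated P(value > U); 0 when U = +inf
    Cpl = (mu - L) / (3 * sigma)
    Cpu = (U - mu) / (3 * sigma)
    return {"mu": mu, "sigma": sigma, "P": 1 - Pl - Pu, "Pl": Pl, "Pu": Pu,
            "Cp": (U - L) / (6 * sigma), "Cpl": Cpl, "Cpu": Cpu,
            "Cpk": min(Cpl, Cpu)}

rng = np.random.default_rng(0)
S = capability(rng.normal(3, 0.005, 100), 2.99, 3.01)
print(S["Cp"], S["Cpk"])  # Cpk <= Cp always, since Cp = (Cpl + Cpu)/2
```

Passing -inf for L or inf for U behaves like the -Inf/Inf specification limits above: the corresponding tail probability becomes 0 and the one-sided index becomes infinite, so Cpk reduces to the remaining one-sided index.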

Examples

Compute Capability Indices

Simulate a sample from a process with a mean of 3 and a standard deviation of 0.005.

rng default;  % for reproducibility
data = normrnd(3,0.005,100,1);

Compute capability indices if the process has an upper specification limit of 3.01 and a lower specification limit of 2.99.

S = capability(data,[2.99 3.01])


S = struct with fields:
       mu: 3.0006
    sigma: 0.0058
        P: 0.9129
       Pl: 0.0339
       Pu: 0.0532
       Cp: 0.5735
      Cpl: 0.6088
      Cpu: 0.5382
      Cpk: 0.5382

Visualize the specification and process widths.

capaplot(data,[2.99 3.01]);
grid on

References

[1] Montgomery, D. Introduction to Statistical Quality Control. Hoboken, NJ: John Wiley & Sons, 1991, pp. 369–374.

See Also
capaplot | histfit


Introduced in R2006b


capaplot

Process capability plot

Syntax

p = capaplot(data,specs)
[p,h] = capaplot(data,specs)

Description

p = capaplot(data,specs) estimates the mean and variance of the observations in input vector data, and plots the pdf of the resulting T distribution. The observations in data are assumed to be normally distributed. The output, p, is the probability that a new observation from the estimated distribution will fall within the range specified by the two-element vector specs. The portion of the distribution between the lower and upper bounds specified in specs is shaded in the plot.

[p,h] = capaplot(data,specs) additionally returns handles to the plot elements in h.

capaplot treats NaN values in data as missing, and ignores them.

Examples

Create a Process Capability Plot

Randomly generate sample data from a normal process with a mean of 3 and a standard deviation of 0.005.

rng default;  % For reproducibility
data = normrnd(3,0.005,100,1);

Compute capability indices if the process has an upper specification limit of 3.01 and a lower specification limit of 2.99.

S = capability(data,[2.99 3.01])

S = struct with fields:
       mu: 3.0006
    sigma: 0.0058
        P: 0.9129
       Pl: 0.0339
       Pu: 0.0532
       Cp: 0.5735
      Cpl: 0.6088
      Cpu: 0.5382
      Cpk: 0.5382

Visualize the specification and process widths.

capaplot(data,[2.99 3.01]);
grid on


See Also
capability | histfit

Introduced before R2006a


caseread

Read case names from file

Syntax

names = caseread(filename)
names = caseread

Description

names = caseread(filename) reads the contents of filename and returns a character array names. The caseread function treats each line of the file as a separate case name. Specify filename as either the name of a file in the current folder or the complete path name of a file. filename can have one of the following file extensions:

• .txt, .dat, or .csv for delimited text files
• .xls, .xlsm, or .xlsx for Excel spreadsheet files

names = caseread opens the Select File to Open dialog box so that you can interactively select the file to read.

Examples

Write and Read Case Names

Create a character array of case names representing months.

months = char('January','February', ...
    'March','April','May');

Write the names to a file named months.dat. View the contents of the file by using the type function.

casewrite(months,'months.dat')
type months.dat

January
February
March
April
May

Read the names in the months.dat file.

names = caseread('months.dat')

names = 5x8 char array
    'January '
    'February'
    'March   '
    'April   '
    'May     '

Input Arguments

filename — Name of file to read
character vector | string scalar

Name of the file to read, specified as a character vector or string scalar. Depending on the location of the file, filename has one of these forms.

• Current folder or folder on the MATLAB path — Specify the name of the file in filename. Example: 'myTextFile.csv'
• Folder that is not the current folder or a folder on the MATLAB path — Specify the full or relative path name in filename. Example: 'C:\myFolder\myTextFile.csv'

Example: 'months.dat'

Data Types: char | string

Alternative Functionality

Instead of using casewrite and caseread with character arrays, consider using writecell and readcell with cell arrays. For example:

months = {'January';'February';'March';'April';'May'};
writecell(months,'months.dat')
names = readcell('months.dat')

names = 5×1 cell array
    {'January' }
    {'February'}
    {'March'   }
    {'April'   }
    {'May'     }

See Also
casewrite | gname | readcell | readtable | writecell | writetable

Introduced before R2006a


casewrite

Write case names to file

Syntax

casewrite(strmat,filename)
casewrite(strmat)

Description

casewrite(strmat,filename) writes the contents of the character array or string column vector strmat to a file filename. Each row of strmat represents one case name, and casewrite writes each name to a separate line in filename. Specify filename as either a file name (to write the file to the current folder) or a complete path name (to write the file to a different folder). filename can have one of the following file extensions:

• .txt, .dat, or .csv for delimited text files
• .xls, .xlsm, or .xlsx for Excel spreadsheet files

casewrite(strmat) opens the Select File to Write dialog box so that you can interactively specify the file to write.

Examples

Write and Read Case Names

Create a character array of case names representing months.

months = char('January','February', ...
    'March','April','May');

Write the names to a file named months.dat. View the contents of the file by using the type function.

casewrite(months,'months.dat')
type months.dat

January
February
March
April
May

Read the names in the months.dat file.

names = caseread('months.dat')

names = 5x8 char array
    'January '
    'February'
    'March   '
    'April   '
    'May     '

Input Arguments

strmat — Case names
character array | string column vector

Case names, specified as a character array or string column vector. Each row of strmat corresponds to a case name and becomes a line in filename.

Data Types: char | string

filename — Name of file to write
character vector | string scalar

Name of the file to write, specified as a character vector or string scalar. Depending on the location you are writing to, filename has one of these forms.

• Current folder — Specify the name of the file in filename. Example: 'myTextFile.csv'
• Folder that is different from the current folder — Specify the full or relative path name in filename. Example: 'C:\myFolder\myTextFile.csv'

Example: 'months.dat'

Data Types: char | string

Alternative Functionality

Instead of using casewrite and caseread with character arrays, consider using writecell and readcell with cell arrays. For example:

months = {'January';'February';'March';'April';'May'};
writecell(months,'months.dat')
names = readcell('months.dat')

names = 5×1 cell array
    {'January' }
    {'February'}
    {'March'   }
    {'April'   }
    {'May'     }


See Also
caseread | gname | readcell | readtable | writecell | writetable

Introduced before R2006a


DaviesBouldinEvaluation

Package: clustering.evaluation
Superclasses: clustering.evaluation.ClusterCriterion

Davies-Bouldin criterion clustering evaluation object

Description

DaviesBouldinEvaluation is an object consisting of sample data, clustering data, and Davies-Bouldin criterion values used to evaluate the optimal number of clusters. Create a Davies-Bouldin criterion clustering evaluation object using evalclusters.

Construction

eva = evalclusters(x,clust,'DaviesBouldin') creates a Davies-Bouldin criterion clustering evaluation object.

eva = evalclusters(x,clust,'DaviesBouldin',Name,Value) creates a Davies-Bouldin criterion clustering evaluation object using additional options specified by one or more name-value pair arguments.

Input Arguments

x — Input data
matrix

Input data, specified as an N-by-P matrix. N is the number of observations, and P is the number of variables.

Data Types: single | double

clust — Clustering algorithm
'kmeans' | 'linkage' | 'gmdistribution' | matrix of clustering solutions | function handle

Clustering algorithm, specified as one of the following.

• 'kmeans' — Cluster the data in x using the kmeans clustering algorithm, with 'EmptyAction' set to 'singleton' and 'Replicates' set to 5.
• 'linkage' — Cluster the data in x using the clusterdata agglomerative clustering algorithm, with 'Linkage' set to 'ward'.
• 'gmdistribution' — Cluster the data in x using the gmdistribution Gaussian mixture distribution algorithm, with 'SharedCov' set to true and 'Replicates' set to 5.

If criterion is 'CalinskiHarabasz', 'DaviesBouldin', or 'silhouette', you can specify a clustering algorithm using a function handle (MATLAB). The function must be of the form C = clustfun(DATA,K), where DATA is the data to be clustered, and K is the number of clusters. The output of clustfun must be one of the following:


• A vector of integers representing the cluster index for each observation in DATA. There must be K unique values in this vector.
• A numeric n-by-K matrix of scores for n observations and K classes. In this case, the cluster index for each observation is determined by taking the largest score value in each row.

If criterion is 'CalinskiHarabasz', 'DaviesBouldin', or 'silhouette', you can also specify clust as an n-by-K matrix containing the proposed clustering solutions. n is the number of observations in the sample data, and K is the number of proposed clustering solutions. Column j contains the cluster indices for each of the n points in the jth clustering solution.

Data Types: single | double | char | string | function_handle

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'KList',[1:5] specifies to test 1, 2, 3, 4, and 5 clusters to find the optimal number.

KList — List of number of clusters to evaluate
vector

List of number of clusters to evaluate, specified as the comma-separated pair consisting of 'KList' and a vector of positive integer values. You must specify KList when clust is a clustering algorithm name or a function handle. When criterion is 'gap', clust must be a character vector, a string scalar, or a function handle, and you must specify KList.

Example: 'KList',[1:6]

Data Types: single | double

Properties

ClusteringFunction
Clustering algorithm used to cluster the input data, stored as a valid clustering algorithm name or function handle. If the clustering solutions are provided in the input, ClusteringFunction is empty.

CriterionName
Name of the criterion used for clustering evaluation, stored as a valid criterion name.

CriterionValues
Criterion values corresponding to each proposed number of clusters in InspectedK, stored as a vector of numerical values.

InspectedK
List of the number of proposed clusters for which to compute criterion values, stored as a vector of positive integer values.


Missing
Logical flag for excluded data, stored as a column vector of logical values. If Missing equals true, then the corresponding value in the data matrix x is not used in the clustering solution.

NumObservations
Number of observations in the data matrix X, minus the number of missing (NaN) values in X, stored as a positive integer value.

OptimalK
Optimal number of clusters, stored as a positive integer value.

OptimalY
Optimal clustering solution corresponding to OptimalK, stored as a column vector of positive integer values. If the clustering solutions are provided in the input, OptimalY is empty.

X
Data used for clustering, stored as a matrix of numerical values.

Methods

Inherited Methods

addK — Evaluate additional numbers of clusters
plot — Plot clustering evaluation object criterion values
compact — Compact clustering evaluation object

Examples

Evaluate the Clustering Solution Using Davies-Bouldin Criterion

Evaluate the optimal number of clusters using the Davies-Bouldin clustering evaluation criterion.

Generate sample data containing random numbers from three multivariate distributions with different parameter values.

rng('default');  % For reproducibility
mu1 = [2 2];
sigma1 = [0.9 -0.0255; -0.0255 0.9];
mu2 = [5 5];
sigma2 = [0.5 0; 0 0.3];
mu3 = [-2, -2];
sigma3 = [1 0; 0 0.9];
N = 200;
X = [mvnrnd(mu1,sigma1,N);...


    mvnrnd(mu2,sigma2,N);...
    mvnrnd(mu3,sigma3,N)];

Evaluate the optimal number of clusters using the Davies-Bouldin criterion. Cluster the data using kmeans.

E = evalclusters(X,'kmeans','DaviesBouldin','klist',[1:6])

E = DaviesBouldinEvaluation with properties:

    NumObservations: 600
         InspectedK: [1 2 3 4 5 6]
    CriterionValues: [NaN 0.4663 0.4454 0.8316 1.0444 0.9236]
           OptimalK: 3

The OptimalK value indicates that, based on the Davies-Bouldin criterion, the optimal number of clusters is three.

Plot the Davies-Bouldin criterion values for each number of clusters tested.

figure;
plot(E)

The plot shows that the lowest Davies-Bouldin value occurs at three clusters, suggesting that the optimal number of clusters is three.


Create a grouped scatter plot to visually examine the suggested clusters.

figure;
gscatter(X(:,1),X(:,2),E.OptimalY,'rbg','xod')

The plot shows three distinct clusters within the data: Cluster 1 is in the lower-left corner, cluster 2 is in the upper-right corner, and cluster 3 is near the center of the plot.

More About

Davies-Bouldin Criterion

The Davies-Bouldin criterion is based on a ratio of within-cluster and between-cluster distances. The Davies-Bouldin index is defined as

DB = (1/k) * Σ_{i=1..k} max_{j≠i} D_{i,j},

where D_{i,j} is the within-to-between cluster distance ratio for the ith and jth clusters. In mathematical terms,

D_{i,j} = (d̄_i + d̄_j) / d_{i,j}.

d̄_i is the average distance between each point in the ith cluster and the centroid of the ith cluster. d̄_j is the average distance between each point in the jth cluster and the centroid of the jth cluster. d_{i,j} is the Euclidean distance between the centroids of the ith and jth clusters.

The maximum value of D_{i,j} represents the worst-case within-to-between cluster ratio for cluster i. The optimal clustering solution has the smallest Davies-Bouldin index value.
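The definition transcribes directly into code. The sketch below uses NumPy with Euclidean distances and is independent of MATLAB's implementation; labels are assumed to be integers identifying each cluster.

```python
import numpy as np

def davies_bouldin(X, labels):
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    # dbar[i]: average distance from points of cluster i to its centroid
    dbar = np.array([np.linalg.norm(X[labels == k] - c, axis=1).mean()
                     for k, c in zip(ks, cents)])
    total = 0.0
    for i in range(len(ks)):
        # worst-case within-to-between ratio for cluster i
        total += max((dbar[i] + dbar[j]) / np.linalg.norm(cents[i] - cents[j])
                     for j in range(len(ks)) if j != i)
    return total / len(ks)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
labels = np.repeat([0, 1], 50)
print(davies_bouldin(X, labels))  # small: tight, well-separated clusters
```

Tight clusters shrink the numerators d̄_i + d̄_j and well-separated centroids grow the denominator d_{i,j}, which is why the best clustering solution has the smallest index.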

References

[1] Davies, D. L., and D. W. Bouldin. "A Cluster Separation Measure." IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. PAMI-1, No. 2, 1979, pp. 224–227.

See Also
CalinskiHarabaszEvaluation | GapEvaluation | SilhouetteEvaluation | evalclusters

Topics
Class Attributes (MATLAB)
Property Attributes (MATLAB)


cat

Class: dataset

(Not Recommended) Concatenate dataset arrays

Note The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.

Syntax

ds = cat(dim, ds1, ds2, ...)

Description

ds = cat(dim, ds1, ds2, ...) concatenates the dataset arrays ds1, ds2, ... along dimension dim by calling the dataset/horzcat or dataset/vertcat method. dim must be 1 or 2.

See Also
horzcat | vertcat


cdf

Cumulative distribution function for Gaussian mixture distribution

Syntax

y = cdf(gm,X)

Description

y = cdf(gm,X) returns the cumulative distribution function (cdf) of the Gaussian mixture distribution gm, evaluated at the values in X.

Examples

Compute cdf Values

Create a gmdistribution object and compute its cdf values.

Define the distribution parameters (means and covariances) of a two-component bivariate Gaussian mixture distribution.

mu = [1 2;-3 -5];
sigma = [1 1];  % shared diagonal covariance matrix

Create a gmdistribution object by using the gmdistribution function. By default, the function creates an equal proportion mixture.

gm = gmdistribution(mu,sigma)

gm = Gaussian mixture distribution with 2 components in 2 dimensions
Component 1:
Mixing proportion: 0.500000
Mean:     1     2

Component 2:
Mixing proportion: 0.500000
Mean:    -3    -5

Compute the cdf values of gm.

X = [0 0;1 2;3 3;5 3];
cdf(gm,X)

ans = 4×1

    0.5011
    0.6250
    0.9111
    0.9207
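The mixture cdf is the mixing-proportion-weighted sum of the component cdfs, and with a diagonal covariance each component cdf factors into a product of univariate normal cdfs. The SciPy sketch below reproduces the four values above; it is an illustration of that identity, not of gmdistribution itself.

```python
import numpy as np
from scipy.stats import norm

def gmm_cdf(X, mu, var, p):
    """cdf of a diagonal-covariance Gaussian mixture at the rows of X."""
    X, mu, var, p = (np.asarray(a, dtype=float) for a in (X, mu, var, p))
    # comp[k, n]: cdf of component k at observation n (product over dims)
    comp = np.prod(norm.cdf(X[None, :, :], loc=mu[:, None, :],
                            scale=np.sqrt(var)[:, None, :]), axis=2)
    return p @ comp

mu = np.array([[1, 2], [-3, -5]])
var = np.ones((2, 2))          # shared diagonal covariance, as above
p = np.array([0.5, 0.5])
X = np.array([[0, 0], [1, 2], [3, 3], [5, 3]])
print(gmm_cdf(X, mu, var, p))  # ≈ [0.5011 0.6250 0.9111 0.9207]
```

The factorization only holds for diagonal covariances; a full covariance matrix requires a multivariate normal cdf for each component (mvncdf in MATLAB terms).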


Plot cdf

Create a gmdistribution object and plot its cdf.

Define the distribution parameters (means, covariances, and mixing proportions) of two bivariate Gaussian mixture components.

p = [0.4 0.6];                  % Mixing proportions
mu = [1 2;-3 -5];               % Means
sigma = cat(3,[2 .5],[1 1])     % Covariances, 1-by-2-by-2 array

sigma =
sigma(:,:,1) =
    2.0000    0.5000

sigma(:,:,2) =
     1     1

The cat function concatenates the covariances along the third array dimension. The defined covariance matrices are diagonal matrices. sigma(1,:,i) contains the diagonal elements of the covariance matrix of component i.

Create a gmdistribution object by using the gmdistribution function.

gm = gmdistribution(mu,sigma,p)

gm = Gaussian mixture distribution with 2 components in 2 dimensions
Component 1:
Mixing proportion: 0.400000
Mean:     1     2

Component 2:
Mixing proportion: 0.600000
Mean:    -3    -5

Plot the cdf of the Gaussian mixture distribution by using fsurf.

fsurf(@(x,y)reshape(cdf(gm,[x(:) y(:)]),size(x)),[-10 10])


Input Arguments

gm — Gaussian mixture distribution
gmdistribution object

Gaussian mixture distribution, also called Gaussian mixture model (GMM), specified as a gmdistribution object. You can create a gmdistribution object using gmdistribution or fitgmdist. Use the gmdistribution function to create a gmdistribution object by specifying the distribution parameters. Use the fitgmdist function to fit a gmdistribution model to data given a fixed number of components.

X — Values at which to evaluate cdf
n-by-m numeric matrix

Values at which to evaluate the cdf, specified as an n-by-m numeric matrix, where n is the number of observations and m is the number of variables in each observation.
Data Types: single | double

Output Arguments

y — cdf values
n-by-1 numeric vector


cdf values of the Gaussian mixture distribution gm, evaluated at X, returned as an n-by-1 numeric vector, where n is the number of observations in X.

See Also
fitgmdist | gmdistribution | mvncdf | pdf | random

Topics
“Create Gaussian Mixture Model” on page 5-109
“Fit Gaussian Mixture Model to Data” on page 5-112
“Simulate Data from Gaussian Mixture Model” on page 5-116
“Cluster Using Gaussian Mixture Model” on page 16-39

Introduced in R2007b


ccdesign Central composite design

Syntax

dCC = ccdesign(n)
[dCC,blocks] = ccdesign(n)
[...] = ccdesign(n,'Name',value)

Description

dCC = ccdesign(n) generates a central composite design for n factors. n must be an integer 2 or larger. The output matrix dCC is m-by-n, where m is the number of runs in the design. Each row represents one run, with settings for all factors represented in the columns. Factor values are normalized so that the cube points take values between -1 and 1.

[dCC,blocks] = ccdesign(n) requests a blocked design. The output blocks is an m-by-1 vector of block numbers for each run. Blocks indicate runs that are to be measured under similar conditions to minimize the effect of inter-block differences on the parameter estimates.

[...] = ccdesign(n,'Name',value) specifies one or more optional name/value pairs for the design. Valid parameters and their values are listed in the following table. Specify Name in single quotes.

'center' — Number of center points:
• Integer — Number of center points to include.
• 'uniform' — Select number of center points to give uniform precision.
• 'orthogonal' — Select number of center points to give an orthogonal design. This is the default.

'fraction' — Fraction of the full-factorial cube, expressed as an exponent of 1/2:
• 0 — Whole design. Default when n ≤ 4.
• 1 — 1/2 fraction. Default when 4 < n ≤ 7 or n > 11.
• 2 — 1/4 fraction. Default when 7 < n ≤ 11.
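The geometry behind the design can be illustrated with a hedged Python sketch that builds the unblocked point set of a whole (unfractionated) central composite design: 2^n cube points, 2n axial points, and the requested center points. The function name cc_points and the rotatable default alpha = (number of cube points)^(1/4) are illustrative assumptions; ccdesign itself also supports fractional cubes, blocking, and automatic center-point selection.

```python
from itertools import product
import numpy as np

def cc_points(n, alpha=None, n_center=1):
    # cube (factorial) points: all combinations of +/-1 across n factors
    cube = np.array(list(product([-1, 1], repeat=n)), dtype=float)
    if alpha is None:
        # assumed rotatable choice: alpha = (number of cube points)^(1/4)
        alpha = len(cube) ** 0.25
    # axial (star) points: +/-alpha on each axis, zero elsewhere
    axial = np.vstack([a * np.eye(n) for a in (alpha, -alpha)])
    center = np.zeros((n_center, n))
    return np.vstack([cube, axial, center])

print(cc_points(2).shape)  # (9, 2): 4 cube + 4 axial + 1 center point
```

For n = 2 this yields the familiar 9-run design; run counts grow as 2^n + 2n + (number of center points) for the whole design.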

'symmetric' — 2x – 1
'symmetricismax' — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
'symmetriclogit' — 2/(1 + e^(–x)) – 1

• For a MATLAB function or a function that you define, enter its function handle.

Mdl.ScoreTransform = @function;

function must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).
Data Types: char | function_handle

W — Observation weights
numeric vector

Observation weights used to train the ECOC classifier, specified as a numeric vector. W has NumObservations elements. The software normalizes the weights used for training so that nansum(W) is 1.
Data Types: single | double

X — Unstandardized predictor data
numeric matrix | table

Unstandardized predictor data used to train the ECOC classifier, specified as a numeric matrix or table. Each row of X corresponds to one observation, and each column corresponds to one variable.
Data Types: single | double | table

Y — Observed class labels
categorical array | character array | logical vector | numeric vector | cell array of character vectors

Observed class labels used to train the ECOC classifier, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Y has NumObservations elements and has the same data type as the input argument Y of fitcecoc. (The software treats string arrays as cell arrays of character vectors.) Each row of Y represents the observed classification of the corresponding row of X.
Data Types: categorical | char | logical | single | double | cell
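The symmetric score transformations listed above are simple elementwise (or per-row) maps. This Python sketch mirrors them for illustration; the function names follow the MATLAB option names but are otherwise assumptions:

```python
import numpy as np

def symmetric(x):
    # 'symmetric': maps [0,1]-type scores to [-1,1]
    return 2 * x - 1

def symmetriclogit(x):
    # 'symmetriclogit': maps real-valued scores to (-1,1)
    return 2 / (1 + np.exp(-x)) - 1

def symmetricismax(scores):
    # 'symmetricismax': per observation, largest score -> 1, all others -> -1
    scores = np.atleast_2d(scores)
    out = -np.ones_like(scores, dtype=float)
    out[np.arange(len(scores)), np.argmax(scores, axis=1)] = 1
    return out

print(symmetricismax([0.2, 0.7, 0.1]))  # [[-1.  1. -1.]]
```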


Hyperparameter Optimization Properties

HyperparameterOptimizationResults — Description of cross-validation optimization of hyperparameters
BayesianOptimization object | table

Description of the cross-validation optimization of hyperparameters, specified as a BayesianOptimization object or a table of hyperparameters and associated values. This property is nonempty if the 'OptimizeHyperparameters' name-value pair argument is nonempty when you create the model. The value of HyperparameterOptimizationResults depends on the setting of the Optimizer field in the HyperparameterOptimizationOptions structure when you create the model:

• 'bayesopt' (default) — Object of class BayesianOptimization
• 'gridsearch' or 'randomsearch' — Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst)

Object Functions

compareHoldout — Compare accuracies of two classification models using new data
compact — Reduce size of multiclass error-correcting output codes (ECOC) model
crossval — Cross-validate multiclass error-correcting output codes (ECOC) model
discardSupportVectors — Discard support vectors of linear SVM binary learners in ECOC model
edge — Classification edge for multiclass error-correcting output codes (ECOC) model
loss — Classification loss for multiclass error-correcting output codes (ECOC) model
margin — Classification margins for multiclass error-correcting output codes (ECOC) model
predict — Classify observations using multiclass error-correcting output codes (ECOC) model
resubEdge — Resubstitution classification edge for multiclass error-correcting output codes (ECOC) model
resubLoss — Resubstitution classification loss for multiclass error-correcting output codes (ECOC) model
resubMargin — Resubstitution classification margins for multiclass error-correcting output codes (ECOC) model
resubPredict — Classify observations in multiclass error-correcting output codes (ECOC) model

Examples

Train Multiclass Model Using SVM Learners

Train a multiclass error-correcting output codes (ECOC) model using support vector machine (SVM) binary learners.


Load Fisher's iris data set. Specify the predictor data X and the response data Y.

load fisheriris
X = meas;
Y = species;

Train a multiclass ECOC model using the default options.

Mdl = fitcecoc(X,Y)
Mdl = 
  ClassificationECOC
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'setosa'  'versicolor'  'virginica'}
           ScoreTransform: 'none'
           BinaryLearners: {3x1 cell}
               CodingName: 'onevsone'

  Properties, Methods

Mdl is a ClassificationECOC model. By default, fitcecoc uses SVM binary learners and a one-versus-one coding design. You can access Mdl properties using dot notation.

Display the class names and the coding design matrix.

Mdl.ClassNames
ans = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }

CodingMat = Mdl.CodingMatrix
CodingMat = 3×3
     1     1     0
    -1     0     1
     0    -1    -1

A one-versus-one coding design for three classes yields three binary learners. The columns of CodingMat correspond to the learners, and the rows correspond to the classes. The class order is the same as the order in Mdl.ClassNames. For example, CodingMat(:,1) is [1; –1; 0] and indicates that the software trains the first SVM binary learner using all observations classified as 'setosa' and 'versicolor'. Because 'setosa' corresponds to 1, it is the positive class; 'versicolor' corresponds to –1, so it is the negative class.

You can access each binary learner using cell indexing and dot notation.

Mdl.BinaryLearners{1}   % The first binary learner
ans = 
  classreg.learning.classif.CompactClassificationSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: [-1 1]
           ScoreTransform: 'none'
                     Beta: [4x1 double]
                     Bias: 1.4505
         KernelParameters: [1x1 struct]

  Properties, Methods

Compute the resubstitution classification error.

error = resubLoss(Mdl)
error = 0.0067

The classification error on the training data is small, but the classifier might be an overfitted model. You can cross-validate the classifier using crossval and compute the cross-validation classification error instead.

Inspect Binary Learner Properties of ECOC Classifier

Train an ECOC classifier using SVM binary learners. Then, access properties of the binary learners, such as estimated parameters, by using dot notation.

Load Fisher's iris data set. Specify the petal dimensions as the predictors and the species names as the response.

load fisheriris
X = meas(:,3:4);
Y = species;

Train an ECOC classifier using SVM binary learners and the default coding design (one-versus-one). Standardize the predictors and save the support vectors.

t = templateSVM('Standardize',true,'SaveSupportVectors',true);
predictorNames = {'petalLength','petalWidth'};
responseName = 'irisSpecies';
classNames = {'setosa','versicolor','virginica'}; % Specify class order
Mdl = fitcecoc(X,Y,'Learners',t,'ResponseName',responseName,...
    'PredictorNames',predictorNames,'ClassNames',classNames)
Mdl = 
  ClassificationECOC
           PredictorNames: {'petalLength'  'petalWidth'}
             ResponseName: 'irisSpecies'
    CategoricalPredictors: []
               ClassNames: {'setosa'  'versicolor'  'virginica'}
           ScoreTransform: 'none'
           BinaryLearners: {3x1 cell}
               CodingName: 'onevsone'

  Properties, Methods

t is a template object that contains options for SVM classification. The function fitcecoc uses default values for the empty ([]) properties. Mdl is a ClassificationECOC classifier. You can access properties of Mdl using dot notation.

Display the class names and the coding design matrix.

Mdl.ClassNames
ans = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }

Mdl.CodingMatrix
ans = 3×3
     1     1     0
    -1     0     1
     0    -1    -1

The columns correspond to SVM binary learners, and the rows correspond to the distinct classes. The row order is the same as the order in the ClassNames property of Mdl. For each column:

• 1 indicates that fitcecoc trains the SVM using observations in the corresponding class as members of the positive group.
• –1 indicates that fitcecoc trains the SVM using observations in the corresponding class as members of the negative group.
• 0 indicates that the SVM does not use observations in the corresponding class.

In the first SVM, for example, fitcecoc assigns all observations to 'setosa' or 'versicolor', but not 'virginica'.

Access properties of the SVMs using cell subscripting and dot notation. Store the standardized support vectors of each SVM. Unstandardize the support vectors.

L = size(Mdl.CodingMatrix,2); % Number of SVMs
sv = cell(L,1);               % Preallocate for support vector indices
for j = 1:L
    SVM = Mdl.BinaryLearners{j};
    sv{j} = SVM.SupportVectors;
    sv{j} = sv{j}.*SVM.Sigma + SVM.Mu;
end

sv is a cell array of matrices containing the unstandardized support vectors for the SVMs.

Plot the data, and identify the support vectors.

figure
gscatter(X(:,1),X(:,2),Y);
hold on
markers = {'ko','ro','bo'}; % Should be of length L
for j = 1:L
    svs = sv{j};
    plot(svs(:,1),svs(:,2),markers{j},...
        'MarkerSize',10 + (j - 1)*3);
end
title('Fisher''s Iris -- ECOC Support Vectors')
xlabel(predictorNames{1})
ylabel(predictorNames{2})
legend([classNames,{'Support vectors - SVM 1',...
    'Support vectors - SVM 2','Support vectors - SVM 3'}],...
    'Location','Best')
hold off

You can pass Mdl to these functions: • predict, to classify new observations • resubLoss, to estimate the classification error on the training data • crossval, to perform 10-fold cross-validation

Cross-Validate ECOC Classifier

Cross-validate an ECOC classifier with SVM binary learners, and estimate the generalized classification error.

Load Fisher's iris data set. Specify the predictor data X and the response data Y.

load fisheriris
X = meas;
Y = species;
rng(1); % For reproducibility

Create an SVM template, and standardize the predictors.

t = templateSVM('Standardize',true)
t = 
  Fit template for classification SVM.

                     Alpha: [0x1 double]
             BoxConstraint: []
                 CacheSize: []
             CachingMethod: ''
                ClipAlphas: []
    DeltaGradientTolerance: []
                   Epsilon: []
              GapTolerance: []
              KKTTolerance: []
            IterationLimit: []
            KernelFunction: ''
               KernelScale: []
              KernelOffset: []
     KernelPolynomialOrder: []
                  NumPrint: []
                        Nu: []
           OutlierFraction: []
          RemoveDuplicates: []
           ShrinkagePeriod: []
                    Solver: ''
           StandardizeData: 1
        SaveSupportVectors: []
            VerbosityLevel: []
                   Version: 2
                    Method: 'SVM'
                      Type: 'classification'

t is an SVM template. Most of the template object properties are empty. When training the ECOC classifier, the software sets the applicable properties to their default values.

Train the ECOC classifier, and specify the class order.

Mdl = fitcecoc(X,Y,'Learners',t,...
    'ClassNames',{'setosa','versicolor','virginica'});

Mdl is a ClassificationECOC classifier. You can access its properties using dot notation.

Cross-validate Mdl using 10-fold cross-validation.

CVMdl = crossval(Mdl);

CVMdl is a ClassificationPartitionedECOC cross-validated ECOC classifier.

Estimate the generalized classification error.

genError = kfoldLoss(CVMdl)
genError = 0.0400


The generalized classification error is 4%, which indicates that the ECOC classifier generalizes fairly well.

More About

Error-Correcting Output Codes Model

An error-correcting output codes (ECOC) model reduces the problem of classification with three or more classes to a set of binary classification problems. ECOC classification requires a coding design, which determines the classes that the binary learners train on, and a decoding scheme, which determines how the results (predictions) of the binary classifiers are aggregated.

Assume the following:

• The classification problem has three classes.
• The coding design is one-versus-one. For three classes, this coding design is

            Learner 1   Learner 2   Learner 3
  Class 1       1           1           0
  Class 2      −1           0           1
  Class 3       0          −1          −1

• The decoding scheme uses loss g.
• The learners are SVMs.

To build this classification model, the ECOC algorithm follows these steps.

1  Learner 1 trains on observations in Class 1 or Class 2, and treats Class 1 as the positive class and Class 2 as the negative class. The other learners are trained similarly.

2  Let M be the coding design matrix with elements m_kl, and let s_l be the predicted classification score for the positive class of learner l. The algorithm assigns a new observation to the class k̂ that minimizes the aggregation of the losses for the L binary learners:

   k̂ = argmin_k [ Σ_{l=1}^{L} |m_kl| g(m_kl, s_l) ] / [ Σ_{l=1}^{L} |m_kl| ].
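The loss-weighted decoding rule above can be sketched directly. In this Python illustration (function and variable names are assumptions, not toolbox API), the hinge loss stands in for one of the built-in binary losses g:

```python
import numpy as np

def ecoc_predict(M, scores, loss):
    # M: K-by-L coding matrix with entries in {-1, 0, 1}
    # scores: length-L positive-class scores from the binary learners
    # loss: binary loss g(m, s), applied elementwise
    K = M.shape[0]
    agg = np.empty(K)
    for k in range(K):
        w = np.abs(M[k])                       # |m_kl| weights
        agg[k] = np.sum(w * loss(M[k], scores)) / np.sum(w)
    return int(np.argmin(agg))                 # class with smallest aggregate loss

hinge = lambda m, s: np.maximum(0, 1 - m * s) / 2   # hinge binary loss

# one-versus-one coding design for three classes (rows = classes)
M = np.array([[1, 1, 0], [-1, 0, 1], [0, -1, -1]])
print(ecoc_predict(M, np.array([2.0, 1.5, -0.5]), hinge))  # 0 (Class 1)
```

With these scores, learners 1 and 2 both vote strongly for Class 1, so the aggregated loss is minimized at the first class.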

ECOC models can improve classification accuracy, compared to other multiclass models [2].

Coding Design

A coding design is a matrix whose elements direct which classes are trained by each binary learner, that is, how the multiclass problem is reduced to a series of binary problems. Each row of the coding design corresponds to a distinct class, and each column corresponds to a binary learner. In a ternary coding design, for a particular column (or binary learner):


• A row containing 1 directs the binary learner to group all observations in the corresponding class into a positive class.
• A row containing –1 directs the binary learner to group all observations in the corresponding class into a negative class.
• A row containing 0 directs the binary learner to ignore all observations in the corresponding class.

Coding design matrices with large minimal pairwise row distances based on the Hamming measure are optimal. For details on the pairwise row distance, see “Random Coding Design Matrices” on page 32-1532 and [4]. This table describes popular coding designs.

• one-versus-all (OVA) — For each binary learner, one class is positive and the rest are negative. This design exhausts all combinations of positive class assignments. Number of learners: K. Minimal pairwise row distance: 2.
• one-versus-one (OVO) — For each binary learner, one class is positive, another is negative, and the rest are ignored. This design exhausts all combinations of class pair assignments. Number of learners: K(K – 1)/2. Minimal pairwise row distance: 1.
• binary complete — This design partitions the classes into all binary combinations, and does not ignore any classes. That is, all class assignments are –1 and 1 with at least one positive class and one negative class in the assignment for each binary learner. Number of learners: 2^(K – 1) – 1. Minimal pairwise row distance: 2^(K – 2).
• ternary complete — This design partitions the classes into all ternary combinations. That is, all class assignments are 0, –1, and 1 with at least one positive class and one negative class in the assignment for each binary learner. Number of learners: (3^K – 2^(K + 1) + 1)/2. Minimal pairwise row distance: 3^(K – 2).
• ordinal — For the first binary learner, the first class is negative and the rest are positive. For the second binary learner, the first two classes are negative and the rest are positive, and so on. Number of learners: K – 1. Minimal pairwise row distance: 1.
• dense random — For each binary learner, the software randomly assigns classes into positive or negative classes, with at least one of each type. For more details, see “Random Coding Design Matrices” on page 32-1532. Number of learners: random, but approximately 10 log2 K. Minimal pairwise row distance: variable.
• sparse random — For each binary learner, the software randomly assigns classes as positive or negative with probability 0.25 for each, and ignores classes with probability 0.5. For more details, see “Random Coding Design Matrices” on page 32-1532. Number of learners: random, but approximately 15 log2 K. Minimal pairwise row distance: variable.
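The learner counts in the designs above are closed-form in K, so they are easy to tabulate. This Python sketch (function name illustrative; the random-design counts are rounded up, which is an assumption about the approximation) evaluates them for a given K:

```python
import math

def num_learners(K):
    # number of binary learners for each coding design, as functions of K
    return {
        "onevsall": K,
        "onevsone": K * (K - 1) // 2,
        "binarycomplete": 2**(K - 1) - 1,
        "ternarycomplete": (3**K - 2**(K + 1) + 1) // 2,
        "ordinal": K - 1,
        "denserandom": math.ceil(10 * math.log2(K)),   # approximate
        "sparserandom": math.ceil(15 * math.log2(K)),  # approximate
    }

print(num_learners(4))
```

For K = 4, for example, one-versus-one needs 6 learners while ternary complete already needs 25, which illustrates why the complete designs are only practical for small K.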

This plot compares the number of binary learners for the coding designs with increasing K.


Algorithms

Random Coding Design Matrices

For a given number of classes K, the software generates random coding design matrices as follows.

1  The software generates one of these matrices:
   a  Dense random — The software assigns 1 or –1 with equal probability to each element of the K-by-Ld coding design matrix, where Ld ≈ 10 log2 K.
   b  Sparse random — The software assigns 1 to each element of the K-by-Ls coding design matrix with probability 0.25, –1 with probability 0.25, and 0 with probability 0.5, where Ls ≈ 15 log2 K.
2  If a column does not contain at least one 1 and at least one –1, then the software removes that column.
3  For distinct columns u and v, if u = v or u = –v, then the software removes v from the coding design matrix.
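Steps 1 through 3 can be sketched in Python for illustration. This builds one candidate matrix only, not the best-of-10,000 search described below, and the function name and seed handling are assumptions:

```python
import numpy as np

def random_coding_matrix(K, sparse=False, seed=0):
    rng = np.random.default_rng(seed)
    L = int(np.ceil((15 if sparse else 10) * np.log2(K)))
    # step 1: generate a dense or sparse random K-by-L matrix
    if sparse:
        M = rng.choice([-1, 0, 1], size=(K, L), p=[0.25, 0.5, 0.25])
    else:
        M = rng.choice([-1, 1], size=(K, L))
    # step 2: drop columns lacking at least one +1 and one -1
    M = M[:, (M == 1).any(axis=0) & (M == -1).any(axis=0)]
    # step 3: drop duplicate and negated-duplicate columns
    cols, seen = [], set()
    for c in M.T:
        # canonical form: flip sign so the first nonzero entry is positive
        canon = tuple(c) if c[c != 0][0] > 0 else tuple(-c)
        if canon not in seen:
            seen.add(canon)
            cols.append(c)
    return np.column_stack(cols)

M = random_coding_matrix(4)
print(M.shape[0])  # 4 classes (rows)
```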

The software randomly generates 10,000 matrices by default, and retains the matrix with the largest minimal pairwise row distance based on the Hamming measure ([4]), given by

Δ(k1, k2) = 0.5 Σ_{l=1}^{L} |m_{k1,l}| |m_{k2,l}| |m_{k1,l} − m_{k2,l}|,
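The row distance used to rank candidate matrices can be computed directly. In this Python sketch (function name illustrative), rows are coding-matrix rows with entries in {−1, 0, 1}:

```python
import numpy as np

def row_distance(r1, r2):
    # Hamming-based pairwise row distance:
    # 0.5 * sum_l |m1_l| * |m2_l| * |m1_l - m2_l|
    # zero entries contribute nothing, since either |m1_l| or |m2_l| vanishes
    r1, r2 = np.asarray(r1), np.asarray(r2)
    return 0.5 * np.sum(np.abs(r1) * np.abs(r2) * np.abs(r1 - r2))

print(row_distance([1, 1, 0], [-1, 0, 1]))    # 1.0 (one-versus-one rows)
print(row_distance([1, -1, -1], [-1, 1, -1])) # 2.0 (one-versus-all rows)
```

These two checks match the minimal pairwise row distances of 1 and 2 listed for the one-versus-one and one-versus-all designs.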


where m_{kj,l} is an element of coding design matrix j.

Support Vector Storage

By default and for efficiency, fitcecoc empties the Alpha, SupportVectorLabels, and SupportVectors properties for all linear SVM binary learners. fitcecoc lists Beta, rather than Alpha, in the model display. To store Alpha, SupportVectorLabels, and SupportVectors, pass a linear SVM template that specifies storing support vectors to fitcecoc. For example, enter:

t = templateSVM('SaveSupportVectors',true)
Mdl = fitcecoc(X,Y,'Learners',t);

You can remove the support vectors and related values by passing the resulting ClassificationECOC model to discardSupportVectors.

Alternative Functionality

You can use these alternative algorithms to train a multiclass model:

• Classification ensembles — see fitcensemble and ClassificationEnsemble
• Classification trees — see fitctree and ClassificationTree
• Discriminant analysis classifiers — see fitcdiscr and ClassificationDiscriminant
• k-nearest neighbor classifiers — see fitcknn and ClassificationKNN
• Naive Bayes classifiers — see fitcnb and ClassificationNaiveBayes

References

[1] Fürnkranz, Johannes. “Round Robin Classification.” Journal of Machine Learning Research, Vol. 2, 2002, pp. 721–747.

[2] Escalera, S., O. Pujol, and P. Radeva. “Separability of ternary codes for sparse designs of error-correcting output codes.” Pattern Recognition Letters, Vol. 30, Issue 3, 2009, pp. 285–297.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

• The predict and update functions support code generation.
• When you train an ECOC model by using fitcecoc, the following restrictions apply.
  • You cannot fit posterior probabilities by using the 'FitPosterior' name-value pair argument.
  • All binary learners must be either SVM classifiers or linear classification models. For the 'Learners' name-value pair argument, you can specify:
    • 'svm' or 'linear'


• An SVM template object or a cell array of such objects (see templateSVM) • A linear classification model template object or a cell array of such objects (see templateLinear) • When you generate code using a coder configurer for predict and update, the following additional restrictions apply for binary learners. • If you use a cell array of SVM template objects, the value of 'Standardize' for SVM learners must be consistent. For example, if you specify 'Standardize',true for one SVM learner, you must specify the same value for all SVM learners. • If you use a cell array of SVM template objects, and you use one SVM learner with a linear kernel ('KernelFunction','linear') and another with a different type of kernel function, then you must specify 'SaveSupportVectors',true for the learner with a linear kernel. For details, see ClassificationECOCCoderConfigurer. For information on name-value pair arguments that you cannot modify when you retrain a model, see “Tips” on page 32-5748. • Code generation limitations for SVM classifiers and linear classification models also apply to ECOC classifiers, depending on the choice of binary learners. For more details, see “Code Generation” on page 32-700 of the CompactClassificationSVM class and “Code Generation” on page 32-390 of the ClassificationLinear class. For more information, see “Introduction to Code Generation” on page 31-2.

See Also ClassificationPartitionedECOC | CompactClassificationECOC | fitcecoc | fitcsvm Introduced in R2014b


ClassificationECOCCoderConfigurer Coder configurer for multiclass model using binary learners

Description

A ClassificationECOCCoderConfigurer object is a coder configurer of a multiclass error-correcting output codes (ECOC) classification model (ClassificationECOC or CompactClassificationECOC) that uses support vector machine (SVM) or linear binary learners. A coder configurer offers convenient features to configure code generation options, generate C/C++ code, and update model parameters in the generated code.

• Configure code generation options and specify the coder attributes of model parameters by using object properties.
• Generate C/C++ code for the predict and update functions of the ECOC model by using generateCode. Generating C/C++ code requires MATLAB Coder.
• Update model parameters in the generated C/C++ code without having to regenerate the code. This feature reduces the effort required to regenerate, redeploy, and reverify C/C++ code when you retrain the model with new data or settings. Before updating model parameters, use validatedUpdateInputs to validate and extract the model parameters to update.

This flow chart shows the code generation workflow using a coder configurer.

For the code generation usage notes and limitations of a multiclass ECOC classification model, see the Code Generation sections of CompactClassificationECOC, predict, and update.

Creation

After training a multiclass ECOC classification model with SVM or linear binary learners by using fitcecoc, create a coder configurer for the model by using learnerCoderConfigurer. Use the properties of a coder configurer to specify the coder attributes of predict and update arguments. Then, use generateCode to generate C/C++ code based on the specified coder attributes.


Properties

predict Arguments

The properties listed in this section specify the coder attributes of the predict function arguments in the generated code.

X — Coder attributes of predictor data
LearnerCoderInput object

Coder attributes of predictor data to pass to the generated C/C++ code for the predict function of the ECOC classification model, specified as a LearnerCoderInput on page 32-355 object.

When you create a coder configurer by using the learnerCoderConfigurer function, the input argument X determines the default values of the LearnerCoderInput coder attributes:

• SizeVector — The default value is the array size of the input X.
  • If the Value attribute of the ObservationsIn property for the ClassificationECOCCoderConfigurer is 'rows', then this SizeVector value is [n p], where n corresponds to the number of observations and p corresponds to the number of predictors.
  • If the Value attribute of the ObservationsIn property for the ClassificationECOCCoderConfigurer is 'columns', then this SizeVector value is [p n].
  To switch the elements of SizeVector (for example, to change [n p] to [p n]), modify the Value attribute of the ObservationsIn property for the ClassificationECOCCoderConfigurer accordingly. You cannot modify the SizeVector value directly.
• VariableDimensions — The default value is [0 0], which indicates that the array size is fixed as specified in SizeVector. You can set this value to [1 0] if the SizeVector value is [n p] or to [0 1] if it is [p n], which indicates that the array has variable-size rows and fixed-size columns. For example, [1 0] specifies that the first value of SizeVector (n) is the upper bound for the number of rows, and the second value of SizeVector (p) is the number of columns.
• DataType — This value is single or double. The default data type depends on the data type of the input X.
• Tunability — This value must be true, meaning that predict in the generated C/C++ code always includes predictor data as an input.
You can modify the coder attributes by using dot notation. For example, to generate C/C++ code that accepts predictor data with 100 observations (in rows) of three predictor variables (in columns), specify these coder attributes of X for the coder configurer configurer:

configurer.X.SizeVector = [100 3];
configurer.X.DataType = 'double';
configurer.X.VariableDimensions = [0 0];

[0 0] indicates that the first and second dimensions of X (number of observations and number of predictor variables, respectively) have fixed sizes.


To allow the generated C/C++ code to accept predictor data with up to 100 observations, specify these coder attributes of X:

configurer.X.SizeVector = [100 3];
configurer.X.DataType = 'double';
configurer.X.VariableDimensions = [1 0];

[1 0] indicates that the first dimension of X (number of observations) has a variable size and the second dimension of X (number of predictor variables) has a fixed size. The specified number of observations, 100 in this example, becomes the maximum allowed number of observations in the generated C/C++ code. To allow any number of observations, specify the bound as Inf.

BinaryLoss — Coder attributes of binary learner loss function
EnumeratedInput object

Coder attributes of the binary learner loss function ('BinaryLoss' name-value pair argument of predict), specified as an EnumeratedInput on page 32-356 object.

The default attribute values of the EnumeratedInput object are based on the default values of the predict function:

• Value — Binary learner loss function, specified as one of the character vectors in BuiltInOptions or a character vector designating a custom function name. If the binary learners are SVMs or linear classification models of SVM learners, the default value is 'hinge'. If the binary learners are linear classification models of logistic regression learners, the default value is 'quadratic'. To use a custom option, define a custom function on the MATLAB search path, and specify Value as the name of the custom function.
• SelectedOption — This value is 'Built-in' (default) or 'Custom'. The software sets SelectedOption according to Value. This attribute is read-only.
• BuiltInOptions — Cell array of 'hamming', 'linear', 'quadratic', 'exponential', 'binodeviance', 'hinge', and 'logit'. This attribute is read-only.
• IsConstant — This value must be true.
• Tunability — The default value is false. If you specify other attribute values when Tunability is false, the software sets Tunability to true.

Decoding — Coder attributes of decoding scheme
EnumeratedInput object

Coder attributes of the decoding scheme ('Decoding' name-value pair argument of predict), specified as an EnumeratedInput on page 32-356 object.
The default attribute values of the EnumeratedInput object are based on the default values of the predict function:

• Value — Decoding scheme value, specified as 'lossweighted' (default), 'lossbased', or a LearnerCoderInput on page 32-355 object. If you set IsConstant to false, then the software changes Value to a LearnerCoderInput on page 32-355 object with these read-only coder attribute values:
  • SizeVector — [1 12]


  • VariableDimensions — [0 1]
  • DataType — 'char'
  • Tunability — 1
  The input in the generated code is a variable-size, tunable character vector that is either 'lossweighted' or 'lossbased'.
• SelectedOption — This value is 'Built-in' (default) or 'NonConstant'. The software sets SelectedOption according to Value. This attribute is read-only.
• BuiltInOptions — Cell array of 'lossweighted' and 'lossbased'. This attribute is read-only.
• IsConstant — The default value is true. If you set this value to false, the software changes Value to a LearnerCoderInput object.
• Tunability — The default value is false. If you specify other attribute values when Tunability is false, the software sets Tunability to true.

ObservationsIn — Coder attributes of predictor data observation dimension
EnumeratedInput object

Coder attributes of the predictor data observation dimension ('ObservationsIn' name-value pair argument of predict), specified as an EnumeratedInput on page 32-356 object.

When you create a coder configurer by using the learnerCoderConfigurer function, the 'ObservationsIn' name-value pair argument determines the default values of the EnumeratedInput coder attributes:

• Value — The default value is the predictor data observation dimension you use when creating the coder configurer, specified as 'rows' or 'columns'. If you do not specify 'ObservationsIn' when creating the coder configurer, the default value is 'rows'. This value must be 'rows' for a model that uses SVM binary learners.
• SelectedOption — This value is always 'Built-in'. This attribute is read-only.
• BuiltInOptions — Cell array of 'rows' and 'columns'. This attribute is read-only.
• IsConstant — This value must be true.
• Tunability — The default value is false if you specify 'ObservationsIn','rows' when creating the coder configurer, and true if you specify 'ObservationsIn','columns'. If you set Tunability to false, the software sets Value to 'rows'.
If you specify other attribute values when Tunability is false, the software sets Tunability to true.

NumOutputs — Number of outputs in predict
1 (default) | 2 | 3

Number of output arguments to return from the generated C/C++ code for the predict function of the ECOC classification model, specified as 1, 2, or 3. The output arguments of predict are, in order: label (predicted class labels), NegLoss (negated average binary losses), and PBScore (positive-class scores). predict in the generated C/C++ code returns the first n outputs of the predict function, where n is the NumOutputs value.

After creating the coder configurer configurer, you can specify the number of outputs by using dot notation.


configurer.NumOutputs = 2;

The NumOutputs property is equivalent to the '-nargout' compiler option of codegen. This option specifies the number of output arguments in the entry-point function of code generation. The object function generateCode generates two entry-point functions (predict.m and update.m, for the predict and update functions of an ECOC classification model, respectively) and generates C/C++ code for the two entry-point functions. The specified value for the NumOutputs property corresponds to the number of output arguments in the entry-point function predict.m.

Data Types: double

update Arguments

The properties listed in this section specify the coder attributes of the update function arguments in the generated code. The update function takes a trained model and new model parameters as input arguments, and returns an updated version of the model that contains the new parameters. To enable updating the parameters in the generated code, you need to specify the coder attributes of the parameters before generating code. Use a LearnerCoderInput on page 32-355 object to specify the coder attributes of each parameter. The default attribute values are based on the model parameters in the input argument Mdl of learnerCoderConfigurer.

BinaryLearners — Coder attributes of trained binary learners
ClassificationSVMCoderConfigurer object | ClassificationLinearCoderConfigurer object

Coder attributes of the trained binary learners (BinaryLearners of an ECOC classification model), specified as a ClassificationSVMCoderConfigurer object (for SVM binary learners) or a ClassificationLinearCoderConfigurer object (for linear binary learners).

Use the update arguments of the SVM or linear coder configurer object to specify the coder attributes of all binary learners.

• For ClassificationSVMCoderConfigurer, the update arguments are Alpha, Beta, Bias, Cost, Mu, Prior, Scale, Sigma, SupportVectorLabels, and SupportVectors.
• For ClassificationLinearCoderConfigurer, the update arguments are Beta, Bias, Cost, and Prior.

For the configuration of BinaryLearners, the software uses only the update argument properties and ignores the other properties of the object.

When you train an ECOC model with SVM binary learners, each learner can have a different number of support vectors. Therefore, the software configures the default attribute values of the LearnerCoderInput on page 32-355 objects for Alpha, SupportVectorLabels, and SupportVectors to accommodate all binary learners, based on the input argument Mdl of learnerCoderConfigurer.

• SizeVector
  • This value is [s 1] for Alpha and SupportVectorLabels, where s is the largest number of support vectors in the binary learners.
  • This value is [s p] for SupportVectors, where p is the number of predictors.
• VariableDimensions — This value is [0 0] or [1 0]. If each learner has the same number of support vectors, the default value is [0 0]. Otherwise, this value must be [1 0].


  • [0 0] indicates that the array size is fixed as specified in SizeVector.
  • [1 0] indicates that the array has variable-size rows and fixed-size columns. In this case, the first value of SizeVector is the upper bound for the number of rows, and the second value of SizeVector is the number of columns.
• DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl.
• Tunability — If you train a model with a linear kernel function, and the model stores the linear predictor coefficients (Beta) without the support vectors and related values, then this value must be false. Otherwise, this value must be true.

For details about the other update arguments, see update arguments on page 32-495 of ClassificationSVMCoderConfigurer and update arguments on page 32-393 of ClassificationLinearCoderConfigurer.

Cost — Coder attributes of misclassification cost
LearnerCoderInput object

Coder attributes of the misclassification cost (Cost of an ECOC classification model), specified as a LearnerCoderInput on page 32-355 object. The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer:

• SizeVector — This value must be [c c], where c is the number of classes.
• VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector.
• DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl.
• Tunability — The default value is true.

Prior — Coder attributes of prior probabilities
LearnerCoderInput object

Coder attributes of the prior probabilities (Prior of an ECOC classification model), specified as a LearnerCoderInput on page 32-355 object.
The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer:

• SizeVector — This value must be [1 c], where c is the number of classes.
• VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector.
• DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl.
• Tunability — The default value is true.

Other Configurer Options

OutputFileName — File name of generated C/C++ code
'ClassificationECOCModel' (default) | character vector

File name of the generated C/C++ code, specified as a character vector.


The object function generateCode of ClassificationECOCCoderConfigurer generates C/C++ code using this file name. The file name must not contain spaces because they can lead to code generation failures in certain operating system configurations. Also, the name must be a valid MATLAB function name.

After creating the coder configurer configurer, you can specify the file name by using dot notation.

configurer.OutputFileName = 'myModel';

Data Types: char

Verbose — Verbosity level
true (logical 1) (default) | false (logical 0)

Verbosity level, specified as true (logical 1) or false (logical 0). The verbosity level controls the display of notification messages at the command line.

• true (logical 1) — The software displays notification messages when your changes to the coder attributes of a parameter result in changes for other dependent parameters.
• false (logical 0) — The software does not display notification messages.

To enable updating machine learning model parameters in the generated code, you need to configure the coder attributes of the parameters before generating code. The coder attributes of parameters are dependent on each other, so the software stores the dependencies as configuration constraints. If you modify the coder attributes of a parameter by using a coder configurer, and the modification requires subsequent changes to other dependent parameters to satisfy configuration constraints, then the software changes the coder attributes of the dependent parameters. The verbosity level determines whether or not the software displays notification messages for these subsequent changes.

After creating the coder configurer configurer, you can modify the verbosity level by using dot notation.

configurer.Verbose = false;

Data Types: logical

Options for Code Generation Customization

To customize the code generation workflow, use the generateFiles function and the following three properties with codegen, instead of using the generateCode function.

After generating the two entry-point function files (predict.m and update.m) by using the generateFiles function, you can modify these files according to your code generation workflow. For example, you can modify the predict.m file to include data preprocessing, or you can add these entry-point functions to another code generation project. Then, you can generate C/C++ code by using the codegen function and the codegen arguments appropriate for the modified entry-point functions or code generation project. Use the three properties described in this section as a starting point to set the codegen arguments.

CodeGenerationArguments — codegen arguments
cell array


This property is read-only.

codegen arguments, specified as a cell array.

This property enables you to customize the code generation workflow. Use the generateCode function if you do not need to customize your workflow. Instead of using generateCode with the coder configurer configurer, you can generate C/C++ code as follows:

generateFiles(configurer)
cgArgs = configurer.CodeGenerationArguments;
codegen(cgArgs{:})

If you customize the code generation workflow, modify cgArgs accordingly before calling codegen. If you modify other properties of configurer, the software updates the CodeGenerationArguments property accordingly.

Data Types: cell

PredictInputs — List of tunable input arguments of predict
cell array

This property is read-only.

List of tunable input arguments of the entry-point function predict.m for code generation, specified as a cell array. The cell array contains another cell array that includes coder.PrimitiveType objects and coder.Constant objects.

If you modify the coder attributes of predict arguments on page 32-341, then the software updates the corresponding objects accordingly. If you specify the Tunability attribute as false, then the software removes the corresponding objects from the PredictInputs list.

The cell array in PredictInputs is equivalent to configurer.CodeGenerationArguments{6} for the coder configurer configurer.

Data Types: cell

UpdateInputs — List of tunable input arguments of update
cell array

This property is read-only.

List of the tunable input arguments of the entry-point function update.m for code generation, specified as a cell array of a structure. The structure includes a coder.CellType object for BinaryLearners and coder.PrimitiveType objects for Cost and Prior.

If you modify the coder attributes of update arguments on page 32-344, then the software updates the corresponding objects accordingly. If you specify the Tunability attribute as false, then the software removes the corresponding object from the UpdateInputs list.

The structure in UpdateInputs is equivalent to configurer.CodeGenerationArguments{3} for the coder configurer configurer.

Data Types: cell
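The customized workflow can be sketched end to end. This is a minimal sketch, not a definitive recipe: it assumes a trained ECOC model Mdl, predictor data X, and a properly configured C/C++ compiler, and the output name 'myECOC' is a hypothetical choice.

```matlab
% Sketch of the customized code generation workflow (assumes Mdl and X exist).
configurer = learnerCoderConfigurer(Mdl,X);
configurer.OutputFileName = 'myECOC';   % hypothetical file name

% Generate only the entry-point files (predict.m and update.m) so that you
% can edit them, for example to add data preprocessing in predict.m.
generateFiles(configurer)

% After editing the files, call codegen yourself, starting from the stored
% codegen arguments and modifying them as needed.
cgArgs = configurer.CodeGenerationArguments;
codegen(cgArgs{:})
```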


Object Functions

generateCode — Generate C/C++ code using coder configurer
generateFiles — Generate MATLAB files for code generation using coder configurer
validatedUpdateInputs — Validate and extract machine learning model parameters to update

Examples

Generate Code Using Coder Configurer

Train a machine learning model, and then generate code for the predict and update functions of the model by using a coder configurer.

Load Fisher's iris data set and train a multiclass ECOC model using SVM binary learners.

load fisheriris
X = meas;
Y = species;
Mdl = fitcecoc(X,Y);

Mdl is a ClassificationECOC object.

Create a coder configurer for the ClassificationECOC model by using learnerCoderConfigurer. Specify the predictor data X. The learnerCoderConfigurer function uses the input X to configure the coder attributes of the predict function input.

configurer = learnerCoderConfigurer(Mdl,X)

configurer = 
  ClassificationECOCCoderConfigurer with properties:

   Update Inputs:
    BinaryLearners: [1x1 ClassificationSVMCoderConfigurer]
             Prior: [1x1 LearnerCoderInput]
              Cost: [1x1 LearnerCoderInput]

   Predict Inputs:
                 X: [1x1 LearnerCoderInput]

   Code Generation Parameters:
        NumOutputs: 1
    OutputFileName: 'ClassificationECOCModel'

  Properties, Methods

configurer is a ClassificationECOCCoderConfigurer object, which is a coder configurer of a ClassificationECOC object.

To generate C/C++ code, you must have access to a C/C++ compiler that is configured properly. MATLAB Coder locates and uses a supported, installed compiler. You can use mex -setup to view and change the default compiler. For more details, see “Change Default Compiler” (MATLAB).

Generate code for the predict and update functions of the ECOC classification model (Mdl) with default settings.


generateCode(configurer)

generateCode creates these files in output folder:
'initialize.m', 'predict.m', 'update.m', 'ClassificationECOCModel.mat'

The generateCode function completes these actions:

• Generate the MATLAB files required to generate code, including the two entry-point functions predict.m and update.m for the predict and update functions of Mdl, respectively.
• Create a MEX function named ClassificationECOCModel for the two entry-point functions.
• Create the code for the MEX function in the codegen\mex\ClassificationECOCModel folder.
• Copy the MEX function to the current folder.

Display the contents of the predict.m, update.m, and initialize.m files by using the type function.

type predict.m

function varargout = predict(X,varargin) %#codegen
% Autogenerated by MATLAB, 29-Feb-2020 10:36:34
[varargout{1:nargout}] = initialize('predict',X,varargin{:});
end

type update.m

function update(varargin) %#codegen
% Autogenerated by MATLAB, 29-Feb-2020 10:36:34
initialize('update',varargin{:});
end

type initialize.m

function [varargout] = initialize(command,varargin) %#codegen
% Autogenerated by MATLAB, 29-Feb-2020 10:36:34
coder.inline('always')
persistent model
if isempty(model)
    model = loadLearnerForCoder('ClassificationECOCModel.mat');
end
switch(command)
    case 'update'
        % Update struct fields: BinaryLearners
        %                       Prior
        %                       Cost
        model = update(model,varargin{:});
    case 'predict'
        % Predict Inputs: X
        X = varargin{1};
        if nargin == 2
            [varargout{1:nargout}] = predict(model,X);
        else
            PVPairs = cell(1,nargin-2);
            for i = 1:nargin-2
                PVPairs{1,i} = varargin{i+1};
            end
            [varargout{1:nargout}] = predict(model,X,PVPairs{:});
        end


    end
end
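Once the MEX function exists, it can be called like any MATLAB function, with the entry-point name as the first argument. A minimal sketch, assuming the generated ClassificationECOCModel MEX file is on the path and X is the predictor matrix from this example:

```matlab
% Hedged sketch: call the 'predict' entry point of the generated MEX
% function. ClassificationECOCModel must have been created by generateCode.
label_mex = ClassificationECOCModel('predict',X);
```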

Update Parameters of ECOC Classification Model in Generated Code

Train an error-correcting output codes (ECOC) model using SVM binary learners and create a coder configurer for the model. Use the properties of the coder configurer to specify coder attributes of the ECOC model parameters. Use the object function of the coder configurer to generate C code that predicts labels for new predictor data. Then retrain the model using different settings, and update parameters in the generated code without regenerating the code.

Train Model

Load Fisher's iris data set.

load fisheriris
X = meas;
Y = species;

Create an SVM binary learner template to use a Gaussian kernel function and to standardize predictor data.

t = templateSVM('KernelFunction','gaussian','Standardize',true);

Train a multiclass ECOC model using the template t.

Mdl = fitcecoc(X,Y,'Learners',t);

Mdl is a ClassificationECOC object.

Create Coder Configurer

Create a coder configurer for the ClassificationECOC model by using learnerCoderConfigurer. Specify the predictor data X. The learnerCoderConfigurer function uses the input X to configure the coder attributes of the predict function input. Also, set the number of outputs to 2 so that the generated code returns the first two outputs of the predict function, which are the predicted labels and negated average binary losses.

configurer = learnerCoderConfigurer(Mdl,X,'NumOutputs',2)

configurer = 
  ClassificationECOCCoderConfigurer with properties:

   Update Inputs:
    BinaryLearners: [1x1 ClassificationSVMCoderConfigurer]
             Prior: [1x1 LearnerCoderInput]
              Cost: [1x1 LearnerCoderInput]

   Predict Inputs:
                 X: [1x1 LearnerCoderInput]

   Code Generation Parameters:
        NumOutputs: 2
    OutputFileName: 'ClassificationECOCModel'


Properties, Methods

configurer is a ClassificationECOCCoderConfigurer object, which is a coder configurer of a ClassificationECOC object. The display shows the tunable input arguments of predict and update: X, BinaryLearners, Prior, and Cost.

Specify Coder Attributes of Parameters

Specify the coder attributes of predict arguments (predictor data and the name-value pair arguments 'Decoding' and 'BinaryLoss') and update arguments (support vectors of the SVM learners) so that you can use these arguments as the input arguments of predict and update in the generated code.

First, specify the coder attributes of X so that the generated code accepts any number of observations. Modify the SizeVector and VariableDimensions attributes. The SizeVector attribute specifies the upper bound of the predictor data size, and the VariableDimensions attribute specifies whether each dimension of the predictor data has a variable size or fixed size.

configurer.X.SizeVector = [Inf 4];
configurer.X.VariableDimensions = [true false];

The size of the first dimension is the number of observations. In this case, the code specifies that the upper bound of the size is Inf and the size is variable, meaning that X can have any number of observations. This specification is convenient if you do not know the number of observations when generating code.

The size of the second dimension is the number of predictor variables. This value must be fixed for a machine learning model. X contains 4 predictors, so the second value of the SizeVector attribute must be 4 and the second value of the VariableDimensions attribute must be false.

Next, modify the coder attributes of BinaryLoss and Decoding to use the 'BinaryLoss' and 'Decoding' name-value pair arguments in the generated code. Display the coder attributes of BinaryLoss.

configurer.BinaryLoss

ans = 
  EnumeratedInput with properties:

             Value: 'hinge'
    SelectedOption: 'Built-in'
    BuiltInOptions: {1x7 cell}
        IsConstant: 1
        Tunability: 0

To use a nondefault value in the generated code, you must specify the value before generating the code. Specify the Value attribute of BinaryLoss as 'exponential'.

configurer.BinaryLoss.Value = 'exponential';
configurer.BinaryLoss

ans = 
  EnumeratedInput with properties:

             Value: 'exponential'
    SelectedOption: 'Built-in'
    BuiltInOptions: {1x7 cell}
        IsConstant: 1
        Tunability: 1

If you modify attribute values when Tunability is false (logical 0), the software sets the Tunability to true (logical 1).

Display the coder attributes of Decoding.

configurer.Decoding

ans = 
  EnumeratedInput with properties:

             Value: 'lossweighted'
    SelectedOption: 'Built-in'
    BuiltInOptions: {'lossweighted'  'lossbased'}
        IsConstant: 1
        Tunability: 0

Specify the IsConstant attribute of Decoding as false so that you can use all available values in BuiltInOptions in the generated code.

configurer.Decoding.IsConstant = false;
configurer.Decoding

ans = 
  EnumeratedInput with properties:

             Value: [1x1 LearnerCoderInput]
    SelectedOption: 'NonConstant'
    BuiltInOptions: {'lossweighted'  'lossbased'}
        IsConstant: 0
        Tunability: 1

The software changes the Value attribute of Decoding to a LearnerCoderInput object so that you can use both 'lossweighted' and 'lossbased' as the value of 'Decoding'. Also, the software sets the SelectedOption to 'NonConstant' and the Tunability to true.

Finally, modify the coder attributes of SupportVectors in BinaryLearners. Display the coder attributes of SupportVectors.

configurer.BinaryLearners.SupportVectors

ans = 
  LearnerCoderInput with properties:

            SizeVector: [54 4]
    VariableDimensions: [1 0]
              DataType: 'double'
            Tunability: 1

The default value of VariableDimensions is [true false] because each learner has a different number of support vectors. If you retrain the ECOC model using new data or different settings, the number of support vectors in the SVM learners can vary. Therefore, increase the upper bound of the number of support vectors.

configurer.BinaryLearners.SupportVectors.SizeVector = [150 4];

SizeVector attribute for Alpha has been modified to satisfy configuration constraints.
SizeVector attribute for SupportVectorLabels has been modified to satisfy configuration constraints.

If you modify the coder attributes of SupportVectors, then the software modifies the coder attributes of Alpha and SupportVectorLabels to satisfy configuration constraints. If the modification of the coder attributes of one parameter requires subsequent changes to other dependent parameters to satisfy configuration constraints, then the software changes the coder attributes of the dependent parameters.

Display the coder configurer.

configurer

configurer = 
  ClassificationECOCCoderConfigurer with properties:

   Update Inputs:
    BinaryLearners: [1x1 ClassificationSVMCoderConfigurer]
             Prior: [1x1 LearnerCoderInput]
              Cost: [1x1 LearnerCoderInput]

   Predict Inputs:
                 X: [1x1 LearnerCoderInput]
        BinaryLoss: [1x1 EnumeratedInput]
          Decoding: [1x1 EnumeratedInput]

   Code Generation Parameters:
        NumOutputs: 2
    OutputFileName: 'ClassificationECOCModel'

  Properties, Methods

The display now includes BinaryLoss and Decoding as well.

Generate Code

To generate C/C++ code, you must have access to a C/C++ compiler that is configured properly. MATLAB Coder locates and uses a supported, installed compiler. You can use mex -setup to view and change the default compiler. For more details, see “Change Default Compiler” (MATLAB).

Generate code for the predict and update functions of the ECOC classification model (Mdl).

generateCode(configurer)

generateCode creates these files in output folder:
'initialize.m', 'predict.m', 'update.m', 'ClassificationECOCModel.mat'

The generateCode function completes these actions:

• Generate the MATLAB files required to generate code, including the two entry-point functions predict.m and update.m for the predict and update functions of Mdl, respectively.
• Create a MEX function named ClassificationECOCModel for the two entry-point functions.
• Create the code for the MEX function in the codegen\mex\ClassificationECOCModel folder.
• Copy the MEX function to the current folder.

Verify Generated Code

Pass some predictor data to verify whether the predict function of Mdl and the predict function in the MEX function return the same labels. To call an entry-point function in a MEX function that has more than one entry point, specify the function name as the first input argument. Because you specified 'Decoding' as a tunable input argument by changing the IsConstant attribute before generating the code, you also need to specify it in the call to the MEX function, even though 'lossweighted' is the default value of 'Decoding'.

[label,NegLoss] = predict(Mdl,X,'BinaryLoss','exponential');
[label_mex,NegLoss_mex] = ClassificationECOCModel('predict',X,'BinaryLoss','exponential','Decoding','lossweighted');

Compare label to label_mex by using isequal.

isequal(label,label_mex)

ans = logical
   1

isequal returns logical 1 (true) if all the inputs are equal. The comparison confirms that the predict function of Mdl and the predict function in the MEX function return the same labels. NegLoss_mex might include round-off differences compared to NegLoss. In this case, compare NegLoss_mex to NegLoss, allowing a small tolerance.

find(abs(NegLoss-NegLoss_mex) > 1e-8)

ans = 0x1 empty double column vector

The comparison confirms that NegLoss and NegLoss_mex are equal within the tolerance 1e-8.

Retrain Model and Update Parameters in Generated Code

Retrain the model using a different setting. Specify 'KernelScale' as 'auto' so that the software selects an appropriate scale factor using a heuristic procedure.

t_new = templateSVM('KernelFunction','gaussian','Standardize',true,'KernelScale','auto');
retrainedMdl = fitcecoc(X,Y,'Learners',t_new);

Extract parameters to update by using validatedUpdateInputs. This function detects the modified model parameters in retrainedMdl and validates whether the modified parameter values satisfy the coder attributes of the parameters.

params = validatedUpdateInputs(configurer,retrainedMdl);

Update parameters in the generated code.

ClassificationECOCModel('update',params)


Verify Generated Code

Compare the outputs from the predict function of retrainedMdl to the outputs from the predict function in the updated MEX function.

[label,NegLoss] = predict(retrainedMdl,X,'BinaryLoss','exponential','Decoding','lossbased');
[label_mex,NegLoss_mex] = ClassificationECOCModel('predict',X,'BinaryLoss','exponential','Decoding','lossbased');

isequal(label,label_mex)

ans = logical
   1

find(abs(NegLoss-NegLoss_mex) > 1e-8)

ans = 0x1 empty double column vector

The comparison confirms that label and label_mex are equal, and NegLoss and NegLoss_mex are equal within the tolerance.

More About

LearnerCoderInput Object

A coder configurer uses a LearnerCoderInput object to specify the coder attributes of predict and update input arguments. A LearnerCoderInput object has the following attributes to specify the properties of an input argument array in the generated code.

• SizeVector — Array size if the corresponding VariableDimensions value is false. Upper bound of the array size if the corresponding VariableDimensions value is true. To allow an unbounded array, specify the bound as Inf.
• VariableDimensions — Indicator specifying whether each dimension of the array has a variable size or fixed size, specified as true (logical 1) or false (logical 0). A value of true (logical 1) means that the corresponding dimension has a variable size; a value of false (logical 0) means that the corresponding dimension has a fixed size.
• DataType — Data type of the array.

• Tunability — Indicator specifying whether or not predict or update includes the argument as an input in the generated code, specified as true (logical 1) or false (logical 0). If you specify other attribute values when Tunability is false, the software sets Tunability to true.

After creating a coder configurer, you can modify the coder attributes by using dot notation. For example, specify the coder attributes of the coefficients Alpha in BinaryLearners of the coder configurer configurer:

configurer.BinaryLearners.Alpha.SizeVector = [100 1];
configurer.BinaryLearners.Alpha.VariableDimensions = [1 0];
configurer.BinaryLearners.Alpha.DataType = 'double';

If you specify the verbosity level (Verbose) as true (default), then the software displays notification messages when you modify the coder attributes of a machine learning model parameter and the modification changes the coder attributes of other dependent parameters.

EnumeratedInput Object

A coder configurer uses an EnumeratedInput object to specify the coder attributes of predict input arguments that have a finite set of available values. An EnumeratedInput object has the following attributes to specify the properties of an input argument array in the generated code.


• Value — Value of the predict argument in the generated code, specified as a character vector or a LearnerCoderInput on page 32-355 object.
  • Character vector in BuiltInOptions — You can specify one of the BuiltInOptions using either the option name or its index value. For example, to choose the first option, specify Value as either the first character vector in BuiltInOptions or 1.
  • Character vector designating a custom function name — To use a custom option, define a custom function on the MATLAB search path, and specify Value as the name of the custom function.
  • LearnerCoderInput on page 32-355 object — If you set IsConstant to false (logical 0), then the software changes Value to a LearnerCoderInput on page 32-355 object with the following read-only coder attribute values. These values indicate that the input in the generated code is a variable-size, tunable character vector that is one of the available values in BuiltInOptions.
    • SizeVector — [1 c], indicating the upper bound of the array size, where c is the length of the longest available character vector in BuiltInOptions
    • VariableDimensions — [0 1], indicating that the array is a variable-size vector
    • DataType — 'char'
    • Tunability — 1
  The default value of Value is consistent with the default value of the corresponding predict argument, which is one of the character vectors in BuiltInOptions.
• SelectedOption — Status of the selected option, specified as 'Built-in', 'Custom', or 'NonConstant'. The software sets SelectedOption according to Value:
  • 'Built-in' (default) — When Value is one of the character vectors in BuiltInOptions
  • 'Custom' — When Value is a character vector that is not in BuiltInOptions
  • 'NonConstant' — When Value is a LearnerCoderInput on page 32-355 object
  This attribute is read-only.
• BuiltInOptions — List of available character vectors for the corresponding predict argument, specified as a cell array. This attribute is read-only.


• IsConstant — Indicator specifying whether or not the array value is a compile-time constant (coder.Constant) in the generated code, specified as true (logical 1, default) or false (logical 0). If you set this value to false, then the software changes Value to a LearnerCoderInput on page 32-355 object.
• Tunability — Indicator specifying whether or not predict includes the argument as an input in the generated code, specified as true (logical 1) or false (logical 0, default). If you specify other attribute values when Tunability is false, the software sets Tunability to true.

After creating a coder configurer, you can modify the coder attributes by using dot notation. For example, specify the coder attributes of BinaryLoss of the coder configurer configurer:

configurer.BinaryLoss.Value = 'linear';

See Also

ClassificationECOC | ClassificationLinearCoderConfigurer | ClassificationSVMCoderConfigurer | CompactClassificationECOC | learnerCoderConfigurer | predict | update

Topics

“Introduction to Code Generation” on page 31-2
“Code Generation for Prediction and Update Using Coder Configurer” on page 31-68

Introduced in R2019a


ClassificationDiscriminant class

Superclasses: CompactClassificationDiscriminant

Discriminant analysis classification

Description

A ClassificationDiscriminant object encapsulates a discriminant analysis classifier, which is a Gaussian mixture model for data generation. A ClassificationDiscriminant object can predict responses for new data using the predict method. The object contains the data used for training, so it can compute resubstitution predictions.

Construction

Create a ClassificationDiscriminant object by using fitcdiscr.
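As a minimal sketch of construction and use (assuming the bundled fisheriris data set is available):

```matlab
% Train a linear discriminant classifier on Fisher's iris data and compute
% a resubstitution prediction for the first observation. The object stores
% its training data, which is what makes resubstitution predictions possible.
load fisheriris
Mdl = fitcdiscr(meas,species);    % returns a ClassificationDiscriminant object
label = predict(Mdl,meas(1,:));   % predicted class for one training observation
```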

Properties

BetweenSigma

p-by-p matrix, the between-class covariance, where p is the number of predictors.

CategoricalPredictors

Categorical predictor indices, which is always empty ([]).

ClassNames

List of the elements in the training data Y with duplicates removed. ClassNames can be a categorical array, cell array of character vectors, character array, logical vector, or a numeric vector. ClassNames has the same data type as the data in the argument Y. (The software treats string arrays as cell arrays of character vectors.)

Coeffs

k-by-k structure of coefficient matrices, where k is the number of classes. Coeffs(i,j) contains coefficients of the linear or quadratic boundaries between classes i and j. Fields in Coeffs(i,j):

• DiscrimType
• Class1 — ClassNames(i)
• Class2 — ClassNames(j)
• Const — A scalar
• Linear — A vector with p components, where p is the number of columns in X
• Quadratic — p-by-p matrix, exists for quadratic DiscrimType

The equation of the boundary between class i and class j is

Const + Linear*x + x'*Quadratic*x = 0,


where x is a column vector of length p.

If fitcdiscr had the FillCoeffs name-value pair set to 'off' when constructing the classifier, Coeffs is empty ([]).

Cost

Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (i.e., the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response.

Change a Cost matrix using dot notation: obj.Cost = costMatrix.

Delta

Value of the Delta threshold for a linear discriminant model, a nonnegative scalar. If a coefficient of obj has magnitude smaller than Delta, obj sets this coefficient to 0, and so you can eliminate the corresponding predictor from the model. Set Delta to a higher value to eliminate more predictors. Delta must be 0 for quadratic discriminant models.

Change Delta using dot notation: obj.Delta = newDelta.

DeltaPredictor

Row vector of length equal to the number of predictors in obj. If DeltaPredictor(i) < Delta then coefficient i of the model is 0. If obj is a quadratic discriminant model, all elements of DeltaPredictor are 0.

DiscrimType

Character vector specifying the discriminant type. One of:

• 'linear'
• 'quadratic'
• 'diagLinear'
• 'diagQuadratic'
• 'pseudoLinear'
• 'pseudoQuadratic'

Change DiscrimType using dot notation: obj.DiscrimType = newDiscrimType. You can change between linear types, or between quadratic types, but cannot change between linear and quadratic types.

Gamma

Value of the Gamma regularization parameter, a scalar from 0 to 1. Change Gamma using dot notation: obj.Gamma = newGamma.

• If you set 1 for a linear discriminant, the discriminant sets its type to 'diagLinear'.


• If you set Gamma to a value between MinGamma and 1 for a linear discriminant, the discriminant sets its type to 'linear'.
• You cannot set values below the value of the MinGamma property.
• For a quadratic discriminant, you can set either 0 (for DiscrimType 'quadratic') or 1 (for DiscrimType 'diagQuadratic').

HyperparameterOptimizationResults
Description of the cross-validation optimization of hyperparameters, stored as a BayesianOptimization object or a table of hyperparameters and associated values. Nonempty when the OptimizeHyperparameters name-value pair is nonempty at creation. Value depends on the setting of the HyperparameterOptimizationOptions name-value pair at creation:
• 'bayesopt' (default) — Object of class BayesianOptimization
• 'gridsearch' or 'randomsearch' — Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst)

LogDetSigma
Logarithm of the determinant of the within-class covariance matrix. The type of LogDetSigma depends on the discriminant type:
• Scalar for linear discriminant analysis
• Vector of length K for quadratic discriminant analysis, where K is the number of classes

MinGamma
Nonnegative scalar, the minimal value of the Gamma parameter so that the correlation matrix is invertible. If the correlation matrix is not singular, MinGamma is 0.

ModelParameters
Parameters used in training obj.

Mu
Class means, specified as a K-by-p matrix of scalar values. K is the number of classes, and p is the number of predictors. Each row of Mu represents the mean of the multivariate normal distribution of the corresponding class. The class indices are in the ClassNames attribute.

NumObservations
Number of observations in the training data, a numeric scalar. NumObservations can be less than the number of rows of input data X when there are missing values in X or response Y.
PredictorNames
Cell array of names for the predictor variables, in the order in which they appear in the training data X.

Prior
Numeric vector of prior probabilities for each class. The order of the elements of Prior corresponds to the order of the classes in ClassNames.


Add or change a Prior vector using dot notation: obj.Prior = priorVector. ResponseName Character vector describing the response variable Y. ScoreTransform Character vector representing a built-in transformation function, or a function handle for transforming scores. 'none' means no transformation; equivalently, 'none' means @(x)x. For a list of built-in transformation functions and the syntax of custom transformation functions, see fitcdiscr. Implement dot notation to add or change a ScoreTransform function using one of the following: • cobj.ScoreTransform = 'function' • cobj.ScoreTransform = @function Sigma Within-class covariance matrix or matrices. The dimensions depend on DiscrimType: • 'linear' (default) — Matrix of size p-by-p, where p is the number of predictors • 'quadratic' — Array of size p-by-p-by-K, where K is the number of classes • 'diagLinear' — Row vector of length p • 'diagQuadratic' — Array of size 1-by-p-by-K • 'pseudoLinear' — Matrix of size p-by-p • 'pseudoQuadratic' — Array of size p-by-p-by-K W Scaled weights, a vector with length n, the number of rows in X. X Matrix of predictor values. Each column of X represents one predictor (variable), and each row represents one observation. Xcentered X data with class means subtracted. If Y(i) is of class j, Xcentered(i,:)

= X(i,:) − Mu(j,:),

where Mu is the class mean property.

Y
A categorical array, cell array of character vectors, character array, logical vector, or a numeric vector with the same number of rows as X. Each row of Y represents the classification of the corresponding row of X.
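The Xcentered relation can be checked directly on a trained model. A minimal sketch, assuming Fisher's iris data:

```matlab
% Sketch: verify the Xcentered relation for one training observation.
load fisheriris
Mdl = fitcdiscr(meas,species);
i = 1;                                      % any training observation
j = find(strcmp(Mdl.ClassNames,Mdl.Y(i)));  % class index of observation i
% Mdl.Xcentered(i,:) equals Mdl.X(i,:) - Mdl.Mu(j,:) up to rounding
```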


Methods

compact — Compact discriminant analysis classifier
crossval — Cross-validated discriminant analysis classifier
cvshrink — Cross-validate regularization of linear discriminant
resubEdge — Classification edge by resubstitution
resubLoss — Classification error by resubstitution
resubMargin — Classification margins by resubstitution
resubPredict — Predict resubstitution labels of discriminant analysis classification model

Inherited Methods

edge — Classification edge
logP — Log unconditional probability density for discriminant analysis classifier
loss — Classification error
mahal — Mahalanobis distance to class means
margin — Classification margins
nLinearCoeffs — Number of nonzero linear coefficients
predict — Predict labels using discriminant analysis classification model

Copy Semantics Value. To learn how value classes affect copy operations, see Copying Objects (MATLAB).

Examples

Train Discriminant Analysis Model

Load Fisher's iris data set.

load fisheriris

Train a discriminant analysis model using the entire data set.

Mdl = fitcdiscr(meas,species)

Mdl = 
  ClassificationDiscriminant
           ResponseName: 'Y'
  CategoricalPredictors: []
             ClassNames: {'setosa'  'versicolor'  'virginica'}
         ScoreTransform: 'none'
        NumObservations: 150
            DiscrimType: 'linear'
                     Mu: [3x4 double]
                 Coeffs: [3x3 struct]


Properties, Methods

Mdl is a ClassificationDiscriminant model. To access its properties, use dot notation. For example, display the group means for each predictor.

Mdl.Mu

ans = 3×4

    5.0060    3.4280    1.4620    0.2460
    5.9360    2.7700    4.2600    1.3260
    6.5880    2.9740    5.5520    2.0260

To predict labels for new observations, pass Mdl and predictor data to predict.
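For instance, a minimal prediction sketch (the measurement values here are illustrative, not from the original example):

```matlab
% Predict the species of a new flower from its four measurements.
xnew = [5.9 3.0 5.1 1.8];   % hypothetical observation
label = predict(Mdl,xnew)
```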

More About

Discriminant Classification

The model for discriminant analysis is:
• Each class (Y) generates data (X) using a multivariate normal distribution. That is, the model assumes X has a Gaussian mixture distribution (gmdistribution).
• For linear discriminant analysis, the model has the same covariance matrix for each class; only the means vary.
• For quadratic discriminant analysis, both means and covariances of each class vary.

predict classifies so as to minimize the expected classification cost:

    ŷ = argmin_(y = 1,...,K) ∑_(k = 1)^K P̂(k|x) C(y|k),

where
• ŷ is the predicted classification.
• K is the number of classes.
• P̂(k|x) is the posterior probability on page 20-6 of class k for observation x.
• C(y|k) is the cost on page 20-7 of classifying an observation as y when its true class is k.

For details, see “Prediction Using Discriminant Analysis Models” on page 20-6.

Regularization

Regularization is the process of finding a small set of predictors that yield an effective predictive model. For linear discriminant analysis, there are two parameters, γ and δ, that control regularization as follows. cvshrink helps you select appropriate values of the parameters.

Let Σ represent the covariance matrix of the data X, and let X̃ be the centered data (the data X minus the mean by class). Define


D = diag(X̃' * X̃),

where X̃ denotes the centered data. The regularized covariance matrix Σ̃ is

Σ̃ = (1 − γ)Σ + γD.

Whenever γ ≥ MinGamma, Σ̃ is nonsingular.

Let μk be the mean vector for those elements of X in class k, and let μ0 be the global mean vector (the mean of the rows of X). Let C be the correlation matrix of the data X, and let C̃ be the regularized correlation matrix:

C̃ = (1 − γ)C + γI,

where I is the identity matrix.

The linear term in the regularized discriminant analysis classifier for a data point x is

(x − μ0)' Σ̃^(−1) (μk − μ0) = [(x − μ0)' D^(−1/2)] [C̃^(−1) D^(−1/2) (μk − μ0)].

The parameter δ enters into this equation as a threshold on the final term in square brackets. Each component of the vector C̃^(−1) D^(−1/2) (μk − μ0) is set to zero if it is smaller in magnitude than the threshold δ. Therefore, for class k, if component j is thresholded to zero, component j of x does not enter into the evaluation of the posterior probability.

The DeltaPredictor property is a vector related to this threshold. When δ ≥ DeltaPredictor(i), all classes k have

|[C̃^(−1) D^(−1/2) (μk − μ0)]_i| ≤ δ.

Therefore, when δ ≥ DeltaPredictor(i), the regularized classifier does not use predictor i.
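cvshrink searches a grid of γ and δ values by cross-validation. A minimal sketch of selecting the pair with the lowest cross-validated error, assuming Fisher's iris data:

```matlab
load fisheriris
Mdl = fitcdiscr(meas,species);
% Cross-validate error over a small grid of Gamma and Delta values.
[err,gamma,delta] = cvshrink(Mdl,'NumGamma',5,'NumDelta',5);
minerr = min(err(:));
[p,q] = find(err == minerr,1);   % first (Gamma,Delta) pair achieving the minimum
Mdl.Gamma = gamma(p);            % apply via dot notation
Mdl.Delta = delta(p,q);
```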

References [1] Guo, Y., T. Hastie, and R. Tibshirani. Regularized linear discriminant analysis and its application in microarrays. Biostatistics, Vol. 8, No. 1, pp. 86–100, 2007.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:
• The predict function supports code generation.
• When you train a discriminant analysis model by using fitcdiscr or create a compact discriminant analysis model by using makecdiscr, the following restrictions apply.
• The class labels input argument value (Y) cannot be a categorical array.


• The value of the 'ClassNames' name-value pair argument cannot be a categorical array. • The value of the 'ScoreTransform' name-value pair argument cannot be an anonymous function. For more information, see “Introduction to Code Generation” on page 31-2.

See Also
CompactClassificationDiscriminant | compareHoldout | fitcdiscr

Topics
“Discriminant Analysis Classification” on page 20-2

Introduced in R2011b


ClassificationEnsemble

Package: classreg.learning.classif
Superclasses: CompactClassificationEnsemble

Ensemble classifier

Description ClassificationEnsemble combines a set of trained weak learner models and data on which these learners were trained. It can predict ensemble response for new data by aggregating predictions from its weak learners. It stores data used for training, can compute resubstitution predictions, and can resume training if desired.

Construction Create a classification ensemble object using fitcensemble.

Properties

BinEdges
Bin edges for numeric predictors, specified as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors.

The software bins numeric predictors only if you specify the 'NumBins' name-value pair argument as a positive integer scalar when training a model with tree learners. The BinEdges property is empty if the 'NumBins' value is empty (default).

You can reproduce the binned predictor data Xbinned by using the BinEdges property of the trained model mdl.

X = mdl.X; % Predictor data
Xbinned = zeros(size(X));
edges = mdl.BinEdges;
% Find indices of binned predictors.
idxNumeric = find(~cellfun(@isempty,edges));
if iscolumn(idxNumeric)
    idxNumeric = idxNumeric';
end
for j = idxNumeric
    x = X(:,j);
    % Convert x to array if x is a table.
    if istable(x)
        x = table2array(x);
    end
    % Group x into bins by using the discretize function.
    xbinned = discretize(x,[-inf; edges{j}; inf]);
    Xbinned(:,j) = xbinned;
end


Xbinned contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs. CategoricalPredictors Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values corresponding to the columns of the predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty ([]). ClassNames List of the elements in Y with duplicates removed. ClassNames can be a numeric vector, categorical vector, logical vector, character array, or cell array of character vectors. ClassNames has the same data type as the data in the argument Y. (The software treats string arrays as cell arrays of character vectors.) CombineWeights Character vector describing how ens combines weak learner weights, either 'WeightedSum' or 'WeightedAverage'. Cost Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response. This property is readonly. ExpandedPredictorNames Expanded predictor names, stored as a cell array of character vectors. If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames. FitInfo Numeric array of fit information. The FitInfoDescription property describes the content of this array. FitInfoDescription Character vector describing the meaning of the FitInfo array. 
HyperparameterOptimizationResults
Description of the cross-validation optimization of hyperparameters, stored as a BayesianOptimization object or a table of hyperparameters and associated values. Nonempty when the OptimizeHyperparameters name-value pair is nonempty at creation. Value depends on the setting of the HyperparameterOptimizationOptions name-value pair at creation:
• 'bayesopt' (default) — Object of class BayesianOptimization


• 'gridsearch' or 'randomsearch' — Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst) LearnerNames Cell array of character vectors with names of weak learners in the ensemble. The name of each learner appears just once. For example, if you have an ensemble of 100 trees, LearnerNames is {'Tree'}. Method Character vector describing the method that creates ens. ModelParameters Parameters used in training ens. NumObservations Numeric scalar containing the number of observations in the training data. NumTrained Number of trained weak learners in ens, a scalar. PredictorNames Cell array of names for the predictor variables, in the order in which they appear in X. Prior Numeric vector of prior probabilities for each class. The order of the elements of Prior corresponds to the order of the classes in ClassNames. The number of elements of Prior is the number of unique classes in the response. This property is read-only. ReasonForTermination Character vector describing the reason fitcensemble stopped adding weak learners to the ensemble. ResponseName Character vector with the name of the response variable Y. ScoreTransform Function handle for transforming scores, or character vector representing a built-in transformation function. 'none' means no transformation; equivalently, 'none' means @(x)x. For a list of built-in transformation functions and the syntax of custom transformation functions, see fitctree. Add or change a ScoreTransform function using dot notation: ens.ScoreTransform = 'function'

or


ens.ScoreTransform = @function

Trained A cell vector of trained classification models. • If Method is 'LogitBoost' or 'GentleBoost', then ClassificationEnsemble stores trained learner j in the CompactRegressionLearner property of the object stored in Trained{j}. That is, to access trained learner j, use ens.Trained{j}.CompactRegressionLearner. • Otherwise, cells of the cell vector contain the corresponding, compact classification models. TrainedWeights Numeric vector of trained weights for the weak learners in ens. TrainedWeights has T elements, where T is the number of weak learners in learners. UsePredForLearner Logical matrix of size P-by-NumTrained, where P is the number of predictors (columns) in the training data X. UsePredForLearner(i,j) is true when learner j uses predictor i, and is false otherwise. For each learner, the predictors have the same order as the columns in the training data X. If the ensemble is not of type Subspace, all entries in UsePredForLearner are true. W Scaled weights, a vector with length n, the number of rows in X. The sum of the elements of W is 1. X Matrix or table of predictor values that trained the ensemble. Each column of X represents one variable, and each row represents one observation. Y Numeric vector, categorical vector, logical vector, character array, or cell array of character vectors. Each row of Y represents the classification of the corresponding row of X.

Methods

compact — Compact classification ensemble
crossval — Cross validate ensemble
resubEdge — Classification edge by resubstitution
resubLoss — Classification error by resubstitution
resubMargin — Classification margins by resubstitution
resubPredict — Classify observations in ensemble of classification models
resume — Resume training ensemble

Inherited Methods

edge — Classification edge
loss — Classification error
margin — Classification margins
predict — Classify observations using ensemble of classification models
predictorImportance — Estimates of predictor importance for classification ensemble of decision trees
removeLearners — Remove members of compact classification ensemble

Copy Semantics Value. To learn how value classes affect copy operations, see Copying Objects (MATLAB).

Examples

Train Boosted Classification Ensemble

Load the ionosphere data set.

load ionosphere

Train a boosted ensemble of 100 classification trees using all measurements and the AdaBoostM1 method. Mdl = fitcensemble(X,Y,'Method','AdaBoostM1')

Mdl = 
  classreg.learning.classif.ClassificationEnsemble
            ResponseName: 'Y'
   CategoricalPredictors: []
              ClassNames: {'b'  'g'}
          ScoreTransform: 'none'
         NumObservations: 351
              NumTrained: 100
                  Method: 'AdaBoostM1'
            LearnerNames: {'Tree'}
    ReasonForTermination: 'Terminated normally after completing the requested number of training cycles.'
                 FitInfo: [100x1 double]
      FitInfoDescription: {2x1 cell}

  Properties, Methods

Mdl is a ClassificationEnsemble model object. Mdl.Trained is the property that stores a 100-by-1 cell vector of the trained classification trees (CompactClassificationTree model objects) that compose the ensemble. Plot a graph of the first trained classification tree. view(Mdl.Trained{1},'Mode','graph')


By default, fitcensemble grows shallow trees for boosted ensembles of trees.

Predict the label of the mean of X.

predMeanX = predict(Mdl,mean(X))

predMeanX = 1x1 cell array
    {'g'}
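Because the ensemble stores its training data, training can be continued without starting over. A brief sketch using resume:

```matlab
% Add 50 more weak learners to the trained ensemble.
Mdl2 = resume(Mdl,50);
Mdl2.NumTrained   % 150 learners after resuming
```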

Tip For an ensemble of classification trees, the Trained property of ens stores an ens.NumTrained-by-1 cell vector of compact classification models. For a textual or graphical display of tree t in the cell vector, enter:
• view(ens.Trained{t}.CompactRegressionLearner) for ensembles aggregated using LogitBoost or GentleBoost.
• view(ens.Trained{t}) for all other aggregation methods.


Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: • The predict function supports code generation. • When you train an ensemble by using fitcensemble, code generation limitations for the weak learners used in the ensemble also apply to the ensemble. For more details, see the Code Generation sections of ClassificationKNN, CompactClassificationDiscriminant, and CompactClassificationTree. • For fixed-point code generation, you must train an ensemble using tree learners. For more information, see “Introduction to Code Generation” on page 31-2.

See Also
ClassificationTree | CompactClassificationEnsemble | compareHoldout | fitcensemble | view

Introduced in R2011a


ClassificationKNN

k-nearest neighbor classification

Description ClassificationKNN is a nearest-neighbor classification model in which you can alter both the distance metric and the number of nearest neighbors. Because a ClassificationKNN classifier stores training data, you can use the model to compute resubstitution predictions. Alternatively, use the model to classify new observations using the predict method.

Creation Create a ClassificationKNN model using fitcknn.

Properties

KNN Properties

BreakTies — Tie-breaking algorithm
'smallest' (default) | 'nearest' | 'random'

Tie-breaking algorithm used by predict when multiple classes have the same smallest cost, specified as one of the following:
• 'smallest' — Use the smallest index among tied groups.
• 'nearest' — Use the class with the nearest neighbor among tied groups.
• 'random' — Use a random tiebreaker among tied groups.

By default, ties occur when multiple classes have the same number of nearest points among the k nearest neighbors. BreakTies applies when IncludeTies is false.

Change BreakTies using dot notation: mdl.BreakTies = newBreakTies.

Distance — Distance metric
'cityblock' | 'chebychev' | 'correlation' | 'cosine' | 'euclidean' | function handle | ...

Distance metric, specified as a character vector or a function handle. The values allowed depend on the NSMethod property:
• 'exhaustive' — Any distance metric of ExhaustiveSearcher
• 'kdtree' — 'cityblock', 'chebychev', 'euclidean', or 'minkowski'

The following table lists the ExhaustiveSearcher distance metrics.


'cityblock' — City block distance.
'chebychev' — Chebychev distance (maximum coordinate difference).
'correlation' — One minus the sample linear correlation between observations (treated as sequences of values).
'cosine' — One minus the cosine of the included angle between observations (treated as vectors).
'euclidean' — Euclidean distance.
'hamming' — Hamming distance, the percentage of coordinates that differ.
'jaccard' — One minus the Jaccard coefficient, the percentage of nonzero coordinates that differ.
'mahalanobis' — Mahalanobis distance, computed using a positive definite covariance matrix C. The default value of C is the sample covariance matrix of X, as computed by nancov(X). To specify a different value for C, set the DistParameter property of mdl using dot notation.
'minkowski' — Minkowski distance. The default exponent is 2. To specify a different exponent, set the DistParameter property of mdl using dot notation.
'seuclidean' — Standardized Euclidean distance. Each coordinate difference between X and a query point is scaled, meaning divided by a scale value S. The default value of S is the standard deviation computed from X, S = nanstd(X). To specify another value for S, set the DistParameter property of mdl using dot notation.
'spearman' — One minus the sample Spearman's rank correlation between observations (treated as sequences of values).
@distfun — Distance function handle. distfun has the form

function D2 = distfun(ZI,ZJ)
% calculation of distance
...

where
• ZI is a 1-by-N vector containing one row of X or Y.
• ZJ is an M2-by-N matrix containing multiple rows of X or Y.
• D2 is an M2-by-1 vector of distances, and D2(k) is the distance between observations ZI and ZJ(k,:).

For more information, see “Distance Metrics” on page 18-11.

Change Distance using dot notation: mdl.Distance = newDistance. If NSMethod is 'kdtree', you can use dot notation to change Distance only for the metrics 'cityblock', 'chebychev', 'euclidean', and 'minkowski'.

Data Types: char | function_handle

DistanceWeight — Distance weighting function
'equal' | 'inverse' | 'squaredinverse' | function handle

Distance weighting function, specified as one of the following:
• 'equal' — No weighting
• 'inverse' — Weight is 1/distance
• 'squaredinverse' — Weight is 1/distance^2
• @fcn — fcn is a function that accepts a matrix of nonnegative distances and returns a matrix of the same size containing nonnegative distance weights. For example, 'squaredinverse' is equivalent to @(d)d.^(-2).
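A function handle lets you define weightings beyond the built-in choices. A minimal sketch with a hypothetical Gaussian-kernel weighting (not one of the built-in values):

```matlab
% Sketch: exponential-decay distance weighting for a kNN model.
gaussWeight = @(d) exp(-d.^2);   % nonnegative distances in, nonnegative weights out
load fisheriris
mdl = fitcknn(meas,species,'NumNeighbors',5);
mdl.DistanceWeight = gaussWeight;   % change the weighting via dot notation
```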

Change DistanceWeight using dot notation: mdl.DistanceWeight = newDistanceWeight.

Data Types: char | function_handle

DistParameter — Parameter for distance metric
positive definite covariance matrix | positive scalar | vector of positive scale values

Parameter for the distance metric, specified as one of the following:
• 'mahalanobis' — Positive definite covariance matrix C
• 'minkowski' — Minkowski distance exponent, a positive scalar
• 'seuclidean' — Vector of positive scale values with length equal to the number of columns of X

For any other distance metric, the value of DistParameter must be [].

You can alter DistParameter using dot notation: mdl.DistParameter = newDistParameter. However, if Distance is 'mahalanobis' or 'seuclidean', then you cannot alter DistParameter.

Data Types: single | double

IncludeTies — Tie inclusion flag
false (default) | true

Tie inclusion flag indicating whether predict includes all the neighbors whose distance values are equal to the kth smallest distance, specified as false or true. If IncludeTies is true, predict includes all of these neighbors. Otherwise, predict uses exactly k neighbors (see the BreakTies property).

Change IncludeTies using dot notation: mdl.IncludeTies = newIncludeTies.

Data Types: logical

NSMethod — Nearest neighbor search method
'kdtree' | 'exhaustive'

This property is read-only.

Nearest neighbor search method, specified as either 'kdtree' or 'exhaustive'.


• 'kdtree' — Creates and uses a Kd-tree to find nearest neighbors. • 'exhaustive' — Uses the exhaustive search algorithm. When predicting the class of a new point xnew, the software computes the distance values from all points in X to xnew to find nearest neighbors. The default value is 'kdtree' when X has 10 or fewer columns, X is not sparse, and the distance metric is a 'kdtree' type. Otherwise, the default value is 'exhaustive'. NumNeighbors — Number of nearest neighbors positive integer value Number of nearest neighbors in X used to classify each point during prediction, specified as a positive integer value. Change NumNeighbors using dot notation: mdl.NumNeighbors = newNumNeighbors. Data Types: single | double Other Classification Properties CategoricalPredictors — Categorical predictor indices [] | vector of positive integers This property is read-only. Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values corresponding to the columns of the predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty ([]). Data Types: double ClassNames — Names of classes in training data Y categorical array | character array | logical vector | numeric vector | cell array of character vectors This property is read-only. Names of the classes in the training data Y with duplicates removed, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as Y. (The software treats string arrays as cell arrays of character vectors.) Data Types: categorical | char | logical | single | double | cell Cost — Cost of misclassification square matrix Cost of the misclassification of a point, specified as a square matrix. 
Cost(i,j) is the cost of classifying a point into class j if its true class is i (that is, the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns in Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response.

By default, Cost(i,j) = 1 if i ~= j, and Cost(i,j) = 0 if i = j. In other words, the cost is 0 for correct classification and 1 for incorrect classification.

Change a Cost matrix using dot notation: mdl.Cost = costMatrix.

Data Types: single | double


ExpandedPredictorNames — Expanded predictor names
cell array of character vectors

This property is read-only.

Expanded predictor names, specified as a cell array of character vectors. If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.

Data Types: cell

ModelParameters — Parameters used in training ClassificationKNN
structure

This property is read-only.

Parameters used in training the ClassificationKNN model, specified as a structure.

Data Types: struct

Mu — Predictor means
numeric vector

This property is read-only.

Predictor means, specified as a numeric vector of length numel(PredictorNames). If you do not standardize mdl when training the model using fitcknn, then Mu is empty ([]).

Data Types: single | double

NumObservations — Number of observations
positive integer scalar

This property is read-only.

Number of observations used in training the ClassificationKNN model, specified as a positive integer scalar. This number can be less than the number of rows in the training data because rows containing NaN values are not part of the fit.

Data Types: double

PredictorNames — Predictor variable names
cell array of character vectors

This property is read-only.

Predictor variable names, specified as a cell array of character vectors. The variable names are in the same order in which they appear in the training data X.

Data Types: cell

Prior — Prior probabilities for each class
numeric vector


Prior probabilities for each class, specified as a numeric vector. The order of the elements in Prior corresponds to the order of the classes in ClassNames.

Add or change a Prior vector using dot notation: mdl.Prior = priorVector.

Data Types: single | double

ResponseName — Response variable name
character vector

This property is read-only.

Response variable name, specified as a character vector.

Data Types: char

RowsUsed — Rows used in fitting
[] | logical vector

This property is read-only.

Rows of the original data X used in fitting the ClassificationKNN model, specified as a logical vector. This property is empty if all rows are used.

Data Types: logical

ScoreTransform — Score transformation
'none' (default) | 'doublelogit' | 'invlogit' | 'ismax' | 'logit' | function handle | ...

Score transformation, specified as either a character vector or a function handle. The available character vectors are:
• 'doublelogit' — 1/(1 + e^(−2x))
• 'invlogit' — log(x / (1 − x))
• 'ismax' — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
• 'logit' — 1/(1 + e^(−x))
• 'none' or 'identity' — x (no transformation)
• 'sign' — −1 for x < 0; 0 for x = 0; 1 for x > 0
• 'symmetric' — 2x − 1
• 'symmetricismax' — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to −1
• 'symmetriclogit' — 2/(1 + e^(−x)) − 1
For a MATLAB function or a function you define, use its function handle for score transform. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).

Change ScoreTransform using dot notation: mdl.ScoreTransform = newScoreTransform.


Data Types: char | function_handle Sigma — Predictor standard deviations numeric vector This property is read-only. Predictor standard deviations, specified as a numeric vector of length numel(PredictorNames). If you do not standardize the predictor variables during training, then Sigma is empty ([]). Data Types: single | double W — Observation weights vector of nonnegative values This property is read-only. Observation weights, specified as a vector of nonnegative values with the same number of rows as Y. Each entry in W specifies the relative importance of the corresponding observation in Y. Data Types: single | double X — Unstandardized predictor data numeric matrix This property is read-only. Unstandardized predictor data, specified as a numeric matrix. Each column of X represents one predictor (variable), and each row represents one observation. Data Types: single | double Y — Class labels categorical array | character array | logical vector | numeric vector | cell array of character vectors This property is read-only. Class labels, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Each value in Y is the observed class label for the corresponding row in X. Y has the same data type as the data in Y used for training the model. (The software treats string arrays as cell arrays of character vectors.) Data Types: single | double | logical | char | cell | categorical Hyperparameter Optimization Properties HyperparameterOptimizationResults — Cross-validation optimization of hyperparameters BayesianOptimization object | table This property is read-only. Cross-validation optimization of hyperparameters, specified as a BayesianOptimization object or a table of hyperparameters and associated values. This property is nonempty when the 'OptimizeHyperparameters' name-value pair argument is nonempty when you create the model using fitcknn. 
The value depends on the setting of the 'HyperparameterOptimizationOptions' name-value pair argument when you create the model:

• 'bayesopt' (default) — Object of class BayesianOptimization
• 'gridsearch' or 'randomsearch' — Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst)

Object Functions
compareHoldout — Compare accuracies of two classification models using new data
crossval — Cross-validated k-nearest neighbor classifier
edge — Edge of k-nearest neighbor classifier
loss — Loss of k-nearest neighbor classifier
margin — Margin of k-nearest neighbor classifier
predict — Predict labels using k-nearest neighbor classification model
resubEdge — Edge of k-nearest neighbor classifier by resubstitution
resubLoss — Loss of k-nearest neighbor classifier by resubstitution
resubMargin — Margin of k-nearest neighbor classifier by resubstitution
resubPredict — Predict resubstitution labels of k-nearest neighbor classifier

Examples

Train k-Nearest Neighbor Classifier

Train a k-nearest neighbor classifier for Fisher's iris data, where k, the number of nearest neighbors in the predictors, is 5.

Load Fisher's iris data.

load fisheriris
X = meas;
Y = species;

X is a numeric matrix that contains four petal measurements for 150 irises. Y is a cell array of character vectors that contains the corresponding iris species.

Train a 5-nearest neighbor classifier. Standardize the noncategorical predictor data.

Mdl = fitcknn(X,Y,'NumNeighbors',5,'Standardize',1)

Mdl = 
  ClassificationKNN
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'setosa'  'versicolor'  'virginica'}
           ScoreTransform: 'none'
          NumObservations: 150
                 Distance: 'euclidean'
             NumNeighbors: 5

  Properties, Methods

Mdl is a trained ClassificationKNN classifier, and some of its properties appear in the Command Window. To access the properties of Mdl, use dot notation.


Mdl.ClassNames

ans = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }

Mdl.Prior

ans = 1×3

    0.3333    0.3333    0.3333

Mdl.Prior contains the class prior probabilities, which you can specify using the 'Prior' name-value pair argument in fitcknn. The order of the class prior probabilities corresponds to the order of the classes in Mdl.ClassNames. By default, the prior probabilities are the respective relative frequencies of the classes in the data.

You can also reset the prior probabilities after training. For example, set the prior probabilities to 0.5, 0.2, and 0.3, respectively.

Mdl.Prior = [0.5 0.2 0.3];

You can pass Mdl to predict to label new measurements or crossval to cross-validate the classifier.
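As a brief sketch (the new measurement values below are made up for illustration), labeling a new observation and cross-validating the trained classifier might look like this:

```matlab
xnew = [5.9 3.0 5.1 1.8];   % hypothetical sepal and petal measurements (cm)
label = predict(Mdl,xnew);   % predicted species for the new observation
CVMdl = crossval(Mdl);       % 10-fold cross-validated model
cvLoss = kfoldLoss(CVMdl);   % estimated generalization error
```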

Tips

• The compact function reduces the size of most classification models by removing the training data properties and any other properties that are not required to predict the labels of new observations. Because k-nearest neighbor classification models require all of the training data to predict labels, you cannot reduce the size of a ClassificationKNN model.

Alternative Functionality

knnsearch finds the k-nearest neighbors of points. rangesearch finds all the points within a fixed distance. You can use these functions for classification, as shown in “Classify Query Data” on page 18-17. If you want to perform classification, then using ClassificationKNN models can be more convenient because you can train a classifier in one step (using fitcknn) and classify in other steps (using predict). Alternatively, you can train a k-nearest neighbor classification model using one of the cross-validation options in the call to fitcknn. In this case, fitcknn returns a ClassificationPartitionedModel cross-validated model object.
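For comparison, a minimal sketch of the two-step knnsearch workflow (reusing X and Y from the earlier iris example; the query point xq is made up) might look like this:

```matlab
xq = [5.9 3.0 5.1 1.8];          % hypothetical query observation
idx = knnsearch(X,xq,'K',5);     % row indices of the 5 nearest training points
nearestLabels = Y(idx);          % class labels of those neighbors
% A majority vote over nearestLabels then gives the predicted class.
```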

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

• The predict function supports code generation.


• When you train a k-nearest neighbor classification model by using fitcknn, the following restrictions apply.
  • The class labels input argument value (Y) cannot be a categorical array.
  • Code generation does not support categorical predictors (logical, categorical, char, string, or cell). If you supply training data in a table, the predictors must be numeric (double or single). Also, you cannot use the 'CategoricalPredictors' name-value pair argument. To include categorical predictors in a model, preprocess the categorical predictors by using dummyvar before fitting the model.
  • The value of the 'ClassNames' name-value pair argument cannot be a categorical array.
  • The value of the 'Distance' name-value pair argument cannot be a custom distance function.
  • The value of the 'DistanceWeight' name-value pair argument can be a custom distance weight function, but it cannot be an anonymous function.
  • The value of the 'ScoreTransform' name-value pair argument cannot be an anonymous function.

For more information, see “Introduction to Code Generation” on page 31-2.

See Also
fitcknn | predict

Topics
“Construct KNN Classifier” on page 18-27
“Examine Quality of KNN Classifier” on page 18-27
“Predict Classification Using KNN Classifier” on page 18-28
“Modify KNN Classifier” on page 18-28
“Classification Using Nearest Neighbors” on page 18-11

Introduced in R2012a


ClassificationLinear class
Linear model for binary classification of high-dimensional data

Description

ClassificationLinear is a trained linear model object for binary classification; the linear model is a support vector machine (SVM) or logistic regression model. fitclinear fits a ClassificationLinear model by minimizing the objective function using techniques that reduce computation time for high-dimensional data sets (e.g., stochastic gradient descent). The objective function is composed of the classification loss plus the regularization term. Unlike other classification models, and for economical memory usage, ClassificationLinear model objects do not store the training data. However, they do store, for example, the estimated linear model coefficients, prior class probabilities, and the regularization strength. You can use trained ClassificationLinear models to predict labels or classification scores for new data. For details, see predict.

Construction

Create a ClassificationLinear object by using fitclinear.

Properties

Linear Classification Properties

Lambda — Regularization term strength
nonnegative scalar | vector of nonnegative values

Regularization term strength, specified as a nonnegative scalar or vector of nonnegative values.

Data Types: double | single

Learner — Linear classification model type
'logistic' | 'svm'

Linear classification model type, specified as 'logistic' or 'svm'. In this table, f(x) = xβ + b.

• β is a vector of p coefficients.
• x is an observation from p predictor variables.
• b is the scalar bias.

Value        Algorithm                Loss Function                                            FittedLoss Value
'logistic'   Logistic regression      Deviance (logistic): ℓ(y,f(x)) = log{1 + exp[−y f(x)]}   'logit'
'svm'        Support vector machine   Hinge: ℓ(y,f(x)) = max[0, 1 − y f(x)]                    'hinge'

Beta — Linear coefficient estimates
numeric vector

Linear coefficient estimates, specified as a numeric vector with length equal to the number of predictors.

Data Types: double

Bias — Estimated bias term
numeric scalar

Estimated bias term or model intercept, specified as a numeric scalar.

Data Types: double

FittedLoss — Loss function used to fit linear model
'hinge' | 'logit'

Loss function used to fit the linear model, specified as 'hinge' or 'logit'.

Value     Algorithm                Loss Function                                            Learner Value
'hinge'   Support vector machine   Hinge: ℓ(y,f(x)) = max[0, 1 − y f(x)]                    'svm'
'logit'   Logistic regression      Deviance (logistic): ℓ(y,f(x)) = log{1 + exp[−y f(x)]}   'logistic'

Regularization — Complexity penalty type
'lasso (L1)' | 'ridge (L2)'

Complexity penalty type, specified as 'lasso (L1)' or 'ridge (L2)'. The software composes the objective function for minimization from the sum of the average loss function (see FittedLoss) and a regularization value from this table.

Value           Description
'lasso (L1)'    Lasso (L1) penalty: λ Σ_{j=1}^{p} |β_j|
'ridge (L2)'    Ridge (L2) penalty: (λ/2) Σ_{j=1}^{p} β_j²

λ specifies the regularization term strength (see Lambda). The software excludes the bias term (β0) from the regularization penalty.

Other Classification Properties
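To make the composition concrete, this sketch evaluates the ridge-regularized SVM objective by hand for a trained model Mdl, with predictor matrix X and labels y coded as -1 and 1 (the variable names are illustrative):

```matlab
f = X*Mdl.Beta + Mdl.Bias;                  % raw classification scores
avgLoss = mean(max(0,1 - y.*f));            % average hinge loss (Learner 'svm')
penalty = (Mdl.Lambda/2)*sum(Mdl.Beta.^2);  % ridge (L2) penalty; bias excluded
objective = avgLoss + penalty;              % value the solver minimizes
```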

CategoricalPredictors — Indices of categorical predictors
[]


Indices of categorical predictors, whose value is always empty ([]) because a ClassificationLinear model does not support categorical predictors.

ClassNames — Unique class labels
categorical array | character array | logical vector | numeric vector | cell array of character vectors

Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as the class labels Y. (The software treats string arrays as cell arrays of character vectors.) ClassNames also determines the class order.

Data Types: categorical | char | logical | single | double | cell

Cost — Misclassification costs
square numeric matrix

This property is read-only.

Misclassification costs, specified as a square numeric matrix. Cost has K rows and columns, where K is the number of classes. Cost(i,j) is the cost of classifying a point into class j if its true class is i. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames.

Data Types: double

ModelParameters — Parameters used for training model
structure

Parameters used for training the ClassificationLinear model, specified as a structure. Access fields of ModelParameters using dot notation. For example, access the relative tolerance on the linear coefficients and the bias term by using Mdl.ModelParameters.BetaTolerance.

Data Types: struct

PredictorNames — Predictor names
cell array of character vectors

Predictor names in order of their appearance in the predictor data X, specified as a cell array of character vectors. The length of PredictorNames is equal to the number of columns in X.

Data Types: cell

ExpandedPredictorNames — Expanded predictor names
cell array of character vectors

Expanded predictor names, specified as a cell array of character vectors. Because a ClassificationLinear model does not support categorical predictors, ExpandedPredictorNames and PredictorNames are equal.

Data Types: cell

Prior — Prior class probabilities
numeric vector

This property is read-only.


Prior class probabilities, specified as a numeric vector. Prior has as many elements as classes in ClassNames, and the order of the elements corresponds to the elements of ClassNames.

Data Types: double

ScoreTransform — Score transformation function
'doublelogit' | 'invlogit' | 'ismax' | 'logit' | 'none' | function handle | ...

Score transformation function to apply to predicted scores, specified as a function name or function handle. For linear classification models and before transformation, the predicted classification score for the observation x (row vector) is f(x) = xβ + b, where β and b correspond to Mdl.Beta and Mdl.Bias, respectively. To change the score transformation function to, for example, function, use dot notation.

• For a built-in function, enter this code and replace function with a value in the table.

Mdl.ScoreTransform = 'function';

Value                    Description
'doublelogit'            1/(1 + e^(–2x))
'invlogit'               log(x / (1 – x))
'ismax'                  Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
'logit'                  1/(1 + e^(–x))
'none' or 'identity'     x (no transformation)
'sign'                   –1 for x < 0; 0 for x = 0; 1 for x > 0
'symmetric'              2x – 1
'symmetricismax'         Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
'symmetriclogit'         2/(1 + e^(–x)) – 1

• For a MATLAB function, or a function that you define, enter its function handle.

Mdl.ScoreTransform = @function;

function must accept a matrix of the original scores for each class, and then return a matrix of the same size representing the transformed scores for each class.

Data Types: char | function_handle

ResponseName — Response variable name
character vector

Response variable name, specified as a character vector.

Data Types: char


Methods
edge — Classification edge for linear classification models
loss — Classification loss for linear classification models
margin — Classification margins for linear classification models
predict — Predict labels for linear classification models
selectModels — Choose subset of regularized, binary linear classification models

Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects (MATLAB).

Examples

Train Linear Classification Model

Train a binary, linear classification model using support vector machines, dual SGD, and ridge regularization.

Load the NLP data set.

load nlpdata

X is a sparse matrix of predictor data, and Y is a categorical vector of class labels. There are more than two classes in the data.

Identify the labels that correspond to the Statistics and Machine Learning Toolbox™ documentation web pages.

Ystats = Y == 'stats';

Train a binary, linear classification model that can identify whether the word counts in a documentation web page are from the Statistics and Machine Learning Toolbox™ documentation. Train the model using the entire data set. Determine how well the optimization algorithm fit the model to the data by extracting a fit summary.

rng(1); % For reproducibility
[Mdl,FitInfo] = fitclinear(X,Ystats)

Mdl = 
  ClassificationLinear
      ResponseName: 'Y'
        ClassNames: [0 1]
    ScoreTransform: 'none'
              Beta: [34023x1 double]
              Bias: -1.0059
            Lambda: 3.1674e-05
           Learner: 'svm'

  Properties, Methods


FitInfo = struct with fields:
                    Lambda: 3.1674e-05
                 Objective: 5.3783e-04
                 PassLimit: 10
                 NumPasses: 10
                BatchLimit: []
             NumIterations: 238561
              GradientNorm: NaN
         GradientTolerance: 0
      RelativeChangeInBeta: 0.0562
             BetaTolerance: 1.0000e-04
             DeltaGradient: 1.4582
    DeltaGradientTolerance: 1
           TerminationCode: 0
         TerminationStatus: {'Iteration limit exceeded.'}
                     Alpha: [31572x1 double]
                   History: []
                   FitTime: 0.2391
                    Solver: {'dual'}

Mdl is a ClassificationLinear model. You can pass Mdl and the training or new data to loss to inspect the in-sample classification error. Or, you can pass Mdl and new predictor data to predict to predict class labels for new observations. FitInfo is a structure array containing, among other things, the termination status (TerminationStatus) and how long the solver took to fit the model to the data (FitTime). It is good practice to use FitInfo to determine whether optimization-termination measurements are satisfactory. Because training time is small, you can try to retrain the model, but increase the number of passes through the data. This can improve measures like DeltaGradient.
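For example, a retraining call that raises the pass limit might look like the following sketch (reusing X and Ystats from above; the limit of 50 is an arbitrary illustrative choice):

```matlab
rng(1); % For reproducibility
[Mdl2,FitInfo2] = fitclinear(X,Ystats,'PassLimit',50);
FitInfo2.DeltaGradient   % compare with the DeltaGradient of the first fit
```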

Predict Class Labels Using Linear Classification Model

Load the NLP data set.

load nlpdata
n = size(X,1); % Number of observations

Identify the labels that correspond to the Statistics and Machine Learning Toolbox™ documentation web pages.

Ystats = Y == 'stats';

Hold out 5% of the data.

rng(1); % For reproducibility
cvp = cvpartition(n,'Holdout',0.05)

cvp = 
Hold-out cross validation partition
   NumObservations: 31572
       NumTestSets: 1
         TrainSize: 29994
          TestSize: 1578


cvp is a CVPartition object that defines the random partition of n data into training and test sets.

Train a binary, linear classification model using the training set that can identify whether the word counts in a documentation web page are from the Statistics and Machine Learning Toolbox™ documentation. For faster training time, orient the predictor data matrix so that the observations are in columns.

idxTrain = training(cvp); % Extract training set indices
X = X';
Mdl = fitclinear(X(:,idxTrain),Ystats(idxTrain),'ObservationsIn','columns');

Predict labels for the holdout sample, and estimate the classification error.

idxTest = test(cvp); % Extract test set indices
labels = predict(Mdl,X(:,idxTest),'ObservationsIn','columns');
L = loss(Mdl,X(:,idxTest),Ystats(idxTest),'ObservationsIn','columns')

L = 7.1753e-04

Mdl misclassifies fewer than 1% of the out-of-sample observations.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

• The predict and update functions support code generation.
• When you train a linear classification model by using fitclinear, the following restrictions apply.
  • The predictor data input argument value (X) must be a full, numeric matrix.
  • The class labels input argument value (Y) cannot be a categorical array.
  • The value of the 'ClassNames' name-value pair argument or property cannot be a categorical array.
  • You can specify only one regularization strength, either 'auto' or a nonnegative scalar for the 'Lambda' name-value pair argument.
  • The value of the 'ScoreTransform' name-value pair argument cannot be an anonymous function.

For more information, see “Introduction to Code Generation” on page 31-2.

See Also
ClassificationECOC | ClassificationKernel | ClassificationPartitionedLinear | ClassificationPartitionedLinearECOC | fitclinear | predict

Introduced in R2016a


ClassificationLinearCoderConfigurer
Coder configurer for linear binary classification of high-dimensional data

Description

A ClassificationLinearCoderConfigurer object is a coder configurer of a linear classification model (ClassificationLinear) used for binary classification of high-dimensional data. A coder configurer offers convenient features to configure code generation options, generate C/C++ code, and update model parameters in the generated code.

• Configure code generation options and specify the coder attributes of linear model parameters by using object properties.
• Generate C/C++ code for the predict and update functions of the linear classification model by using generateCode. Generating C/C++ code requires MATLAB Coder.
• Update model parameters in the generated C/C++ code without having to regenerate the code. This feature reduces the effort required to regenerate, redeploy, and reverify C/C++ code when you retrain the linear model with new data or settings. Before updating model parameters, use validatedUpdateInputs to validate and extract the model parameters to update.

This flow chart shows the code generation workflow using a coder configurer.

For the code generation usage notes and limitations of a linear classification model, see the Code Generation sections of ClassificationLinear, predict, and update.

Creation

After training a linear classification model by using fitclinear, create a coder configurer for the model by using learnerCoderConfigurer. Use the properties of a coder configurer to specify the coder attributes of the predict and update arguments. Then, use generateCode to generate C/C++ code based on the specified coder attributes.

Properties

predict Arguments

The properties listed in this section specify the coder attributes of the predict function arguments in the generated code.


X — Coder attributes of predictor data
LearnerCoderInput object

Coder attributes of the predictor data to pass to the generated C/C++ code for the predict function of the linear classification model, specified as a LearnerCoderInput on page 32-402 object.

When you create a coder configurer by using the learnerCoderConfigurer function, the input argument X determines the default values of the LearnerCoderInput coder attributes:

• SizeVector — The default value is the array size of the input X.
  • If the Value attribute of the ObservationsIn property for the ClassificationLinearCoderConfigurer is 'rows', then this SizeVector value is [n p], where n corresponds to the number of observations and p corresponds to the number of predictors.
  • If the Value attribute of the ObservationsIn property for the ClassificationLinearCoderConfigurer is 'columns', then this SizeVector value is [p n].
  To switch the elements of SizeVector (for example, to change [n p] to [p n]), modify the Value attribute of the ObservationsIn property for the ClassificationLinearCoderConfigurer accordingly. You cannot modify the SizeVector value directly.
• VariableDimensions — The default value is [0 0], which indicates that the array size is fixed as specified in SizeVector. You can set this value to [1 0] if the SizeVector value is [n p] or to [0 1] if it is [p n], which indicates that the array has variable-size rows and fixed-size columns. For example, [1 0] specifies that the first value of SizeVector (n) is the upper bound for the number of rows, and the second value of SizeVector (p) is the number of columns.
• DataType — This value is single or double. The default data type depends on the data type of the input X.
• Tunability — This value must be true, meaning that predict in the generated C/C++ code always includes predictor data as an input.

You can modify the coder attributes by using dot notation. For example, to generate C/C++ code that accepts predictor data with 100 observations (in rows) of three predictor variables (in columns), specify these coder attributes of X for the coder configurer configurer:

configurer.X.SizeVector = [100 3];
configurer.X.DataType = 'double';
configurer.X.VariableDimensions = [0 0];

[0 0] indicates that the first and second dimensions of X (number of observations and number of predictor variables, respectively) have fixed sizes.

To allow the generated C/C++ code to accept predictor data with up to 100 observations, specify these coder attributes of X:

configurer.X.SizeVector = [100 3];
configurer.X.DataType = 'double';
configurer.X.VariableDimensions = [1 0];

[1 0] indicates that the first dimension of X (number of observations) has a variable size and the second dimension of X (number of predictor variables) has a fixed size. The specified number of observations, 100 in this example, becomes the maximum allowed number of observations in the generated C/C++ code. To allow any number of observations, specify the bound as Inf.

ObservationsIn — Coder attributes of predictor data observation dimension
EnumeratedInput object

Coder attributes of the predictor data observation dimension ('ObservationsIn' name-value pair argument of predict), specified as an EnumeratedInput on page 32-403 object.

When you create a coder configurer by using the learnerCoderConfigurer function, the 'ObservationsIn' name-value pair argument determines the default values of the EnumeratedInput coder attributes:

• Value — The default value is the predictor data observation dimension you use when creating the coder configurer, specified as 'rows' or 'columns'. If you do not specify 'ObservationsIn' when creating the coder configurer, the default value is 'rows'.
• SelectedOption — This value is always 'Built-in'. This attribute is read-only.
• BuiltInOptions — Cell array of 'rows' and 'columns'. This attribute is read-only.
• IsConstant — This value must be true.
• Tunability — The default value is false if you specify 'ObservationsIn','rows' when creating the coder configurer, and true if you specify 'ObservationsIn','columns'. If you set Tunability to false, the software sets Value to 'rows'. If you specify other attribute values when Tunability is false, the software sets Tunability to true.

NumOutputs — Number of outputs in predict
1 (default) | 2

Number of output arguments to return from the generated C/C++ code for the predict function of the linear classification model, specified as 1 or 2. The output arguments of predict are Label (predicted class labels) and Score (classification scores), in that order. predict in the generated C/C++ code returns the first n outputs of the predict function, where n is the NumOutputs value.

After creating the coder configurer configurer, you can specify the number of outputs by using dot notation.
configurer.NumOutputs = 2;

The NumOutputs property is equivalent to the '-nargout' compiler option of codegen. This option specifies the number of output arguments in the entry-point function of code generation. The object function generateCode generates two entry-point functions—predict.m and update.m for the predict and update functions of a linear classification model, respectively—and generates C/C++ code for the two entry-point functions. The specified value for the NumOutputs property corresponds to the number of output arguments in the entry-point function predict.m.

Data Types: double

update Arguments

The properties listed in this section specify the coder attributes of the update function arguments in the generated code. The update function takes a trained model and new model parameters as input arguments, and returns an updated version of the model that contains the new parameters. To enable updating the parameters in the generated code, you need to specify the coder attributes of the parameters before generating code. Use a LearnerCoderInput on page 32-402 object to specify the coder attributes of each parameter. The default attribute values are based on the model parameters in the input argument Mdl of learnerCoderConfigurer.

Beta — Coder attributes of linear predictor coefficients
LearnerCoderInput object

Coder attributes of the linear predictor coefficients (Beta of a linear classification model), specified as a LearnerCoderInput on page 32-402 object.

The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer:

• SizeVector — This value must be [p 1], where p is the number of predictors in Mdl.
• VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector.
• DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl.
• Tunability — This value must be true.

Bias — Coder attributes of bias term
LearnerCoderInput object

Coder attributes of the bias term (Bias of a linear classification model), specified as a LearnerCoderInput on page 32-402 object.

The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer:

• SizeVector — This value must be [1 1].
• VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector.
• DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl.
• Tunability — This value must be true.

Cost — Coder attributes of misclassification cost
LearnerCoderInput object

Coder attributes of the misclassification cost (Cost of a linear classification model), specified as a LearnerCoderInput on page 32-402 object.
The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer:

• SizeVector — This value must be [2 2].
• VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector.
• DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl.
• Tunability — The default value is true.
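As with the other update arguments, you set these attributes through dot notation. For example, this sketch (for a previously created coder configurer named configurer) pins the cost data type and excludes Cost from the tunable update arguments; the specific choices are illustrative:

```matlab
configurer.Cost.DataType = 'double';  % generated code expects double-precision costs
configurer.Cost.Tunability = false;   % Cost is removed from the update arguments
```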


Prior — Coder attributes of prior probabilities
LearnerCoderInput object

Coder attributes of the prior probabilities (Prior of a linear classification model), specified as a LearnerCoderInput on page 32-402 object.

The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer:

• SizeVector — This value must be [1 2].
• VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector.
• DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl.
• Tunability — The default value is true.

Other Configurer Options

OutputFileName — File name of generated C/C++ code
'ClassificationLinearModel' (default) | character vector

File name of the generated C/C++ code, specified as a character vector.

The object function generateCode of ClassificationLinearCoderConfigurer generates C/C++ code using this file name. The file name must not contain spaces because they can lead to code generation failures in certain operating system configurations. Also, the name must be a valid MATLAB function name.

After creating the coder configurer configurer, you can specify the file name by using dot notation.

configurer.OutputFileName = 'myModel';

Data Types: char

Verbose — Verbosity level
true (logical 1) (default) | false (logical 0)

Verbosity level, specified as true (logical 1) or false (logical 0). The verbosity level controls the display of notification messages at the command line.

Value               Description
true (logical 1)    The software displays notification messages when your changes to the coder attributes of a parameter result in changes for other dependent parameters.
false (logical 0)   The software does not display notification messages.

To enable updating machine learning model parameters in the generated code, you need to configure the coder attributes of the parameters before generating code. The coder attributes of parameters are dependent on each other, so the software stores the dependencies as configuration constraints. If you modify the coder attributes of a parameter by using a coder configurer, and the modification requires subsequent changes to other dependent parameters to satisfy configuration constraints, then the software changes the coder attributes of the dependent parameters. The verbosity level determines whether or not the software displays notification messages for these subsequent changes.

After creating the coder configurer configurer, you can modify the verbosity level by using dot notation.

configurer.Verbose = false;

Data Types: logical

Options for Code Generation Customization

To customize the code generation workflow, use the generateFiles function and the following three properties with codegen, instead of using the generateCode function.

After generating the two entry-point function files (predict.m and update.m) by using the generateFiles function, you can modify these files according to your code generation workflow. For example, you can modify the predict.m file to include data preprocessing, or you can add these entry-point functions to another code generation project. Then, you can generate C/C++ code by using the codegen function and the codegen arguments appropriate for the modified entry-point functions or code generation project. Use the three properties described in this section as a starting point to set the codegen arguments.

CodeGenerationArguments — codegen arguments
cell array

This property is read-only.

codegen arguments, specified as a cell array.

This property enables you to customize the code generation workflow. Use the generateCode function if you do not need to customize your workflow. Instead of using generateCode with the coder configurer configurer, you can generate C/C++ code as follows:

generateFiles(configurer)
cgArgs = configurer.CodeGenerationArguments;
codegen(cgArgs{:})

If you customize the code generation workflow, modify cgArgs accordingly before calling codegen. If you modify other properties of configurer, the software updates the CodeGenerationArguments property accordingly.

Data Types: cell

PredictInputs — List of tunable input arguments of predict
cell array

This property is read-only.

List of tunable input arguments of the entry-point function predict.m for code generation, specified as a cell array. The cell array contains another cell array that includes coder.PrimitiveType objects and coder.Constant objects.

ClassificationLinearCoderConfigurer

If you modify the coder attributes of predict arguments on page 32-391, then the software updates the corresponding objects accordingly. If you specify the Tunability attribute as false, then the software removes the corresponding objects from the PredictInputs list. The cell array in PredictInputs is equivalent to configurer.CodeGenerationArguments{6} for the coder configurer configurer.

Data Types: cell

UpdateInputs — List of tunable input arguments of update
cell array of a structure including coder.PrimitiveType objects

This property is read-only.

List of the tunable input arguments of the entry-point function update.m for code generation, specified as a cell array of a structure including coder.PrimitiveType objects. Each coder.PrimitiveType object includes the coder attributes of a tunable machine learning model parameter. If you modify the coder attributes of a model parameter by using the coder configurer properties (update arguments on page 32-393), then the software updates the corresponding coder.PrimitiveType object accordingly. If you specify the Tunability attribute of a machine learning model parameter as false, then the software removes the corresponding coder.PrimitiveType object from the UpdateInputs list. The structure in UpdateInputs is equivalent to configurer.CodeGenerationArguments{3} for the coder configurer configurer.

Data Types: cell
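The documented equivalences between these two properties and CodeGenerationArguments can be inspected directly. A minimal sketch, assuming a coder configurer named configurer already created by learnerCoderConfigurer (as in the examples below):

```matlab
% Sketch: relate the tunable inputs to the codegen arguments.
% configurer is an assumed, existing ClassificationLinearCoderConfigurer.
cgArgs = configurer.CodeGenerationArguments;

% Per the property descriptions above, the cell array in PredictInputs
% corresponds to cgArgs{6}, and the structure in UpdateInputs to cgArgs{3}.
predictArgs = configurer.PredictInputs;
updateArgs  = configurer.UpdateInputs;

% After modifying cgArgs (for example, adding codegen options),
% you can call codegen(cgArgs{:}) with the customized configuration.
```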

Object Functions

generateCode           Generate C/C++ code using coder configurer
generateFiles          Generate MATLAB files for code generation using coder configurer
validatedUpdateInputs  Validate and extract machine learning model parameters to update

Examples

Generate Code Using Coder Configurer

Train a machine learning model, and then generate code for the predict and update functions of the model by using a coder configurer.

Load the ionosphere data set, and train a binary linear classification model. Pass the transposed predictor matrix Xnew to fitclinear, and use the 'ObservationsIn' name-value pair argument to specify that the columns of Xnew correspond to observations.

load ionosphere
Xnew = X';
Mdl = fitclinear(Xnew,Y,'ObservationsIn','columns');

Mdl is a ClassificationLinear object.

Create a coder configurer for the ClassificationLinear model by using learnerCoderConfigurer. Specify the predictor data Xnew, and use the 'ObservationsIn' name-value pair argument to specify the observation dimension of Xnew. The learnerCoderConfigurer function uses these input arguments to configure the coder attributes of the corresponding input arguments of predict.

configurer = learnerCoderConfigurer(Mdl,Xnew,'ObservationsIn','columns')

configurer = 
  ClassificationLinearCoderConfigurer with properties:

   Update Inputs:
              Beta: [1x1 LearnerCoderInput]
              Bias: [1x1 LearnerCoderInput]
             Prior: [1x1 LearnerCoderInput]
              Cost: [1x1 LearnerCoderInput]

   Predict Inputs:
                 X: [1x1 LearnerCoderInput]
    ObservationsIn: [1x1 EnumeratedInput]

   Code Generation Parameters:
        NumOutputs: 1
    OutputFileName: 'ClassificationLinearModel'

  Properties, Methods

configurer is a ClassificationLinearCoderConfigurer object, which is a coder configurer of a ClassificationLinear object.

To generate C/C++ code, you must have access to a C/C++ compiler that is configured properly. MATLAB Coder locates and uses a supported, installed compiler. You can use mex -setup to view and change the default compiler. For more details, see “Change Default Compiler” (MATLAB).

Generate code for the predict and update functions of the linear classification model (Mdl).

generateCode(configurer)

generateCode creates these files in output folder:
'initialize.m', 'predict.m', 'update.m', 'ClassificationLinearModel.mat'

The generateCode function completes these actions:

• Generate the MATLAB files required to generate code, including the two entry-point functions predict.m and update.m for the predict and update functions of Mdl, respectively.
• Create a MEX function named ClassificationLinearModel for the two entry-point functions.
• Create the code for the MEX function in the codegen\mex\ClassificationLinearModel folder.
• Copy the MEX function to the current folder.

Display the contents of the predict.m, update.m, and initialize.m files by using the type function.

type predict.m

function varargout = predict(X,varargin) %#codegen
% Autogenerated by MATLAB, 29-Feb-2020 10:23:22
[varargout{1:nargout}] = initialize('predict',X,varargin{:});
end

type update.m

function update(varargin) %#codegen
% Autogenerated by MATLAB, 29-Feb-2020 10:23:22
initialize('update',varargin{:});
end

type initialize.m

function [varargout] = initialize(command,varargin) %#codegen
% Autogenerated by MATLAB, 29-Feb-2020 10:23:22
coder.inline('always')
persistent model
if isempty(model)
    model = loadLearnerForCoder('ClassificationLinearModel.mat');
end
switch(command)
    case 'update'
        % Update struct fields: Beta
        %                       Bias
        %                       Prior
        %                       Cost
        model = update(model,varargin{:});
    case 'predict'
        % Predict Inputs: X, ObservationsIn
        X = varargin{1};
        if nargin == 2
            [varargout{1:nargout}] = predict(model,X);
        else
            PVPairs = cell(1,nargin-2);
            for i = 1:nargin-2
                PVPairs{1,i} = varargin{i+1};
            end
            [varargout{1:nargout}] = predict(model,X,PVPairs{:});
        end
end
end

Update Parameters of Linear Classification Model in Generated Code

Train a linear classification model using a partial data set and create a coder configurer for the model. Use the properties of the coder configurer to specify coder attributes of the linear model parameters. Use the object function of the coder configurer to generate C code that predicts labels for new predictor data. Then retrain the model using the entire data set, and update parameters in the generated code without regenerating the code.

Train Model

Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g').

Train a binary linear classification model using half of the observations. Transpose the predictor data, and use the 'ObservationsIn' name-value pair argument to specify that the columns of XTrain correspond to observations.


load ionosphere
rng('default') % For reproducibility
n = length(Y);
c = cvpartition(Y,'HoldOut',0.5);
idxTrain = training(c,1);
XTrain = X(idxTrain,:)';
YTrain = Y(idxTrain);
Mdl = fitclinear(XTrain,YTrain,'ObservationsIn','columns');

Mdl is a ClassificationLinear object.

Create Coder Configurer

Create a coder configurer for the ClassificationLinear model by using learnerCoderConfigurer. Specify the predictor data XTrain, and use the 'ObservationsIn' name-value pair argument to specify the observation dimension of XTrain. The learnerCoderConfigurer function uses these input arguments to configure the coder attributes of the corresponding input arguments of predict. Also, set the number of outputs to 2 so that the generated code returns predicted labels and scores.

configurer = learnerCoderConfigurer(Mdl,XTrain,'ObservationsIn','columns','NumOutputs',2);

configurer is a ClassificationLinearCoderConfigurer object, which is a coder configurer of a ClassificationLinear object.

Specify Coder Attributes of Parameters

Specify the coder attributes of the linear classification model parameters so that you can update the parameters in the generated code after retraining the model. This example specifies the coder attributes of the predictor data that you want to pass to the generated code.

Specify the coder attributes of the X property of configurer so that the generated code accepts any number of observations. Modify the SizeVector and VariableDimensions attributes. The SizeVector attribute specifies the upper bound of the predictor data size, and the VariableDimensions attribute specifies whether each dimension of the predictor data has a variable size or fixed size.

configurer.X.SizeVector = [34 Inf];
configurer.X.VariableDimensions

ans = 1x2 logical array
   0   1

The size of the first dimension is the number of predictor variables. This value must be fixed for a machine learning model. Because the predictor data contains 34 predictors, the value of the SizeVector attribute must be 34 and the value of the VariableDimensions attribute must be 0.

The size of the second dimension is the number of observations. Setting the value of the SizeVector attribute to Inf causes the software to change the value of the VariableDimensions attribute to 1. In other words, the upper bound of the size is Inf and the size is variable, meaning that the predictor data can have any number of observations. This specification is convenient if you do not know the number of observations when generating code.
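If you do know a maximum data size in advance, you can instead bound the second dimension. A minimal sketch under the attribute semantics described above (the bound of 100 is illustrative, not from this example):

```matlab
% Illustrative sketch: accept a variable number of observations,
% but at most 100 of them. The second dimension stays variable-size
% (VariableDimensions element 1), with 100 as its upper bound.
configurer.X.SizeVector = [34 100];
configurer.X.VariableDimensions = [0 1];
```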


The order of the dimensions in SizeVector and VariableDimensions depends on the coder attributes of ObservationsIn.

configurer.ObservationsIn

ans = 
  EnumeratedInput with properties:

             Value: 'columns'
    SelectedOption: 'Built-in'
    BuiltInOptions: {'rows'  'columns'}
        IsConstant: 1
        Tunability: 1

When the Value attribute of the ObservationsIn property is 'columns', the first dimension of the SizeVector and VariableDimensions attributes of X corresponds to the number of predictors, and the second dimension corresponds to the number of observations. When the Value attribute of ObservationsIn is 'rows', the order of the dimensions is switched.

Generate Code

To generate C/C++ code, you must have access to a C/C++ compiler that is configured properly. MATLAB Coder locates and uses a supported, installed compiler. You can use mex -setup to view and change the default compiler. For more details, see “Change Default Compiler” (MATLAB).

Generate code for the predict and update functions of the linear classification model (Mdl).

generateCode(configurer)

generateCode creates these files in output folder:
'initialize.m', 'predict.m', 'update.m', 'ClassificationLinearModel.mat'

The generateCode function completes these actions:

• Generate the MATLAB files required to generate code, including the two entry-point functions predict.m and update.m for the predict and update functions of Mdl, respectively.
• Create a MEX function named ClassificationLinearModel for the two entry-point functions.
• Create the code for the MEX function in the codegen\mex\ClassificationLinearModel folder.
• Copy the MEX function to the current folder.

Verify Generated Code

Pass some predictor data to verify whether the predict function of Mdl and the predict function in the MEX function return the same labels. To call an entry-point function in a MEX function that has more than one entry point, specify the function name as the first input argument.

[label,score] = predict(Mdl,XTrain,'ObservationsIn','columns');
[label_mex,score_mex] = ClassificationLinearModel('predict',XTrain,'ObservationsIn','columns');

Compare label and label_mex by using isequal.

isequal(label,label_mex)

ans = logical
   1

isequal returns logical 1 (true) if all the inputs are equal. The comparison confirms that the predict function of Mdl and the predict function in the MEX function return the same labels.

Compare score and score_mex.

max(abs(score-score_mex),[],'all')

ans = 0

In general, score_mex might include round-off differences compared to score. In this case, the comparison confirms that score and score_mex are equal.

Retrain Model and Update Parameters in Generated Code

Retrain the model using the entire data set.

retrainedMdl = fitclinear(X',Y,'ObservationsIn','columns');

Extract parameters to update by using validatedUpdateInputs. This function detects the modified model parameters in retrainedMdl and validates whether the modified parameter values satisfy the coder attributes of the parameters.

params = validatedUpdateInputs(configurer,retrainedMdl);

Update parameters in the generated code.

ClassificationLinearModel('update',params)

Verify Generated Code

Compare the outputs from the predict function of retrainedMdl and the predict function in the updated MEX function.

[label,score] = predict(retrainedMdl,X','ObservationsIn','columns');
[label_mex,score_mex] = ClassificationLinearModel('predict',X','ObservationsIn','columns');
isequal(label,label_mex)

ans = logical
   1

max(abs(score-score_mex),[],'all')

ans = 0

The comparison confirms that label and label_mex are equal, and that the score values are equal.

More About

LearnerCoderInput Object

A coder configurer uses a LearnerCoderInput object to specify the coder attributes of predict and update input arguments.


A LearnerCoderInput object has the following attributes to specify the properties of an input argument array in the generated code.

• SizeVector — Array size if the corresponding VariableDimensions value is false. Upper bound of the array size if the corresponding VariableDimensions value is true. To allow an unbounded array, specify the bound as Inf.
• VariableDimensions — Indicator specifying whether each dimension of the array has a variable size or fixed size, specified as true (logical 1) or false (logical 0). A value of true means that the corresponding dimension has a variable size; a value of false means that the corresponding dimension has a fixed size.
• DataType — Data type of the array.
• Tunability — Indicator specifying whether or not predict or update includes the argument as an input in the generated code, specified as true (logical 1) or false (logical 0). If you specify other attribute values when Tunability is false, the software sets Tunability to true.

After creating a coder configurer, you can modify the coder attributes by using dot notation. For example, specify the data type of the bias term Bias of the coder configurer configurer:

configurer.Bias.DataType = 'single';

If you specify the verbosity level (Verbose) as true (default), then the software displays notification messages when you modify the coder attributes of a machine learning model parameter and the modification changes the coder attributes of other dependent parameters.

EnumeratedInput Object

A coder configurer uses an EnumeratedInput object to specify the coder attributes of predict input arguments that have a finite set of available values. An EnumeratedInput object has the following attributes to specify the properties of an input argument array in the generated code.


• Value — Value of the predict argument in the generated code, specified as a character vector or a LearnerCoderInput object (on page 32-402):
  • Character vector in BuiltInOptions — You can specify one of the BuiltInOptions using either the option name or its index value. For example, to choose the first option, specify Value as either the first character vector in BuiltInOptions or 1.
  • Character vector designating a custom function name — To use a custom option, define a custom function on the MATLAB search path, and specify Value as the name of the custom function.
  • LearnerCoderInput object — If you set IsConstant to false (logical 0), then the software changes Value to a LearnerCoderInput object with the following read-only coder attribute values. These values indicate that the input in the generated code is a variable-size, tunable character vector that is one of the available values in BuiltInOptions.
    • SizeVector — [1 c], indicating the upper bound of the array size, where c is the length of the longest available character vector in BuiltInOptions
    • VariableDimensions — [0 1], indicating that the array is a variable-size vector
    • DataType — 'char'
    • Tunability — 1
  The default value of Value is consistent with the default value of the corresponding predict argument, which is one of the character vectors in BuiltInOptions.
• SelectedOption — Status of the selected option, specified as 'Built-in', 'Custom', or 'NonConstant'. The software sets SelectedOption according to Value:
  • 'Built-in' (default) — When Value is one of the character vectors in BuiltInOptions
  • 'Custom' — When Value is a character vector that is not in BuiltInOptions
  • 'NonConstant' — When Value is a LearnerCoderInput object
  This attribute is read-only.
• BuiltInOptions — List of available character vectors for the corresponding predict argument, specified as a cell array. This attribute is read-only.
• IsConstant — Indicator specifying whether or not the array value is a compile-time constant (coder.Constant) in the generated code, specified as true (logical 1, default) or false (logical 0). If you set this value to false, then the software changes Value to a LearnerCoderInput object.
• Tunability — Indicator specifying whether or not predict includes the argument as an input in the generated code, specified as true (logical 1) or false (logical 0, default). If you specify other attribute values when Tunability is false, the software sets Tunability to true.

After creating a coder configurer, you can modify the coder attributes by using dot notation. For example, specify the coder attributes of ObservationsIn of the coder configurer configurer:

configurer.ObservationsIn.Value = 'columns';
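Setting Value to a name outside BuiltInOptions selects a custom option. A hedged sketch (myObsIn is a hypothetical function name that you would define on the MATLAB search path; it is not part of the toolbox):

```matlab
% Hypothetical custom option: 'myObsIn' is an assumed user-defined
% function on the MATLAB search path, not a toolbox function.
configurer.ObservationsIn.Value = 'myObsIn';

% SelectedOption then reports 'Custom', because 'myObsIn' is not
% one of the built-in options {'rows','columns'}.
configurer.ObservationsIn.SelectedOption
```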

See Also

ClassificationECOCCoderConfigurer | ClassificationLinear | learnerCoderConfigurer | predict | update

Topics

“Introduction to Code Generation” on page 31-2
“Code Generation for Prediction and Update Using Coder Configurer” on page 31-68

Introduced in R2019b


ClassificationNaiveBayes class

Superclasses: CompactClassificationNaiveBayes

Naive Bayes classification

Description

ClassificationNaiveBayes is a naive Bayes classifier (see “Naive Bayes” on page 32-414) for multiclass learning. Use fitcnb and the training data to train a ClassificationNaiveBayes classifier. Trained ClassificationNaiveBayes classifiers store the training data, parameter values, data distribution, and prior probabilities. You can use these classifiers to:

• Estimate resubstitution predictions. For details, see resubPredict.
• Predict labels or posterior probabilities for new data. For details, see predict.

Construction

Create a ClassificationNaiveBayes object by using fitcnb.

Properties

CategoricalPredictors — Indices of categorical predictors
vector of positive integers

Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values corresponding to the columns of the predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty ([]).

Data Types: single | double

CategoricalLevels — Multivariate multinomial levels
cell vector of numeric vectors

Multivariate multinomial levels, specified as a cell vector of numeric vectors. CategoricalLevels has length equal to the number of predictors (size(X,2)). The cells of CategoricalLevels correspond to predictors that you specified as 'mvmn' (i.e., having a multivariate multinomial distribution) during training. Cells that do not correspond to a multivariate multinomial distribution are empty ([]). If predictor j is multivariate multinomial, then CategoricalLevels{j} is a list of all distinct values of predictor j in the sample (NaNs removed from unique(X(:,j))).

Data Types: cell

ClassNames — Distinct class names
categorical array | character array | logical vector | numeric vector | cell array of character vectors

Distinct class names, specified as a categorical or character array, logical or numeric vector, or cell vector of character vectors.


ClassNames is the same data type as Y, and has K elements or rows for character arrays. (The software treats string arrays as cell arrays of character vectors.)

Data Types: categorical | char | logical | single | double | cell

Cost — Misclassification cost
square matrix

Misclassification cost, specified as a K-by-K square matrix. The value of Cost(i,j) is the cost of classifying a point into class j if its true class is i. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The value of Cost does not influence training. You can reset Cost after training Mdl using dot notation, e.g., Mdl.Cost = [0 0.5; 1 0];.

Data Types: double | single

DistributionNames — Predictor distributions
'normal' (default) | 'kernel' | 'mn' | 'mvmn' | cell array of character vectors

Predictor distributions fitcnb uses to model the predictors, specified as a character vector or cell array of character vectors. The available distributions follow.

• 'kernel' — Kernel smoothing density estimate.
• 'mn' — Multinomial bag-of-tokens model (on page 32-414). Indicates that all predictors have this distribution.
• 'mvmn' — Multivariate multinomial distribution.
• 'normal' — Normal (Gaussian) distribution.

If DistributionNames is a 1-by-P cell array of character vectors, then the software models feature j using the distribution in element j of the cell array.

Data Types: char | cell

DistributionParameters — Distribution parameter estimates
cell array

Distribution parameter estimates, specified as a cell array. DistributionParameters is a K-by-D cell array, where cell (k,d) contains the distribution parameter estimates for instances of predictor d in class k. The order of the rows corresponds to the order of the classes in the property ClassNames, and the order of the predictors corresponds to the order of the columns of X. If class k has no observations for predictor j, then DistributionParameters{k,j} is empty ([]).

The elements of DistributionParameters depend on the distributions of the predictors. The following describes the values in DistributionParameters{k,j}.
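A cell-array specification can mix distributions per predictor. A minimal sketch under the rule above (the data X, Y and the three-predictor layout are illustrative assumptions, not from this page):

```matlab
% Illustrative sketch: model predictors 1 and 2 as normal and
% predictor 3 with a kernel density. X and Y are assumed training
% data, where X has three predictor columns.
Mdl = fitcnb(X,Y,'DistributionNames',{'normal','normal','kernel'});
```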

• kernel — A KernelDistribution model. Display properties using cell indexing and dot notation. For example, to display the estimated bandwidth of the kernel density for predictor 2 in the third class, use Mdl.DistributionParameters{3,2}.BandWidth.
• mn — A scalar representing the probability that token j appears in class k. For details, see “Algorithms” on page 32-415.
• mvmn — A numeric vector containing the probabilities for each possible level of predictor j in class k. The software orders the probabilities by the sorted order of all unique levels of predictor j (stored in the property CategoricalLevels). For more details, see “Algorithms” on page 32-415.
• normal — A 2-by-1 numeric vector. The first element is the sample mean and the second element is the sample standard deviation.

Data Types: cell

ExpandedPredictorNames — Expanded predictor names
cell array of character vectors

Expanded predictor names, stored as a cell array of character vectors. If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.

Data Types: cell

HyperparameterOptimizationResults — Description of cross-validation optimization of hyperparameters
BayesianOptimization object | table

Description of the cross-validation optimization of hyperparameters, specified as a BayesianOptimization object or a table of hyperparameters and associated values. This property is nonempty if the 'OptimizeHyperparameters' name-value pair argument is nonempty when you create the model. The value of HyperparameterOptimizationResults depends on the setting of the Optimizer field in the HyperparameterOptimizationOptions structure when you create the model, as follows.

• 'bayesopt' (default) — Object of class BayesianOptimization
• 'gridsearch' or 'randomsearch' — Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst)

Kernel — Kernel smoother types
'normal' (default) | 'box' | 'epanechnikov' | 'triangle' | cell array of character vectors

Kernel smoother types, specified as a character vector or cell array of character vectors. Kernel has length equal to the number of predictors (size(X,2)). Kernel{j} corresponds to predictor j, and contains a character vector describing the type of kernel smoother. The supported kernel smoother types follow. Let I{u} denote the indicator function.

• 'box' — Box (uniform): f(x) = 0.5·I{|x| ≤ 1}
• 'epanechnikov' — Epanechnikov: f(x) = 0.75·(1 − x²)·I{|x| ≤ 1}
• 'normal' — Gaussian: f(x) = (1/√(2π))·exp(−0.5x²)
• 'triangle' — Triangular: f(x) = (1 − |x|)·I{|x| ≤ 1}
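As a hedged usage sketch of these options (X and Y are assumed training data; the kernel choice is illustrative), you can request kernel densities with a specific smoother when training:

```matlab
% Illustrative sketch: fit kernel densities for all predictors,
% using the Epanechnikov smoother. X and Y are assumed data.
Mdl = fitcnb(X,Y,'DistributionNames','kernel','Kernel','epanechnikov');

% Inspect the per-predictor kernel smoother types afterward.
Mdl.Kernel
```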

If a cell is empty ([]), then the software did not fit a kernel distribution to the corresponding predictor.

Data Types: char | cell

ModelParameters — Parameter values used to train
object

Parameter values used to train the classifier (such as the name-value pair argument values), specified as an object. The following properties of ModelParameters correspond to the name-value pair argument values set for training the classifier.

• DistributionNames — Data distribution or distributions. This is the same value as the property DistributionNames.
• Kernel — Kernel smoother type. This is the same as the property Kernel.
• Method — Training method. For naive Bayes, the value is 'NaiveBayes'.
• Support — Kernel-smoothing density support. This is the same as the property Support.
• Type — Learning type. For classification, the value is 'classification'.
• Width — Kernel smoothing window width. This is the same as the property Width.

Access fields of ModelParameters using dot notation. For example, access the kernel support using Mdl.ModelParameters.Support.

NumObservations — Number of training observations
numeric scalar

Number of training observations, specified as a numeric scalar. If X or Y contain missing values, then NumObservations might be less than the length of Y.

Data Types: double

PredictorNames — Predictor names
cell array of character vectors

Predictor names, specified as a cell array of character vectors. The order of the elements in PredictorNames corresponds to the order in X.

Data Types: cell

Prior — Class prior probabilities
numeric vector

Class prior probabilities, specified as a numeric row vector. Prior is a 1-by-K vector, and the order of its elements corresponds to the elements of ClassNames. fitcnb normalizes the prior probabilities you set using the name-value pair parameter 'Prior' so that sum(Prior) = 1. The value of Prior does not change the best-fitting model. Therefore, you can reset Prior after training Mdl using dot notation, e.g., Mdl.Prior = [0.2 0.8];.

Data Types: double | single

ResponseName — Response name
character vector

Response name, specified as a character vector.

Data Types: char

ScoreTransform — Classification score transformation function
'doublelogit' | 'invlogit' | 'ismax' | 'logit' | 'none' | function handle | ...

Classification score transformation function, specified as a character vector or function handle. To change the score transformation function to, for example, function, use dot notation.

• For a built-in function, enter this code and replace function with a supported value.

Mdl.ScoreTransform = 'function';

• 'doublelogit' — 1/(1 + e^(–2x))
• 'invlogit' — log(x / (1 – x))
• 'ismax' — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
• 'logit' — 1/(1 + e^(–x))
• 'none' or 'identity' — x (no transformation)
• 'sign' — –1 for x < 0; 0 for x = 0; 1 for x > 0
• 'symmetric' — 2x – 1
• 'symmetricismax' — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
• 'symmetriclogit' — 2/(1 + e^(–x)) – 1

• For a MATLAB function, or a function that you define, enter its function handle.

Mdl.ScoreTransform = @function;

function should accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).

Data Types: char | function_handle

Support — Kernel smoother density support
cell vector

Kernel smoother density support, specified as a cell vector. Support has length equal to the number of predictors (size(X,2)). The cells represent the regions to apply the kernel density. The supported options follow.

• 1-by-2 numeric row vector — For example, [L,U], where L and U are the finite lower and upper bounds, respectively, for the density support.
• 'positive' — The density support is all positive real values.
• 'unbounded' — The density support is all real values.

If a cell is empty ([]), then the software did not fit a kernel distribution to the corresponding predictor.

W — Observation weights
numeric vector

Observation weights, specified as a numeric vector. The length of W is NumObservations. fitcnb normalizes the value you set for the name-value pair parameter 'Weights' so that the weights within a particular class sum to the prior probability for that class.


Data Types: double

Width — Kernel smoother window width
numeric matrix

Kernel smoother window width, specified as a numeric matrix. Width is a K-by-P matrix, where K is the number of classes in the data, and P is the number of predictors (size(X,2)). Width(k,j) is the kernel smoother window width for the kernel smoothing density of predictor j within class k. NaNs in column j indicate that the software did not fit predictor j using a kernel density.

X — Unstandardized predictor data
numeric matrix

Unstandardized predictor data, specified as a numeric matrix. X has NumObservations rows and P columns. Each row of X corresponds to one observation, and each column corresponds to one variable. The software excludes rows removed due to missing values from X.

Data Types: double

Y — Observed class labels
categorical array | character array | logical vector | numeric vector | cell array of character vectors

Observed class labels, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Y is the same data type as the input argument Y of fitcnb. (The software treats string arrays as cell arrays of character vectors.) Each row of Y represents the observed classification of the corresponding row of X. The software excludes elements removed due to missing values from Y.

Data Types: categorical | char | logical | single | double | cell

Methods

compact       Compact naive Bayes classifier
crossval      Cross-validated naive Bayes classifier
resubEdge     Classification edge for naive Bayes classifiers by resubstitution
resubLoss     Classification loss for naive Bayes classifiers by resubstitution
resubMargin   Classification margins for naive Bayes classifiers by resubstitution
resubPredict  Predict resubstitution labels of naive Bayes classifier

Inherited Methods

edge     Classification edge for naive Bayes classifiers
logP     Log unconditional probability density for naive Bayes classifier
loss     Classification error for naive Bayes classifier
margin   Classification margins for naive Bayes classifiers
predict  Predict labels using naive Bayes classification model

Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects (MATLAB).

Examples

Train a Naive Bayes Classifier

Construct a naive Bayes classifier for Fisher's iris data. Also, specify prior probabilities after training.

Load Fisher's iris data.

load fisheriris
X = meas;
Y = species;

X is a numeric matrix that contains four petal measurements for 150 irises. Y is a cell array of character vectors that contains the corresponding iris species.

Train a naive Bayes classifier.

Mdl = fitcnb(X,Y)

Mdl = 
  ClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'setosa'  'versicolor'  'virginica'}
            ScoreTransform: 'none'
           NumObservations: 150
         DistributionNames: {'normal'  'normal'  'normal'  'normal'}
    DistributionParameters: {3x4 cell}

  Properties, Methods

Mdl is a trained ClassificationNaiveBayes classifier, and some of its properties display in the Command Window. By default, the software treats each predictor as independent, and fits them using normal distributions.

To access the properties of Mdl, use dot notation.

Mdl.ClassNames

ans = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }

Mdl.Prior

ans = 1×3

    0.3333    0.3333    0.3333

Mdl.Prior contains the class prior probabilities, which are settable using the name-value pair argument 'Prior' in fitcnb. The order of the class prior probabilities corresponds to the order of the classes in Mdl.ClassNames. By default, the prior probabilities are the respective relative frequencies of the classes in the data.

You can also reset the prior probabilities after training. For example, set the prior probabilities to 0.5, 0.2, and 0.3, respectively.

Mdl.Prior = [0.5 0.2 0.3];

You can pass Mdl to predict, for example, to label new measurements, or to crossval to cross-validate the classifier.

More About

Bag-of-Tokens Model

In the bag-of-tokens model, the value of predictor j is the nonnegative number of occurrences of token j in this observation. The number of categories (bins) in this multinomial model is the number of distinct tokens, that is, the number of predictors.

Naive Bayes

Naive Bayes is a classification algorithm that applies density estimation to the data. The algorithm leverages Bayes theorem, and (naively) assumes that the predictors are conditionally independent, given the class. Though the assumption is usually violated in practice, naive Bayes classifiers tend to yield posterior distributions that are robust to biased class density estimates, particularly where the posterior is 0.5 (the decision boundary) [1].

Naive Bayes classifiers assign observations to the most probable class (in other words, the maximum a posteriori decision rule). Explicitly, the algorithm:

1  Estimates the densities of the predictors within each class.

2  Models posterior probabilities according to Bayes rule. That is, for all k = 1,...,K,

       P(Y = k | X_1,...,X_P) = [ π(Y = k) ∏_{j=1}^{P} P(X_j | Y = k) ] / [ ∑_{k=1}^{K} π(Y = k) ∏_{j=1}^{P} P(X_j | Y = k) ],

   where:

   • Y is the random variable corresponding to the class index of an observation.
   • X_1,...,X_P are the random predictors of an observation.
   • π(Y = k) is the prior probability that a class index is k.

3  Classifies an observation by estimating the posterior probability for each class, and then assigns the observation to the class yielding the maximum posterior probability.

If the predictors compose a multinomial distribution, then the posterior probability

    P(Y = k | X_1,...,X_P) ∝ π(Y = k) P_mn(X_1,...,X_P | Y = k),

where P_mn(X_1,...,X_P | Y = k) is the probability mass function of a multinomial distribution.
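The decision rule can be reproduced from a trained model's stored parameters. The following is an illustrative sketch, not the toolbox's internal implementation; it assumes the default 'normal' distributions, so each cell of Mdl.DistributionParameters holds the estimated [mean; standard deviation] of one predictor within one class:

```matlab
load fisheriris
Mdl = fitcnb(meas,species);          % normal densities by default
x = meas(1,:);                       % one observation to classify
K = numel(Mdl.ClassNames);
logPost = log(Mdl.Prior(:));         % log prior, log pi(Y = k)
for k = 1:K
    for j = 1:numel(x)
        p = Mdl.DistributionParameters{k,j};         % [mean; std]
        logPost(k) = logPost(k) + log(normpdf(x(j),p(1),p(2)));
    end
end
[~,kmax] = max(logPost);             % the denominator is common to all classes,
label = Mdl.ClassNames(kmax)         % so comparing numerators suffices
```

This should agree with predict(Mdl,x), because the normalizing denominator in Bayes rule does not affect which class attains the maximum.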

Algorithms

• If you specify 'DistributionNames','mn' when training Mdl using fitcnb, then the software fits a multinomial distribution using the bag-of-tokens model on page 32-1656. The software stores the probability that token j appears in class k in the property DistributionParameters{k,j}. Using additive smoothing [2], the estimated probability is

      P(token j | class k) = (1 + c_{j|k}) / (P + c_k),

  where:

  • c_{j|k} = n_k ( ∑_{i: y_i ∈ class k} x_{ij} w_i ) / ( ∑_{i: y_i ∈ class k} w_i ), which is the weighted number of occurrences of token j in class k.
  • n_k is the number of observations in class k.
  • w_i is the weight for observation i. The software normalizes weights within a class such that they sum to the prior probability for that class.
  • c_k = ∑_{j=1}^{P} c_{j|k}, which is the total weighted number of occurrences of all tokens in class k.
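As a numeric illustration of this additive smoothing estimate (made-up token counts and unit weights; not toolbox code):

```matlab
x = [2 0; 1 1; 0 3];            % token counts: 3 observations, P = 2 tokens, one class
w = [1; 1; 1];                  % unit observation weights
nk = size(x,1);                 % n_k, number of observations in the class
P  = size(x,2);                 % number of tokens (predictors)
cjk = nk * (w' * x) / sum(w);   % c_{j|k}, weighted occurrences of each token
ck  = sum(cjk);                 % c_k, total weighted occurrences
probs = (1 + cjk) / (P + ck)    % smoothed estimates, here [4 5]/9
```

The estimates sum to 1 because sum(1 + cjk) equals P + ck.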

• If you specify 'DistributionNames','mvmn' when training Mdl using fitcnb, then:

  1  For each predictor, the software collects a list of the unique levels, stores the sorted list in CategoricalLevels, and considers each level a bin. Each predictor/class combination is a separate, independent multinomial random variable.
  2  For predictor j in class k, the software counts instances of each categorical level using the list stored in CategoricalLevels{j}.
  3  The software stores the probability that predictor j, in class k, has level L in the property DistributionParameters{k,j}, for all levels in CategoricalLevels{j}. Using additive smoothing [2], the estimated probability is

         P(predictor j = L | class k) = (1 + m_{j|k}(L)) / (m_j + m_k),

     where:

     • m_{j|k}(L) = n_k ( ∑_{i: y_i ∈ class k} I{x_{ij} = L} w_i ) / ( ∑_{i: y_i ∈ class k} w_i ), which is the weighted number of observations for which predictor j equals L in class k.
     • n_k is the number of observations in class k.
     • I{x_{ij} = L} = 1 if x_{ij} = L, and 0 otherwise.
     • w_i is the weight for observation i. The software normalizes weights within a class such that they sum to the prior probability for that class.
     • m_j is the number of distinct levels in predictor j.
     • m_k is the weighted number of observations in class k.
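A similarly small illustration of the 'mvmn' estimate (made-up categorical data with unit weights, so the weighted counts reduce to plain counts; not toolbox code):

```matlab
xj = [1; 1; 2; 3; 3; 3];        % predictor j within class k; levels {1,2,3}
nk = numel(xj);                 % n_k = 6
mj = numel(unique(xj));         % m_j, number of distinct levels (3)
mk = nk;                        % m_k equals n_k when all weights are 1
L  = 3;                         % level of interest
mjkL = sum(xj == L);            % m_{j|k}(L) with unit weights (3)
pL = (1 + mjkL) / (mj + mk)     % smoothed estimate, here 4/9
```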

References

[1] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, Second Edition. NY: Springer, 2008.

[2] Manning, C. D., P. Raghavan, and M. Schütze. Introduction to Information Retrieval. NY: Cambridge University Press, 2008.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

• The predict function supports code generation.
• When you train a naive Bayes model by using fitcnb, the following restrictions apply.
  • The class labels input argument value (Y) cannot be a categorical array.
  • Code generation does not support categorical predictors (logical, categorical, char, string, or cell). If you supply training data in a table, the predictors must be numeric (double or single). Also, you cannot use the 'CategoricalPredictors' name-value pair argument.
  • The value of the 'DistributionNames' name-value pair argument cannot contain 'mn' or 'mvmn'.
  • The value of the 'ClassNames' name-value pair argument cannot be a categorical array.
  • The value of the 'ScoreTransform' name-value pair argument cannot be an anonymous function.

For more information, see “Introduction to Code Generation” on page 31-2.

See Also
CompactClassificationNaiveBayes | compareHoldout | fitcnb | loss | predict

Topics
“Naive Bayes Classification” on page 21-2
“Grouping Variables” on page 2-45

ClassificationPartitionedECOC

Cross-validated multiclass ECOC model for support vector machines (SVMs) and other classifiers

Description

ClassificationPartitionedECOC is a set of error-correcting output codes (ECOC) models trained on cross-validated folds. Estimate the quality of the cross-validated classification by using one or more “kfold” functions: kfoldPredict, kfoldLoss, kfoldMargin, kfoldEdge, and kfoldfun.

Every “kfold” method uses models trained on training-fold (in-fold) observations to predict the response for validation-fold (out-of-fold) observations. For example, suppose you cross-validate using five folds. In this case, the software randomly assigns each observation into five groups of equal size (roughly). The training fold contains four of the groups (roughly 4/5 of the data), and the validation fold contains the other group (roughly 1/5 of the data). In this case, cross-validation proceeds as follows:

1  The software trains the first model (stored in CVMdl.Trained{1}) by using the observations in the last four groups and reserves the observations in the first group for validation.
2  The software trains the second model (stored in CVMdl.Trained{2}) by using the observations in the first group and the last three groups. The software reserves the observations in the second group for validation.
3  The software proceeds in a similar fashion for the third, fourth, and fifth models.

If you validate by using kfoldPredict, the software computes predictions for the observations in group i by using the ith model. In short, the software estimates a response for every observation by using the model trained without that observation.

Creation

You can create a ClassificationPartitionedECOC model in two ways:

• Create a cross-validated ECOC model from an ECOC model by using the crossval object function.
• Create a cross-validated ECOC model by using the fitcecoc function and specifying one of the name-value pair arguments 'CrossVal', 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.
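Both creation routes can be sketched on Fisher's iris data; the choice of five folds is illustrative:

```matlab
load fisheriris
% Route 1: train a full ECOC model, then cross-validate it
Mdl = fitcecoc(meas,species);
CVMdl1 = crossval(Mdl,'KFold',5);
% Route 2: cross-validate directly while fitting
CVMdl2 = fitcecoc(meas,species,'KFold',5);
% Either way, assess the result with a "kfold" function
err = kfoldLoss(CVMdl1)
```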

Properties

Cross-Validation Properties

CrossValidatedModel — Cross-validated model name
character vector

Cross-validated model name, specified as a character vector. For example, 'ECOC' specifies a cross-validated ECOC model.

Data Types: char

KFold — Number of cross-validated folds
positive integer

Number of cross-validated folds, specified as a positive integer.

Data Types: double

ModelParameters — Cross-validation parameter values
object

Cross-validation parameter values, specified as an object. The parameter values correspond to the name-value pair argument values used to cross-validate the ECOC classifier. ModelParameters does not contain estimated parameters. You can access the properties of ModelParameters using dot notation.

NumObservations — Number of observations
positive numeric scalar

Number of observations in the training data, specified as a positive numeric scalar.

Data Types: double

Partition — Data partition
cvpartition model

Data partition indicating how the software splits the data into cross-validation folds, specified as a cvpartition model.

Trained — Compact classifiers trained on cross-validation folds
cell array of CompactClassificationECOC models

Compact classifiers trained on cross-validation folds, specified as a cell array of CompactClassificationECOC models. Trained has k cells, where k is the number of folds.

Data Types: cell

W — Observation weights
numeric vector

Observation weights used to cross-validate the model, specified as a numeric vector. W has NumObservations elements. The software normalizes the weights used for training so that nansum(W) is 1.

Data Types: single | double

X — Unstandardized predictor data
numeric matrix | table

Unstandardized predictor data used to cross-validate the classifier, specified as a numeric matrix or table. Each row of X corresponds to one observation, and each column corresponds to one variable.

Data Types: single | double | table


Y — Observed class labels
categorical array | character array | logical vector | numeric vector | cell array of character vectors

Observed class labels used to cross-validate the model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Y has NumObservations elements and has the same data type as the input argument Y that you pass to fitcecoc to cross-validate the model. (The software treats string arrays as cell arrays of character vectors.) Each row of Y represents the observed classification of the corresponding row of X.

Data Types: categorical | char | logical | single | double | cell

ECOC Properties

BinaryLoss — Binary learner loss function
'binodeviance' | 'exponential' | 'hamming' | 'hinge' | 'linear' | 'logit' | 'quadratic'

Binary learner loss function, specified as a character vector representing the loss function name. If you train using binary learners that use different loss functions, then the software sets BinaryLoss to 'hamming'. To potentially increase accuracy, specify a binary loss function other than the default during a prediction or loss computation by using the 'BinaryLoss' name-value pair argument of kfoldPredict or kfoldLoss.

Data Types: char

BinaryY — Binary learner class labels
numeric matrix | []

Binary learner class labels, specified as a numeric matrix or [].

• If the coding matrix is the same across all folds, then BinaryY is a NumObservations-by-L matrix, where L is the number of binary learners (size(CodingMatrix,2)). The elements of BinaryY are –1, 0, or 1, and the values correspond to dichotomous class assignments. This table describes how learner j assigns observation k to a dichotomous class corresponding to the value of BinaryY(k,j).

  Value   Dichotomous Class Assignment
  –1      Learner j assigns observation k to a negative class.
   0      Before training, learner j removes observation k from the data set.
   1      Learner j assigns observation k to a positive class.

• If the coding matrix varies across folds, then BinaryY is empty ([]).

Data Types: double

CodingMatrix — Codes specifying class assignments
numeric matrix | []

Codes specifying class assignments for the binary learners, specified as a numeric matrix or [].

• If the coding matrix is the same across all folds, then CodingMatrix is a K-by-L matrix, where K is the number of classes and L is the number of binary learners. The elements of CodingMatrix are –1, 0, or 1, and the values correspond to dichotomous class assignments. This table describes how learner j assigns observations in class i to a dichotomous class corresponding to the value of CodingMatrix(i,j).

  Value   Dichotomous Class Assignment
  –1      Learner j assigns observations in class i to a negative class.
   0      Before training, learner j removes observations in class i from the data set.
   1      Learner j assigns observations in class i to a positive class.

• If the coding matrix varies across folds, then CodingMatrix is empty ([]). You can obtain the coding matrix for each fold by using the Trained property. For example, CVMdl.Trained{1}.CodingMatrix is the coding matrix in the first fold of the cross-validated ECOC model CVMdl.

Data Types: double | single | int8 | int16 | int32 | int64

Other Classification Properties

CategoricalPredictors — Categorical predictor indices
vector of positive integers | []

Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values corresponding to the columns of the predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty ([]).

Data Types: single | double

ClassNames — Unique class labels
categorical array | character array | logical vector | numeric vector | cell array of character vectors

Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as the class labels Y. (The software treats string arrays as cell arrays of character vectors.) ClassNames also determines the class order.

Data Types: categorical | char | logical | single | double | cell

Cost — Misclassification costs
square numeric matrix

This property is read-only.

Misclassification costs, specified as a square numeric matrix. Cost has K rows and columns, where K is the number of classes. Cost(i,j) is the cost of classifying a point into class j if its true class is i. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. fitcecoc incorporates misclassification costs differently among different types of binary learners.

Data Types: double

PredictorNames — Predictor names
cell array of character vectors

Predictor names in order of their appearance in the predictor data X, specified as a cell array of character vectors. The length of PredictorNames is equal to the number of columns in X.

Data Types: cell

Prior — Prior class probabilities
numeric vector

This property is read-only.

Prior class probabilities, specified as a numeric vector. Prior has as many elements as the number of classes in ClassNames, and the order of the elements corresponds to the order of the classes in ClassNames. fitcecoc incorporates misclassification costs differently among different types of binary learners.

Data Types: double

ResponseName — Response variable name
character vector

Response variable name, specified as a character vector.

Data Types: char

ScoreTransform — Score transformation function to apply to predicted scores
'doublelogit' | 'invlogit' | 'ismax' | 'logit' | 'none' | function handle | ...

Score transformation function to apply to predicted scores, specified as a function name or function handle. To change the score transformation function to function, for example, use dot notation.

• For a built-in function, enter this code and replace function with a value in the table.

  Mdl.ScoreTransform = 'function';

  Value                   Description
  'doublelogit'           1/(1 + e^(–2x))
  'invlogit'              log(x / (1 – x))
  'ismax'                 Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
  'logit'                 1/(1 + e^(–x))
  'none' or 'identity'    x (no transformation)
  'sign'                  –1 for x < 0; 0 for x = 0; 1 for x > 0
  'symmetric'             2x – 1
  'symmetricismax'        Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
  'symmetriclogit'        2/(1 + e^(–x)) – 1

• For a MATLAB function or a function that you define, enter its function handle.

  Mdl.ScoreTransform = @function;

function must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores). Data Types: char | function_handle
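For instance, a custom transform can be attached as sketched below; the transform itself (a temperature-scaled softmax) is a made-up example, not a built-in value:

```matlab
load fisheriris
Mdl = fitcecoc(meas,species);                      % any classifier with ScoreTransform
scaledSoftmax = @(S) exp(S/2) ./ sum(exp(S/2),2);  % score matrix in, same size out
Mdl.ScoreTransform = scaledSoftmax;
[label,score] = predict(Mdl,meas(1,:));            % score is now transformed
```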

Object Functions

kfoldEdge    — Classification edge for cross-validated ECOC model
kfoldLoss    — Classification loss for cross-validated ECOC model
kfoldMargin  — Classification margins for cross-validated ECOC model
kfoldPredict — Classify observations in cross-validated ECOC model
kfoldfun     — Cross-validate function using cross-validated ECOC model

Examples

Cross-Validate ECOC Classifier

Cross-validate an ECOC classifier with SVM binary learners, and estimate the generalized classification error.

Load Fisher's iris data set. Specify the predictor data X and the response data Y.

load fisheriris
X = meas;
Y = species;
rng(1); % For reproducibility

Create an SVM template, and standardize the predictors.

t = templateSVM('Standardize',true)

t = 
Fit template for classification SVM.
                     Alpha: [0x1 double]
             BoxConstraint: []
                 CacheSize: []
             CachingMethod: ''
                ClipAlphas: []
    DeltaGradientTolerance: []
                   Epsilon: []
              GapTolerance: []
              KKTTolerance: []
            IterationLimit: []
            KernelFunction: ''
               KernelScale: []
              KernelOffset: []
     KernelPolynomialOrder: []
                  NumPrint: []
                        Nu: []
           OutlierFraction: []
          RemoveDuplicates: []
           ShrinkagePeriod: []
                    Solver: ''
           StandardizeData: 1
        SaveSupportVectors: []
            VerbosityLevel: []
                   Version: 2
                    Method: 'SVM'
                      Type: 'classification'

t is an SVM template. Most of the template object properties are empty. When training the ECOC classifier, the software sets the applicable properties to their default values.

Train the ECOC classifier, and specify the class order.

Mdl = fitcecoc(X,Y,'Learners',t,...
    'ClassNames',{'setosa','versicolor','virginica'});

Mdl is a ClassificationECOC classifier. You can access its properties using dot notation.

Cross-validate Mdl using 10-fold cross-validation.

CVMdl = crossval(Mdl);

CVMdl is a ClassificationPartitionedECOC cross-validated ECOC classifier.

Estimate the generalized classification error.

genError = kfoldLoss(CVMdl)

genError = 0.0400

The generalized classification error is 4%, which indicates that the ECOC classifier generalizes fairly well.

Speed Up Training ECOC Classifiers Using Binning and Parallel Computing

Train a one-versus-all ECOC classifier using a GentleBoost ensemble of decision trees with surrogate splits. To speed up training, bin numeric predictors and use parallel computing. Binning is valid only when fitcecoc uses a tree learner. After training, estimate the classification error using 10-fold cross-validation. Note that parallel computing requires Parallel Computing Toolbox™.

Load Sample Data

Load and inspect the arrhythmia data set.

load arrhythmia
[n,p] = size(X)

n = 452

p = 279

isLabels = unique(Y);
nLabels = numel(isLabels)

nLabels = 13

tabulate(categorical(Y))

  Value    Count   Percent
      1      245    54.20%
      2       44     9.73%
      3       15     3.32%
      4       15     3.32%
      5       13     2.88%
      6       25     5.53%
      7        3     0.66%
      8        2     0.44%
      9        9     1.99%
     10       50    11.06%
     14        4     0.88%
     15        5     1.11%
     16       22     4.87%

The data set contains 279 predictors, and the sample size of 452 is relatively small. Of the 16 distinct labels, only 13 are represented in the response (Y). Each label describes various degrees of arrhythmia, and 54.20% of the observations are in class 1.

Train One-Versus-All ECOC Classifier

Create an ensemble template. You must specify at least three arguments: a method, a number of learners, and the type of learner. For this example, specify 'GentleBoost' for the method, 100 for the number of learners, and a decision tree template that uses surrogate splits because there are missing observations.

tTree = templateTree('surrogate','on');
tEnsemble = templateEnsemble('GentleBoost',100,tTree);

tEnsemble is a template object. Most of its properties are empty, but the software fills them with their default values during training.

Train a one-versus-all ECOC classifier using the ensembles of decision trees as binary learners. To speed up training, use binning and parallel computing.

• Binning ('NumBins',50) — When you have a large training data set, you can speed up training (with a potential decrease in accuracy) by using the 'NumBins' name-value pair argument. This argument is valid only when fitcecoc uses a tree learner. If you specify the 'NumBins' value, then the software bins every numeric predictor into a specified number of equiprobable bins, and then grows trees on the bin indices instead of the original data. You can try 'NumBins',50 first, and then change the 'NumBins' value depending on the accuracy and training speed.

• Parallel computing ('Options',statset('UseParallel',true)) — With a Parallel Computing Toolbox license, you can speed up the computation by using parallel computing, which sends each binary learner to a worker in the pool. The number of workers depends on your system configuration. When you use decision trees for binary learners, fitcecoc parallelizes training using Intel® Threading Building Blocks (TBB) for dual-core systems and above. Therefore, specifying the 'UseParallel' option is not helpful on a single computer. Use this option on a cluster.


Additionally, specify that the prior probabilities are 1/K, where K = 13 is the number of distinct classes.

options = statset('UseParallel',true);
Mdl = fitcecoc(X,Y,'Coding','onevsall','Learners',tEnsemble,...
    'Prior','uniform','NumBins',50,'Options',options);

Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).

Mdl is a ClassificationECOC model.

Cross-Validation

Cross-validate the ECOC classifier using 10-fold cross-validation.

CVMdl = crossval(Mdl,'Options',options);

Warning: One or more folds do not contain points from all the groups.

CVMdl is a ClassificationPartitionedECOC model. The warning indicates that some classes are not represented while the software trains at least one fold. Therefore, those folds cannot predict labels for the missing classes. You can inspect the results of a fold using cell indexing and dot notation. For example, access the results of the first fold by entering CVMdl.Trained{1}.

Use the cross-validated ECOC classifier to predict validation-fold labels. You can compute the confusion matrix by using confusionchart. Move and resize the chart by changing the inner position property to ensure that the percentages appear in the row summary.

oofLabel = kfoldPredict(CVMdl,'Options',options);
ConfMat = confusionchart(Y,oofLabel,'RowSummary','total-normalized');
ConfMat.InnerPosition = [0.10 0.12 0.85 0.85];


Reproduce Binned Data

Reproduce binned predictor data by using the BinEdges property of the trained model and the discretize function.

X = Mdl.X; % Predictor data
Xbinned = zeros(size(X));
edges = Mdl.BinEdges;
% Find indices of binned predictors.
idxNumeric = find(~cellfun(@isempty,edges));
if iscolumn(idxNumeric)
    idxNumeric = idxNumeric';
end
for j = idxNumeric
    x = X(:,j);
    % Convert x to array if x is a table.
    if istable(x)
        x = table2array(x);
    end
    % Group x into bins by using the discretize function.
    xbinned = discretize(x,[-inf; edges{j}; inf]);
    Xbinned(:,j) = xbinned;
end


Xbinned contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs.

See Also
ClassificationECOC | CompactClassificationECOC | crossval | cvpartition | fitcecoc

Introduced in R2014b


ClassificationPartitionedEnsemble

Package: classreg.learning.partition
Superclasses: ClassificationPartitionedModel

Cross-validated classification ensemble

Description

ClassificationPartitionedEnsemble is a set of classification ensembles trained on cross-validated folds. Estimate the quality of classification by cross-validation using one or more “kfold” methods: kfoldPredict, kfoldLoss, kfoldMargin, kfoldEdge, and kfoldfun.

Every “kfold” method uses models trained on in-fold observations to predict the response for out-of-fold observations. For example, suppose you cross-validate using five folds. In this case, every training fold contains roughly 4/5 of the data and every test fold contains roughly 1/5 of the data. The first model stored in Trained{1} was trained on X and Y with the first 1/5 excluded, the second model stored in Trained{2} was trained on X and Y with the second 1/5 excluded, and so on. When you call kfoldPredict, it computes predictions for the first 1/5 of the data using the first model, for the second 1/5 of the data using the second model, and so on. In short, the response for every observation is computed by kfoldPredict using the model trained without that observation.

Construction

cvens = crossval(ens) creates a cross-validated ensemble from ens, a classification ensemble. For syntax details, see the crossval method reference page.

cvens = fitcensemble(X,Y,Name,Value) creates a cross-validated ensemble when Name is one of 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'. For syntax details, see the fitcensemble function reference page.

Properties

BinEdges

Bin edges for numeric predictors, specified as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors.

The software bins numeric predictors only if you specify the 'NumBins' name-value pair argument as a positive integer scalar when training a model with tree learners. The BinEdges property is empty if the 'NumBins' value is empty (default).

You can reproduce the binned predictor data Xbinned by using the BinEdges property of the trained model mdl.

X = mdl.X; % Predictor data
Xbinned = zeros(size(X));
edges = mdl.BinEdges;
% Find indices of binned predictors.


idxNumeric = find(~cellfun(@isempty,edges));
if iscolumn(idxNumeric)
    idxNumeric = idxNumeric';
end
for j = idxNumeric
    x = X(:,j);
    % Convert x to array if x is a table.
    if istable(x)
        x = table2array(x);
    end
    % Group x into bins by using the discretize function.
    xbinned = discretize(x,[-inf; edges{j}; inf]);
    Xbinned(:,j) = xbinned;
end

Xbinned contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs.

CategoricalPredictors

Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values corresponding to the columns of the predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty ([]).

ClassNames

List of the elements in Y with duplicates removed. ClassNames can be a numeric vector, vector of categorical variables, logical vector, character array, or cell array of character vectors. ClassNames has the same data type as the data in the argument Y. (The software treats string arrays as cell arrays of character vectors.)

Combiner

Cell array of combiners across all folds.

Cost

Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response. This property is read-only.

CrossValidatedModel

Name of the cross-validated model, a character vector.

Kfold

Number of folds used in a cross-validated ensemble, a positive integer.

ModelParameters

Object holding parameters of cvens.

NumObservations

Number of data points used in training the ensemble, a positive integer.

NTrainedPerFold

Number of data points used in training each fold of the ensemble, a positive integer.

Partition

Partition of class cvpartition used in creating the cross-validated ensemble.

PredictorNames

Cell array of names for the predictor variables, in the order in which they appear in X.

Prior

Numeric vector of prior probabilities for each class. The order of the elements of Prior corresponds to the order of the classes in ClassNames. The number of elements of Prior is the number of unique classes in the response. This property is read-only.

ResponseName

Name of the response variable Y, a character vector.

ScoreTransform

Function handle for transforming scores, or character vector representing a built-in transformation function. 'none' means no transformation; equivalently, 'none' means @(x)x. For a list of built-in transformation functions and the syntax of custom transformation functions, see fitctree.

Add or change a ScoreTransform function using dot notation:

ens.ScoreTransform = 'function'

or

ens.ScoreTransform = @function

Trainable

Cell array of ensembles trained on cross-validation folds. Every ensemble is full, meaning it contains its training data and weights.

Trained

Cell array of compact ensembles trained on cross-validation folds.

W

Scaled weights, a vector with length n, the number of rows in X.

X

A matrix or table of predictor values. Each column of X represents one variable, and each row represents one observation.

Y

Numeric vector, categorical vector, logical vector, character array, or cell array of character vectors. Each row of Y is the response to the data in the corresponding row of X.

Methods

kfoldEdge — Classification edge for observations not used for training
kfoldLoss — Classification loss for observations not used for training
resume    — Resume training learners on cross-validation folds

Inherited Methods

kfoldEdge    — Classification edge for observations not used for training
kfoldfun     — Cross-validate function
kfoldLoss    — Classification loss for observations not used for training
kfoldMargin  — Classification margins for observations not used for training
kfoldPredict — Predict response for observations not used for training

Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects (MATLAB).

Examples

Evaluate K-Fold Cross-Validation Error for Classification Ensemble

Evaluate the k-fold cross-validation error for a classification ensemble that models the Fisher iris data.

Load the sample data set.

load fisheriris

Train an ensemble of 100 boosted classification trees using AdaBoostM2.

t = templateTree('MaxNumSplits',1); % Weak learner template tree object
ens = fitcensemble(meas,species,'Method','AdaBoostM2','Learners',t);

Create a cross-validated ensemble from ens and find the k-fold cross-validation error.

rng(10,'twister') % For reproducibility
cvens = crossval(ens);
L = kfoldLoss(cvens)

L = 0.0467


See Also ClassificationEnsemble | ClassificationPartitionedModel | RegressionPartitionedEnsemble | fitctree


ClassificationPartitionedKernel

Cross-validated, binary kernel classification model

Description

ClassificationPartitionedKernel is a binary kernel classification model, trained on cross-validated folds. You can estimate the quality of classification, or how well the kernel classification model generalizes, using one or more “kfold” functions: kfoldPredict, kfoldLoss, kfoldMargin, and kfoldEdge.

Every “kfold” method uses models trained on training-fold (in-fold) observations to predict the response for validation-fold (out-of-fold) observations. For example, suppose that you cross-validate using five folds. In this case, the software randomly assigns each observation into five groups of roughly equal size. The training fold contains four of the groups (that is, roughly 4/5 of the data), and the validation fold contains the other group (that is, roughly 1/5 of the data). In this case, cross-validation proceeds as follows:

1 The software trains the first model (stored in CVMdl.Trained{1}) by using the observations in the last four groups and reserves the observations in the first group for validation.

2 The software trains the second model (stored in CVMdl.Trained{2}) by using the observations in the first group and the last three groups. The software reserves the observations in the second group for validation.

3 The software proceeds in a similar fashion for the third, fourth, and fifth models.

If you validate by using kfoldPredict, the software computes predictions for the observations in group i by using the ith model. In short, the software estimates a response for every observation by using the model trained without that observation. Note ClassificationPartitionedKernel model objects do not store the predictor data set.
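The fold assignment described above can be examined directly through a cvpartition object. This sketch uses a hypothetical 100-observation data set to show how the in-fold and out-of-fold indices complement each other:

```matlab
% Sketch: how 5-fold assignment splits hypothetical data into groups
c = cvpartition(100,'KFold',5);   % randomly assign 100 observations to 5 groups
trainIdx = training(c,1);         % logical index of the ~4/5 in-fold observations for fold 1
testIdx  = test(c,1);             % logical index of the ~1/5 out-of-fold observations
sum(trainIdx) + sum(testIdx)      % every observation lands in exactly one of the two sets
```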

Creation You can create a ClassificationPartitionedKernel model by training a classification kernel model using fitckernel and specifying one of these name-value pair arguments: 'Crossval', 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.
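For example, either of the following calls produces such a model (a sketch; X and Y stand for any valid predictor matrix and label vector):

```matlab
% Sketch: two ways to request cross-validation in fitckernel
CVMdl1 = fitckernel(X,Y,'CrossVal','on');   % default 10-fold cross-validation
CVMdl2 = fitckernel(X,Y,'KFold',5);         % explicit 5-fold cross-validation
```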

Properties

Cross-Validation Properties

CrossValidatedModel — Cross-validated model name
character vector
This property is read-only. Cross-validated model name, specified as a character vector.


For example, 'Kernel' specifies a cross-validated kernel model.
Data Types: char

KFold — Number of cross-validated folds
positive integer scalar
This property is read-only. Number of cross-validated folds, specified as a positive integer scalar.
Data Types: double

ModelParameters — Cross-validation parameter values
object
This property is read-only. Cross-validation parameter values, specified as an object. The parameter values correspond to the name-value pair argument values used to cross-validate the kernel classifier. ModelParameters does not contain estimated parameters. You can access the properties of ModelParameters using dot notation.

NumObservations — Number of observations
positive numeric scalar
This property is read-only. Number of observations in the training data, specified as a positive numeric scalar.
Data Types: double

Partition — Data partition
cvpartition model
This property is read-only. Data partition indicating how the software splits the data into cross-validation folds, specified as a cvpartition model.

Trained — Kernel classifiers trained on cross-validation folds
cell array of ClassificationKernel models
This property is read-only. Kernel classifiers trained on cross-validation folds, specified as a cell array of ClassificationKernel models. Trained has k cells, where k is the number of folds.
Data Types: cell

W — Observation weights
numeric vector
This property is read-only. Observation weights used to cross-validate the model, specified as a numeric vector. W has NumObservations elements.


The software normalizes the weights used for training so that nansum(W) is 1.
Data Types: single | double

Y — Observed class labels
categorical array | character array | logical vector | numeric vector | cell array of character vectors
This property is read-only. Observed class labels used to cross-validate the model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Y has NumObservations elements and has the same data type as the input argument Y that you pass to fitckernel to cross-validate the model. (The software treats string arrays as cell arrays of character vectors.) Each row of Y represents the observed classification of the corresponding row of X.
Data Types: categorical | char | logical | single | double | cell

Other Classification Properties

CategoricalPredictors — Categorical predictor indices
[]
This property is read-only. Categorical predictor indices, specified as an empty numeric value. In general, CategoricalPredictors contains index values corresponding to the columns of the predictor data that contain categorical predictors. Because ClassificationKernel models can be trained on numeric predictor data only, this property is empty ([]).

ClassNames — Unique class labels
categorical array | character array | logical vector | numeric vector | cell array of character vectors
This property is read-only. Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as the observed class labels property Y and determines the class order.
Data Types: categorical | char | logical | single | double | cell

Cost — Misclassification costs
square numeric matrix
This property is read-only. Misclassification costs, specified as a square numeric matrix. Cost has K rows and columns, where K is the number of classes. Cost(i,j) is the cost of classifying a point into class j if its true class is i. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames.
Data Types: double

PredictorNames — Predictor names
{'x1','x2',...}


This property is read-only. Predictor names in order of their appearance in the predictor data X, specified as a cell array of character vectors of the form {'x1','x2',...}. The length of PredictorNames is equal to the number of columns in X.
Data Types: cell

Prior — Prior class probabilities
numeric vector
This property is read-only. Prior class probabilities, specified as a numeric vector. Prior has as many elements as there are classes in ClassNames, and the order of the elements corresponds to the elements of ClassNames.
Data Types: double

ResponseName — Response variable name
'Y'
This property is read-only. Response variable name, specified as 'Y'. Because ClassificationKernel models cannot be trained on tabular data, this property is always 'Y'.
Data Types: char

ScoreTransform — Score transformation function
'doublelogit' | 'invlogit' | 'ismax' | 'logit' | 'none' | function handle | ...
Score transformation function to apply to predicted scores, specified as a function name or function handle. For a kernel classification model Mdl, and before the score transformation, the predicted classification score for the observation x (row vector) is f(x) = T(x)β + b, where:

• T(·) is a transformation of an observation for feature expansion.
• β is the estimated column vector of coefficients.
• b is the estimated scalar bias.

To change the CVMdl score transformation function to function, for example, use dot notation.

• For a built-in function, enter this code and replace function with a value from the table.

CVMdl.ScoreTransform = 'function';

Value — Description
'doublelogit' — 1/(1 + e^(–2x))
'invlogit' — log(x / (1 – x))
'ismax' — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
'logit' — 1/(1 + e^(–x))
'none' or 'identity' — x (no transformation)
'sign' — –1 for x < 0; 0 for x = 0; 1 for x > 0
'symmetric' — 2x – 1
'symmetricismax' — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
'symmetriclogit' — 2/(1 + e^(–x)) – 1

• For a MATLAB function or a function that you define, enter its function handle. CVMdl.ScoreTransform = @function;

function must accept a matrix of the original scores for each class, and then return a matrix of the same size representing the transformed scores for each class. Data Types: char | function_handle
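For instance, a custom transformation equivalent to the built-in 'logit' value could be set like this (a sketch; CVMdl is any cross-validated kernel model):

```matlab
% Sketch: equivalent ways to apply the logistic transformation to scores
CVMdl.ScoreTransform = 'logit';                 % built-in value
CVMdl.ScoreTransform = @(s) 1./(1 + exp(-s));   % same mapping as an anonymous function
```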

Object Functions

kfoldEdge — Classification edge for cross-validated kernel classification model
kfoldLoss — Classification loss for cross-validated kernel classification model
kfoldMargin — Classification margins for cross-validated kernel classification model
kfoldPredict — Classify observations in cross-validated kernel classification model

Examples

Cross-Validate Kernel Classification Model

Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g').

load ionosphere
rng('default') % For reproducibility

Cross-validate a binary kernel classification model. By default, the software uses 10-fold cross-validation.

CVMdl = fitckernel(X,Y,'CrossVal','on')

CVMdl =
  classreg.learning.partition.ClassificationPartitionedKernel
    CrossValidatedModel: 'Kernel'
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'


Properties, Methods

numel(CVMdl.Trained)

ans = 10

CVMdl is a ClassificationPartitionedKernel model. Because fitckernel implements 10-fold cross-validation, CVMdl contains 10 ClassificationKernel models that the software trains on training-fold (in-fold) observations.

Estimate the cross-validated classification error.

kfoldLoss(CVMdl)

ans = 0.0940

The classification error rate is approximately 9%.
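Beyond the scalar error rate, the out-of-fold predictions themselves can be tabulated. A possible follow-up, continuing the example above:

```matlab
% Sketch: confusion matrix of out-of-fold predictions for the ionosphere model
label = kfoldPredict(CVMdl);   % predicted label for each held-out observation
C = confusionmat(Y,label)      % rows are true classes, columns are predicted classes
```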

See Also ClassificationKernel | fitckernel

Introduced in R2018b


ClassificationPartitionedKernelECOC

Cross-validated kernel error-correcting output codes (ECOC) model for multiclass classification

Description

ClassificationPartitionedKernelECOC is an error-correcting output codes (ECOC) model composed of kernel classification models, trained on cross-validated folds. Estimate the quality of the classification by cross-validation using one or more “kfold” functions: kfoldPredict, kfoldLoss, kfoldMargin, and kfoldEdge.

Every “kfold” method uses models trained on training-fold (in-fold) observations to predict the response for validation-fold (out-of-fold) observations. For example, suppose that you cross-validate using five folds. In this case, the software randomly assigns each observation into five groups of roughly equal size. The training fold contains four of the groups (that is, roughly 4/5 of the data), and the validation fold contains the other group (that is, roughly 1/5 of the data). In this case, cross-validation proceeds as follows:

1 The software trains the first model (stored in CVMdl.Trained{1}) by using the observations in the last four groups and reserves the observations in the first group for validation.

2 The software trains the second model (stored in CVMdl.Trained{2}) by using the observations in the first group and the last three groups. The software reserves the observations in the second group for validation.

3 The software proceeds in a similar fashion for the third, fourth, and fifth models.

If you validate by using kfoldPredict, the software computes predictions for the observations in group i by using the ith model. In short, the software estimates a response for every observation by using the model trained without that observation. Note ClassificationPartitionedKernelECOC model objects do not store the predictor data set.

Creation

You can create a ClassificationPartitionedKernelECOC model by training an ECOC model using fitcecoc and specifying these name-value pair arguments:

• 'Learners' — Set the value to 'kernel', a template object returned by templateKernel, or a cell array of such template objects.

• One of the arguments 'CrossVal', 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.

For more details, see fitcecoc.
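A minimal sketch of both cross-validation styles (X and Y stand for any valid multiclass predictor data and labels):

```matlab
% Sketch: creating a cross-validated kernel ECOC model via fitcecoc
CVMdl1 = fitcecoc(X,Y,'Learners','kernel','CrossVal','on');  % default 10-fold
CVMdl2 = fitcecoc(X,Y,'Learners','kernel','KFold',5);        % explicit 5-fold
```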


Properties

Cross-Validation Properties

CrossValidatedModel — Cross-validated model name
character vector
This property is read-only. Cross-validated model name, specified as a character vector. For example, 'KernelECOC' specifies a cross-validated kernel ECOC model.
Data Types: char

KFold — Number of cross-validated folds
positive integer scalar
This property is read-only. Number of cross-validated folds, specified as a positive integer scalar.
Data Types: double

ModelParameters — Cross-validation parameter values
object
This property is read-only. Cross-validation parameter values, specified as an object. The parameter values correspond to the name-value pair argument values used to cross-validate the ECOC classifier. ModelParameters does not contain estimated parameters. You can access the properties of ModelParameters using dot notation.

NumObservations — Number of observations
positive numeric scalar
This property is read-only. Number of observations in the training data, specified as a positive numeric scalar.
Data Types: double

Partition — Data partition
cvpartition model
This property is read-only. Data partition indicating how the software splits the data into cross-validation folds, specified as a cvpartition model.

Trained — Compact classifiers trained on cross-validation folds
cell array of CompactClassificationECOC models
This property is read-only.


Compact classifiers trained on cross-validation folds, specified as a cell array of CompactClassificationECOC models. Trained has k cells, where k is the number of folds.
Data Types: cell

W — Observation weights
numeric vector
This property is read-only. Observation weights used to cross-validate the model, specified as a numeric vector. W has NumObservations elements. The software normalizes the weights used for training so that nansum(W) is 1.
Data Types: single | double

Y — Observed class labels
categorical array | character array | logical vector | numeric vector | cell array of character vectors
This property is read-only. Observed class labels used to cross-validate the model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Y has NumObservations elements and has the same data type as the input argument Y that you pass to fitcecoc to cross-validate the model. (The software treats string arrays as cell arrays of character vectors.) Each row of Y represents the observed classification of the corresponding row of X.
Data Types: categorical | char | logical | single | double | cell

ECOC Properties

BinaryLoss — Binary learner loss function
'binodeviance' | 'exponential' | 'hamming' | 'hinge' | 'linear' | 'logit' | 'quadratic'
This property is read-only. Binary learner loss function, specified as a character vector representing the loss function name. By default, if all binary learners are kernel classification models using SVM, then BinaryLoss is 'hinge'. If all binary learners are kernel classification models using logistic regression, then BinaryLoss is 'quadratic'. To potentially increase accuracy, specify a binary loss function other than the default during a prediction or loss computation by using the 'BinaryLoss' name-value pair argument of kfoldPredict or kfoldLoss.
Data Types: char

BinaryY — Binary learner class labels
numeric matrix | []
This property is read-only. Binary learner class labels, specified as a numeric matrix or [].

• If the coding matrix is the same across all folds, then BinaryY is a NumObservations-by-L matrix, where L is the number of binary learners (size(CodingMatrix,2)).


The elements of BinaryY are –1, 0, or 1, and the values correspond to dichotomous class assignments. This table describes how learner j assigns observation k to a dichotomous class corresponding to the value of BinaryY(k,j).

Value — Dichotomous Class Assignment
–1 — Learner j assigns observation k to a negative class.
0 — Before training, learner j removes observation k from the data set.
1 — Learner j assigns observation k to a positive class.

• If the coding matrix varies across folds, then BinaryY is empty ([]).
Data Types: double

CodingMatrix — Codes specifying class assignments
numeric matrix | []
This property is read-only. Codes specifying class assignments for the binary learners, specified as a numeric matrix or [].

• If the coding matrix is the same across all folds, then CodingMatrix is a K-by-L matrix, where K is the number of classes and L is the number of binary learners. The elements of CodingMatrix are –1, 0, or 1, and the values correspond to dichotomous class assignments. This table describes how learner j assigns observations in class i to a dichotomous class corresponding to the value of CodingMatrix(i,j).

Value — Dichotomous Class Assignment
–1 — Learner j assigns observations in class i to a negative class.
0 — Before training, learner j removes observations in class i from the data set.
1 — Learner j assigns observations in class i to a positive class.

• If the coding matrix varies across folds, then CodingMatrix is empty ([]). You can obtain the coding matrix for each fold by using the Trained property. For example, CVMdl.Trained{1}.CodingMatrix is the coding matrix in the first fold of the cross-validated ECOC model CVMdl.
Data Types: double | single | int8 | int16 | int32 | int64

Other Classification Properties

CategoricalPredictors — Categorical predictor indices
[]
This property is read-only. Categorical predictor indices, specified as an empty numeric value. In general, CategoricalPredictors contains index values corresponding to the columns of the predictor data


that contain categorical predictors. Because ClassificationKernel models can be trained on numeric predictor data only, this property is empty ([]).

ClassNames — Unique class labels
categorical array | character array | logical vector | numeric vector | cell array of character vectors
This property is read-only. Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as the observed class labels property Y and determines the class order.
Data Types: categorical | char | logical | single | double | cell

Cost — Misclassification costs
square numeric matrix
This property is read-only. Misclassification costs, specified as a square numeric matrix. Cost has K rows and columns, where K is the number of classes. Cost(i,j) is the cost of classifying a point into class j if its true class is i. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames.
Data Types: double

PredictorNames — Predictor names
{'x1','x2',...}
This property is read-only. Predictor names in order of their appearance in the predictor data X, specified as a cell array of character vectors of the form {'x1','x2',...}. The length of PredictorNames is equal to the number of columns in X.
Data Types: cell

Prior — Prior class probabilities
numeric vector
This property is read-only. Prior class probabilities, specified as a numeric vector. Prior has as many elements as there are classes in ClassNames, and the order of the elements corresponds to the elements of ClassNames.
Data Types: double

ResponseName — Response variable name
'Y'
This property is read-only. Response variable name, specified as 'Y'. Because ClassificationKernel models cannot be trained on tabular data, this property is always 'Y'.
Data Types: char


ScoreTransform — Score transformation function
'doublelogit' | 'invlogit' | 'ismax' | 'logit' | 'none' | function handle | ...
Score transformation function to apply to predicted scores, specified as a function name or function handle. For a kernel classification model Mdl, and before the score transformation, the predicted classification score for the observation x (row vector) is f(x) = T(x)β + b, where:

• T(·) is a transformation of an observation for feature expansion.
• β is the estimated column vector of coefficients.
• b is the estimated scalar bias.

To change the CVMdl score transformation function to function, for example, use dot notation.

• For a built-in function, enter this code and replace function with a value from the table.

CVMdl.ScoreTransform = 'function';

Value — Description
'doublelogit' — 1/(1 + e^(–2x))
'invlogit' — log(x / (1 – x))
'ismax' — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
'logit' — 1/(1 + e^(–x))
'none' or 'identity' — x (no transformation)
'sign' — –1 for x < 0; 0 for x = 0; 1 for x > 0
'symmetric' — 2x – 1
'symmetricismax' — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
'symmetriclogit' — 2/(1 + e^(–x)) – 1

• For a MATLAB function or a function that you define, enter its function handle. CVMdl.ScoreTransform = @function;

function must accept a matrix of the original scores for each class, and then return a matrix of the same size representing the transformed scores for each class. Data Types: char | function_handle

Object Functions

kfoldEdge — Classification edge for cross-validated kernel ECOC model
kfoldLoss — Classification loss for cross-validated kernel ECOC model
kfoldMargin — Classification margins for cross-validated kernel ECOC model
kfoldPredict — Classify observations in cross-validated kernel ECOC model


Examples

Cross-Validate Multiclass Kernel Classification Model

Create a cross-validated, multiclass kernel ECOC classification model using fitcecoc.

Load Fisher's iris data set. X contains flower measurements, and Y contains the names of flower species.

load fisheriris
X = meas;
Y = species;

Cross-validate a multiclass kernel ECOC classification model that can identify the species of a flower based on the flower's measurements.

rng(1); % For reproducibility
CVMdl = fitcecoc(X,Y,'Learners','kernel','CrossVal','on')

CVMdl =
  classreg.learning.partition.ClassificationPartitionedKernelECOC
    CrossValidatedModel: 'KernelECOC'
           ResponseName: 'Y'
        NumObservations: 150
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'setosa'  'versicolor'  'virginica'}
         ScoreTransform: 'none'

Properties, Methods

CVMdl is a ClassificationPartitionedKernelECOC cross-validated model. fitcecoc implements 10-fold cross-validation by default. Therefore, CVMdl.Trained contains a 10-by-1 cell array of ten CompactClassificationECOC models, one for each fold. Each compact ECOC model is composed of binary kernel classification models.

Estimate the classification error by passing CVMdl to kfoldLoss.

error = kfoldLoss(CVMdl)

error = 0.0333

The estimated classification error indicates that about 3% of observations are misclassified.

To change default options when training ECOC models composed of kernel classification models, create a kernel classification model template using templateKernel, and then pass the template to fitcecoc.
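For example, to swap the default SVM learners for logistic regression ones, a template could be passed as follows (a sketch continuing the example above):

```matlab
% Sketch: customize the binary kernel learners through a template
t = templateKernel('Learner','logistic');            % logistic regression kernel learners
CVMdl2 = fitcecoc(X,Y,'Learners',t,'CrossVal','on');
```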

See Also ClassificationKernel | CompactClassificationECOC | fitcecoc | fitckernel


Introduced in R2018b


ClassificationPartitionedLinear

Package: classreg.learning.partition
Superclasses: ClassificationPartitionedModel

Cross-validated linear model for binary classification of high-dimensional data

Description

ClassificationPartitionedLinear is a set of linear classification models trained on cross-validated folds. To obtain a cross-validated, linear classification model, use fitclinear and specify one of the cross-validation options. You can estimate the quality of classification, or how well the linear classification model generalizes, using one or more of these “kfold” methods: kfoldPredict, kfoldLoss, kfoldMargin, and kfoldEdge.

Every “kfold” method uses models trained on in-fold observations to predict the response for out-of-fold observations. For example, suppose that you cross-validate using five folds. In this case, the software randomly assigns each observation into five roughly equally sized groups. The training fold contains four of the groups (that is, roughly 4/5 of the data), and the test fold contains the other group (that is, roughly 1/5 of the data). In this case, cross-validation proceeds as follows:

1 The software trains the first model (stored in CVMdl.Trained{1}) using the observations in the last four groups and reserves the observations in the first group for validation.

2 The software trains the second model (stored in CVMdl.Trained{2}) using the observations in the first group and the last three groups. The software reserves the observations in the second group for validation.

3 The software proceeds in a similar fashion for the third through fifth models.

If you validate by calling kfoldPredict, the software computes predictions for the observations in group 1 using the first model, for the observations in group 2 using the second model, and so on. In short, the software estimates a response for every observation using the model trained without that observation.

Note ClassificationPartitionedLinear model objects do not store the predictor data set.

Construction

CVMdl = fitclinear(X,Y,Name,Value) creates a cross-validated, linear classification model when Name is 'CrossVal', 'CVPartition', 'Holdout', or 'KFold'. For more details, see fitclinear.

Properties

Cross-Validation Properties

CrossValidatedModel — Cross-validated model name
character vector
Cross-validated model name, specified as a character vector.


For example, 'Linear' specifies a cross-validated linear model for binary classification or regression.
Data Types: char

KFold — Number of cross-validated folds
positive integer
Number of cross-validated folds, specified as a positive integer.
Data Types: double

ModelParameters — Cross-validation parameter values
object
Cross-validation parameter values, e.g., the name-value pair argument values used to cross-validate the linear model, specified as an object. ModelParameters does not contain estimated parameters. Access properties of ModelParameters using dot notation.

NumObservations — Number of observations
positive numeric scalar
Number of observations in the training data, specified as a positive numeric scalar.
Data Types: double

Partition — Data partition
cvpartition model
Data partition indicating how the software splits the data into cross-validation folds, specified as a cvpartition model.

Trained — Linear classification models trained on cross-validation folds
cell array of ClassificationLinear model objects
Linear classification models trained on cross-validation folds, specified as a cell array of ClassificationLinear models. Trained has k cells, where k is the number of folds.
Data Types: cell

W — Observation weights
numeric vector
Observation weights used to cross-validate the model, specified as a numeric vector. W has NumObservations elements. The software normalizes W so that the weights for observations within a particular class sum up to the prior probability of that class.
Data Types: single | double

Y — Observed class labels
categorical array | character array | logical vector | vector of numeric values | cell array of character vectors
Observed class labels used to cross-validate the model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Y has NumObservations elements, and


is the same data type as the input argument Y that you passed to fitclinear to cross-validate the model. (The software treats string arrays as cell arrays of character vectors.) Each row of Y represents the observed classification of the corresponding observation in the predictor data.
Data Types: categorical | char | logical | single | double | cell

Other Classification Properties

CategoricalPredictors — Categorical predictor indices
vector of positive integers | []
Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values corresponding to the columns of the predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty ([]).
Data Types: single | double

ClassNames — Unique class labels
categorical array | character array | logical vector | numeric vector | cell array of character vectors
Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as the class labels Y. (The software treats string arrays as cell arrays of character vectors.) ClassNames also determines the class order.
Data Types: categorical | char | logical | single | double | cell

Cost — Misclassification costs
square numeric matrix
This property is read-only. Misclassification costs, specified as a square numeric matrix. Cost has K rows and columns, where K is the number of classes. Cost(i,j) is the cost of classifying a point into class j if its true class is i. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames.
Data Types: double

PredictorNames — Predictor names
cell array of character vectors
Predictor names in their order of appearance in the predictor data used to train CVMdl, specified as a cell array of character vectors. PredictorNames has length equal to the number of predictor variables in the predictor data.
Data Types: cell

Prior — Prior class probabilities
numeric vector
This property is read-only.


Prior class probabilities, specified as a numeric vector. Prior has as many elements as the number of classes in ClassNames, and the order of the elements corresponds to the order of the classes in ClassNames.
Data Types: double

ResponseName — Response variable name
character vector
Response variable name, specified as a character vector.
Data Types: char

ScoreTransform — Score transformation function
'doublelogit' | 'invlogit' | 'ismax' | 'logit' | 'none' | function handle | ...
Score transformation function to apply to predicted scores, specified as a function name or function handle. For linear classification models and before transformation, the predicted classification score for the observation x (row vector) is f(x) = xβ + b, where β and b correspond to Mdl.Beta and Mdl.Bias, respectively.

To change the score transformation function to, for example, function, use dot notation.

• For a built-in function, enter this code and replace function with a value in the table.

Mdl.ScoreTransform = 'function';

Value — Description
'doublelogit' — 1/(1 + e^(–2x))
'invlogit' — log(x / (1 – x))
'ismax' — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
'logit' — 1/(1 + e^(–x))
'none' or 'identity' — x (no transformation)
'sign' — –1 for x < 0; 0 for x = 0; 1 for x > 0
'symmetric' — 2x – 1
'symmetricismax' — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
'symmetriclogit' — 2/(1 + e^(–x)) – 1

• For a MATLAB function, or a function that you define, enter its function handle. Mdl.ScoreTransform = @function;

function must accept a matrix of the original scores for each class, and then return a matrix of the same size representing the transformed scores for each class.


Data Types: char | function_handle

Methods

kfoldEdge — Classification edge for observations not used for training
kfoldLoss — Classification loss for observations not used in training
kfoldMargin — Classification margins for observations not used in training
kfoldPredict — Predict labels for observations not used for training

Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects (MATLAB).

Examples

Create Cross-Validated Binary Linear Classification Model

Load the NLP data set.

load nlpdata

X is a sparse matrix of predictor data, and Y is a categorical vector of class labels. There are more than two classes in the data.

Identify the labels that correspond to the Statistics and Machine Learning Toolbox™ documentation web pages.

Ystats = Y == 'stats';

Cross-validate a binary, linear classification model that can identify whether the word counts in a documentation web page are from the Statistics and Machine Learning Toolbox™ documentation.

rng(1); % For reproducibility
CVMdl = fitclinear(X,Ystats,'CrossVal','on')

CVMdl =
  classreg.learning.partition.ClassificationPartitionedLinear
    CrossValidatedModel: 'Linear'
           ResponseName: 'Y'
        NumObservations: 31572
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: [0 1]
         ScoreTransform: 'none'

  Properties, Methods

CVMdl is a ClassificationPartitionedLinear cross-validated model. Because fitclinear implements 10-fold cross-validation by default, CVMdl.Trained contains ten ClassificationLinear models that contain the results of training linear classification models for each of the folds.


Estimate labels for out-of-fold observations and estimate the generalization error by passing CVMdl to kfoldPredict and kfoldLoss, respectively.

oofLabels = kfoldPredict(CVMdl);
ge = kfoldLoss(CVMdl)

ge = 7.6017e-04

The estimated generalization error is less than 0.1% misclassified observations.

Find Good Lasso Penalty Using Cross-Validation

To determine a good lasso-penalty strength for a linear classification model that uses a logistic regression learner, implement 5-fold cross-validation.

Load the NLP data set.

load nlpdata

X is a sparse matrix of predictor data, and Y is a categorical vector of class labels. There are more than two classes in the data.

The models should identify whether the word counts in a web page are from the Statistics and Machine Learning Toolbox™ documentation. So, identify the labels that correspond to the Statistics and Machine Learning Toolbox™ documentation web pages.

Ystats = Y == 'stats';

Create a set of 11 logarithmically spaced regularization strengths from 10^(–6) through 10^(–0.5).

Lambda = logspace(-6,-0.5,11);

Cross-validate the models. To increase execution speed, transpose the predictor data and specify that the observations are in columns. Estimate the coefficients using SpaRSA. Lower the tolerance on the gradient of the objective function to 1e-8.

X = X';
rng(10); % For reproducibility
CVMdl = fitclinear(X,Ystats,'ObservationsIn','columns','KFold',5,...
    'Learner','logistic','Solver','sparsa','Regularization','lasso',...
    'Lambda',Lambda,'GradientTolerance',1e-8)

CVMdl =
  classreg.learning.partition.ClassificationPartitionedLinear
    CrossValidatedModel: 'Linear'
           ResponseName: 'Y'
        NumObservations: 31572
                  KFold: 5
              Partition: [1×1 cvpartition]
             ClassNames: [0 1]
         ScoreTransform: 'none'

  Properties, Methods


numCLModels = numel(CVMdl.Trained)

numCLModels = 5

CVMdl is a ClassificationPartitionedLinear model. Because fitclinear implements 5-fold cross-validation, CVMdl contains 5 ClassificationLinear models that the software trains on each fold.

Display the first trained linear classification model.

Mdl1 = CVMdl.Trained{1}

Mdl1 =
  ClassificationLinear
      ResponseName: 'Y'
        ClassNames: [0 1]
    ScoreTransform: 'logit'
              Beta: [34023×11 double]
              Bias: [-13.2904 -13.2904 -13.2904 -13.2904 -9.9357 -7.0782 -5.4335 -4.5473 -3.4223 ...]
            Lambda: [1.0000e-06 3.5481e-06 1.2589e-05 4.4668e-05 1.5849e-04 5.6234e-04 0.0020 ...]
           Learner: 'logistic'

  Properties, Methods

Mdl1 is a ClassificationLinear model object. fitclinear constructed Mdl1 by training on the first four folds. Because Lambda is a sequence of regularization strengths, you can think of Mdl1 as 11 models, one for each regularization strength in Lambda.

Estimate the cross-validated classification error.

ce = kfoldLoss(CVMdl);

Because there are 11 regularization strengths, ce is a 1-by-11 vector of classification error rates.

Higher values of Lambda lead to predictor variable sparsity, which is a good quality of a classifier. For each regularization strength, train a linear classification model using the entire data set and the same options as when you cross-validated the models. Determine the number of nonzero coefficients per model.

Mdl = fitclinear(X,Ystats,'ObservationsIn','columns',...
    'Learner','logistic','Solver','sparsa','Regularization','lasso',...
    'Lambda',Lambda,'GradientTolerance',1e-8);
numNZCoeff = sum(Mdl.Beta~=0);

In the same figure, plot the cross-validated classification error rates and the frequency of nonzero coefficients for each regularization strength. Plot all variables on the log scale.

figure;
[h,hL1,hL2] = plotyy(log10(Lambda),log10(ce),...
    log10(Lambda),log10(numNZCoeff));
hL1.Marker = 'o';
hL2.Marker = 'o';
ylabel(h(1),'log_{10} classification error')
ylabel(h(2),'log_{10} nonzero-coefficient frequency')
xlabel('log_{10} Lambda')

title('Test-Sample Statistics')
hold off

Choose the index of the regularization strength that balances predictor variable sparsity and low classification error. In this case, a value between 10^(–4) and 10^(–1) should suffice.

idxFinal = 7;

Select the model from Mdl with the chosen regularization strength.

MdlFinal = selectModels(Mdl,idxFinal);

MdlFinal is a ClassificationLinear model containing one regularization strength. To estimate labels for new observations, pass MdlFinal and the new data to predict.
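Such a prediction call might look like the following sketch, where Xnew is a hypothetical matrix of new word-count observations arranged with observations in columns (matching the transposed training data used in this example):

```matlab
% Hypothetical sketch: Xnew holds new observations in columns, one row per
% predictor, matching the 'ObservationsIn','columns' layout used above.
[labels,scores] = predict(MdlFinal,Xnew,'ObservationsIn','columns');
```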

See Also

ClassificationLinear | fitclinear | kfoldLoss | kfoldPredict

Introduced in R2016a


ClassificationPartitionedLinearECOC

Package: classreg.learning.partition
Superclasses: ClassificationPartitionedModel

Cross-validated linear error-correcting output codes model for multiclass classification of high-dimensional data

Description

ClassificationPartitionedLinearECOC is a set of error-correcting output codes (ECOC) models composed of linear classification models, trained on cross-validated folds. Estimate the quality of classification by cross-validation using one or more "kfold" functions: kfoldPredict, kfoldLoss, kfoldMargin, and kfoldEdge.

Every "kfold" method uses models trained on in-fold observations to predict the response for out-of-fold observations. For example, suppose that you cross-validate using five folds. In this case, the software randomly assigns each observation into five roughly equal-sized groups. The training fold contains four of the groups (that is, roughly 4/5 of the data) and the test fold contains the other group (that is, roughly 1/5 of the data). In this case, cross-validation proceeds as follows.

1. The software trains the first model (stored in CVMdl.Trained{1}) using the observations in the last four groups and reserves the observations in the first group for validation.
2. The software trains the second model (stored in CVMdl.Trained{2}) using the observations in the first group and last three groups. The software reserves the observations in the second group for validation.
3. The software proceeds in a similar fashion for the third, fourth, and fifth models.

If you validate by calling kfoldPredict, the software computes predictions for the observations in group 1 using the first model, for the observations in group 2 using the second model, and so on. In short, the software estimates a response for every observation using the model trained without that observation.

Note: ClassificationPartitionedLinearECOC model objects do not store the predictor data set.

Construction

CVMdl = fitcecoc(X,Y,'Learners',t,Name,Value) returns a cross-validated, linear ECOC model when:

• t is 'Linear' or a template object returned by templateLinear.
• Name is one of 'CrossVal', 'CVPartition', 'Holdout', or 'KFold'.

For more details, see fitcecoc.
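As a minimal construction sketch (assuming a predictor matrix X and a class label vector Y are already in the workspace, for example from the nlpdata examples on this page):

```matlab
% Sketch: cross-validate a linear ECOC model with 5 folds using default
% linear binary learners. X (predictors) and Y (labels) are assumed.
t = templateLinear();
CVMdl = fitcecoc(X,Y,'Learners',t,'KFold',5);
```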


Properties

Cross-Validation Properties

CrossValidatedModel — Cross-validated model name
character vector

Cross-validated model name, specified as a character vector. For example, 'ECOC' specifies a cross-validated ECOC model.

Data Types: char

KFold — Number of cross-validated folds
positive integer

Number of cross-validated folds, specified as a positive integer.

Data Types: double

ModelParameters — Cross-validation parameter values
object

Cross-validation parameter values, e.g., the name-value pair argument values used to cross-validate the ECOC classifier, specified as an object. ModelParameters does not contain estimated parameters. Access the properties of ModelParameters using dot notation.

NumObservations — Number of observations
positive numeric scalar

Number of observations in the training data, specified as a positive numeric scalar.

Data Types: double

Partition — Data partition
cvpartition model

Data partition indicating how the software splits the data into cross-validation folds, specified as a cvpartition model.

Trained — Compact classifiers trained on cross-validation folds
cell array of CompactClassificationECOC models

Compact classifiers trained on cross-validation folds, specified as a cell array of CompactClassificationECOC models. Trained has k cells, where k is the number of folds.

Data Types: cell

W — Observation weights
numeric vector

Observation weights used to cross-validate the model, specified as a numeric vector. W has NumObservations elements. The software normalizes the weights used for training so that nansum(W) is 1.

32

Functions

Data Types: single | double

Y — Observed class labels
categorical array | character array | logical vector | vector of numeric values | cell array of character vectors

Observed class labels used to cross-validate the model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Y has NumObservations elements and is the same data type as the input argument Y that you pass to fitcecoc to cross-validate the model. (The software treats string arrays as cell arrays of character vectors.) Each row of Y represents the observed classification of the corresponding observation in the predictor data.

Data Types: char | cell | categorical | logical | single | double

ECOC Properties

BinaryLoss — Binary learner loss function
'binodeviance' | 'exponential' | 'hamming' | 'hinge' | 'linear' | 'logit' | 'quadratic'

Binary learner loss function, specified as a character vector representing the loss function name.

If you train using binary learners that use different loss functions, then the software sets BinaryLoss to 'hamming'. To potentially increase accuracy, specify a binary loss function other than the default during a prediction or loss computation by using the 'BinaryLoss' name-value pair argument of predict or loss.

Data Types: char

BinaryY — Binary learner class labels
numeric matrix | []

Binary learner class labels, specified as a numeric matrix or [].

• If the coding matrix is the same across folds, then BinaryY is a NumObservations-by-L matrix, where L is the number of binary learners (size(CodingMatrix,2)). Elements of BinaryY are -1, 0, or 1, and the value corresponds to a dichotomous class assignment. This table describes how learner j assigns observation k to a dichotomous class corresponding to the value of BinaryY(k,j).

  Value   Dichotomous Class Assignment
  –1      Learner j assigns observation k to a negative class.
  0       Before training, learner j removes observation k from the data set.
  1       Learner j assigns observation k to a positive class.

• If the coding matrix varies across folds, then BinaryY is empty ([]).

Data Types: double

CodingMatrix — Codes specifying class assignments
numeric matrix | []


Codes specifying class assignments for the binary learners, specified as a numeric matrix or [].

• If the coding matrix is the same across folds, then CodingMatrix is a K-by-L matrix. K is the number of classes and L is the number of binary learners. Elements of CodingMatrix are -1, 0, or 1, and the value corresponds to a dichotomous class assignment. This table describes how learner j assigns observations in class i to a dichotomous class corresponding to the value of CodingMatrix(i,j).

  Value   Dichotomous Class Assignment
  –1      Learner j assigns observations in class i to a negative class.
  0       Before training, learner j removes observations in class i from the data set.
  1       Learner j assigns observations in class i to a positive class.

• If the coding matrix varies across folds, then CodingMatrix is empty ([]). Obtain the coding matrix for each fold by using the Trained property. For example, CVMdl.Trained{1}.CodingMatrix is the coding matrix in the first fold of the cross-validated ECOC model CVMdl.

Data Types: double | single | int8 | int16 | int32 | int64

Other Classification Properties

CategoricalPredictors — Categorical predictor indices
vector of positive integers | []

Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values corresponding to the columns of the predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty ([]).

Data Types: single | double

ClassNames — Unique class labels
categorical array | character array | logical vector | numeric vector | cell array of character vectors

Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as the class labels Y. (The software treats string arrays as cell arrays of character vectors.) ClassNames also determines the class order.

Data Types: categorical | char | logical | single | double | cell

Cost — Misclassification costs
square numeric matrix

This property is read-only.

Misclassification costs, specified as a square numeric matrix. Cost has K rows and columns, where K is the number of classes. Cost(i,j) is the cost of classifying a point into class j if its true class is i. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames.


fitcecoc incorporates misclassification costs differently among different types of binary learners.

Data Types: double

PredictorNames — Predictor names
cell array of character vectors

Predictor names in their order of appearance in the predictor data used to train CVMdl, specified as a cell array of character vectors. PredictorNames has length equal to the number of predictor variables in the predictor data.

Data Types: cell

Prior — Prior class probabilities
numeric vector

This property is read-only.

Prior class probabilities, specified as a numeric vector. Prior has as many elements as the number of classes in ClassNames, and the order of the elements corresponds to the order of the classes in ClassNames.

fitcecoc incorporates misclassification costs differently among different types of binary learners.

Data Types: double

ResponseName — Response variable name
character vector

Response variable name, specified as a character vector.

Data Types: char

ScoreTransform — Score transformation function
'doublelogit' | 'invlogit' | 'ismax' | 'logit' | 'none' | function handle | ...

Score transformation function to apply to predicted scores, specified as a function name or function handle.

For linear classification models, before transformation the predicted classification score for the observation x (row vector) is f(x) = xβ + b, where β and b correspond to Mdl.Beta and Mdl.Bias, respectively.

To change the score transformation function to, for example, function, use dot notation.

• For a built-in function, enter this code and replace function with a value in the table.

Mdl.ScoreTransform = 'function';

Value                  Description
'doublelogit'          1/(1 + e^(–2x))
'invlogit'             log(x / (1 – x))
'ismax'                Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
'logit'                1/(1 + e^(–x))
'none' or 'identity'   x (no transformation)
'sign'                 –1 for x < 0; 0 for x = 0; 1 for x > 0
'symmetric'            2x – 1
'symmetricismax'       Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
'symmetriclogit'       2/(1 + e^(–x)) – 1

• For a MATLAB function, or a function that you define, enter its function handle.

Mdl.ScoreTransform = @function;

function must accept a matrix of the original scores for each class, and then return a matrix of the same size representing the transformed scores for each class.

Data Types: char | function_handle

Methods

kfoldEdge — Classification edge for observations not used for training
kfoldLoss — Classification loss for observations not used for training
kfoldMargin — Classification margins for observations not used for training
kfoldPredict — Predict labels for observations not used for training

Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects (MATLAB).

Examples

Create Cross-Validated Multiclass Linear Classification Model

Load the NLP data set.

load nlpdata

X is a sparse matrix of predictor data, and Y is a categorical vector of class labels.

Cross-validate a multiclass, linear classification model that can identify which MATLAB® toolbox a documentation web page is from, based on counts of words on the page.

rng(1); % For reproducibility
CVMdl = fitcecoc(X,Y,'Learners','linear','CrossVal','on')

CVMdl =
  classreg.learning.partition.ClassificationPartitionedLinearECOC
    CrossValidatedModel: 'LinearECOC'
           ResponseName: 'Y'
        NumObservations: 31572
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: [1x13 categorical]
         ScoreTransform: 'none'

  Properties, Methods

CVMdl is a ClassificationPartitionedLinearECOC cross-validated model. Because fitcecoc implements 10-fold cross-validation by default, CVMdl.Trained contains a 10-by-1 cell vector of ten CompactClassificationECOC models that contain the results of training ECOC models composed of binary, linear classification models for each of the folds.

Estimate labels for out-of-fold observations and estimate the generalization error by passing CVMdl to kfoldPredict and kfoldLoss, respectively.

oofLabels = kfoldPredict(CVMdl);
ge = kfoldLoss(CVMdl)

ge = 0.0958

The estimated generalization error is about 10% misclassified observations. To improve generalization error, try specifying another solver, such as LBFGS. To change default options when training ECOC models composed of linear classification models, create a linear classification model template using templateLinear, and then pass the template to fitcecoc.
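For instance, a hedged sketch of that template workflow (assuming X and Y from this example remain in the workspace; 'lbfgs' is one of the solvers that templateLinear accepts):

```matlab
% Sketch: create a linear model template that uses the LBFGS solver,
% then cross-validate an ECOC model built from that template.
t = templateLinear('Solver','lbfgs');
CVMdl2 = fitcecoc(X,Y,'Learners',t,'CrossVal','on');
```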

Find Good Lasso Penalty Using Cross-Validation

To determine a good lasso-penalty strength for an ECOC model composed of linear classification models that use logistic regression learners, implement 5-fold cross-validation.

Load the NLP data set.

load nlpdata

X is a sparse matrix of predictor data, and Y is a categorical vector of class labels. For simplicity, use the label 'others' for all observations in Y that are not 'simulink', 'dsp', or 'comm'.

Y(~(ismember(Y,{'simulink','dsp','comm'}))) = 'others';

Create a set of 11 logarithmically spaced regularization strengths from 10^(–7) through 10^(–2).

Lambda = logspace(-7,-2,11);

Create a linear classification model template that specifies logistic regression learners, lasso penalties with strengths in Lambda, training using SpaRSA, and a tolerance of 1e-8 on the gradient of the objective function.

t = templateLinear('Learner','logistic','Solver','sparsa',...
    'Regularization','lasso','Lambda',Lambda,'GradientTolerance',1e-8);


Cross-validate the models. To increase execution speed, transpose the predictor data and specify that the observations are in columns.

X = X';
rng(10); % For reproducibility
CVMdl = fitcecoc(X,Y,'Learners',t,'ObservationsIn','columns','KFold',5);

CVMdl is a ClassificationPartitionedLinearECOC model. Dissect CVMdl and each model within it.

numECOCModels = numel(CVMdl.Trained)

numECOCModels = 5

ECOCMdl1 = CVMdl.Trained{1}

ECOCMdl1 =
  classreg.learning.classif.CompactClassificationECOC
      ResponseName: 'Y'
        ClassNames: [comm dsp simulink others]
    ScoreTransform: 'none'
    BinaryLearners: {6×1 cell}
      CodingMatrix: [4×6 double]

  Properties, Methods

numCLModels = numel(ECOCMdl1.BinaryLearners)

numCLModels = 6

CLMdl1 = ECOCMdl1.BinaryLearners{1}

CLMdl1 =
  ClassificationLinear
      ResponseName: 'Y'
        ClassNames: [-1 1]
    ScoreTransform: 'logit'
              Beta: [34023×11 double]
              Bias: [-0.3125 -0.3596 -0.3596 -0.3596 -0.3596 -0.3596 -0.1593 -0.0737 -0.1759 ...]
            Lambda: [1.0000e-07 3.1623e-07 1.0000e-06 3.1623e-06 1.0000e-05 3.1623e-05 ...]
           Learner: 'logistic'

  Properties, Methods

Because fitcecoc implements 5-fold cross-validation, CVMdl contains a 5-by-1 cell array of CompactClassificationECOC models that the software trains on each fold. The BinaryLearners property of each CompactClassificationECOC model contains the ClassificationLinear models. The number of ClassificationLinear models within each compact ECOC model depends on the number of distinct labels and the coding design. Because Lambda is a sequence of regularization strengths, you can think of CLMdl1 as 11 models, one for each regularization strength in Lambda.

Determine how well the models generalize by plotting the averages of the 5-fold classification error for each regularization strength. Identify the regularization strength that minimizes the generalization error over the grid.


ce = kfoldLoss(CVMdl);
figure;
plot(log10(Lambda),log10(ce))
[~,minCEIdx] = min(ce);
minLambda = Lambda(minCEIdx);
hold on
plot(log10(minLambda),log10(ce(minCEIdx)),'ro');
ylabel('log_{10} 5-fold classification error')
xlabel('log_{10} Lambda')
legend('MSE','Min classification error')
hold off

Train an ECOC model composed of linear classification models using the entire data set, and specify the minimal regularization strength.

t = templateLinear('Learner','logistic','Solver','sparsa',...
    'Regularization','lasso','Lambda',minLambda,'GradientTolerance',1e-8);
MdlFinal = fitcecoc(X,Y,'Learners',t,'ObservationsIn','columns');

To estimate labels for new observations, pass MdlFinal and the new data to predict.

See Also

ClassificationECOC | ClassificationLinear | fitcecoc | fitclinear | kfoldLoss | kfoldPredict

Introduced in R2016a


ClassificationPartitionedModel

Package: classreg.learning.partition

Cross-validated classification model

Description

ClassificationPartitionedModel is a set of classification models trained on cross-validated folds. Estimate the quality of classification by cross-validation using one or more "kfold" methods: kfoldPredict, kfoldLoss, kfoldMargin, kfoldEdge, and kfoldfun.

Every "kfold" method uses models trained on in-fold observations to predict the response for out-of-fold observations. For example, suppose you cross-validate using five folds. In this case, the software randomly assigns each observation into five roughly equally sized groups. The training fold contains four of the groups (that is, roughly 4/5 of the data) and the test fold contains the other group (that is, roughly 1/5 of the data). In this case, cross-validation proceeds as follows:

• The software trains the first model (stored in CVMdl.Trained{1}) using the observations in the last four groups and reserves the observations in the first group for validation.
• The software trains the second model (stored in CVMdl.Trained{2}) using the observations in the first group and last three groups, and reserves the observations in the second group for validation.
• The software proceeds in a similar fashion for the third to fifth models.

If you validate by calling kfoldPredict, the software computes predictions for the observations in group 1 using the first model, for the observations in group 2 using the second model, and so on. In short, the software estimates a response for every observation using the model trained without that observation.

Construction

CVMdl = crossval(Mdl) creates a cross-validated classification model from a classification model (Mdl).

Alternatively:

• CVDiscrMdl = fitcdiscr(X,Y,Name,Value)
• CVKNNMdl = fitcknn(X,Y,Name,Value)
• CVNBMdl = fitcnb(X,Y,Name,Value)
• CVSVMMdl = fitcsvm(X,Y,Name,Value)
• CVTreeMdl = fitctree(X,Y,Name,Value)

create a cross-validated model when Name is 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'. For syntax details, see fitcdiscr, fitcknn, fitcnb, fitcsvm, and fitctree.
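The two construction routes can be sketched as follows, using Fisher's iris data as in the examples on this page; the name-value route cross-validates in a single call:

```matlab
% Sketch: equivalent ways to obtain a cross-validated classification model.
load fisheriris
Mdl = fitctree(meas,species);                % train a full model ...
CVMdl1 = crossval(Mdl);                      % ... then cross-validate it
CVMdl2 = fitctree(meas,species,'KFold',10);  % or cross-validate directly
```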


Input Arguments

Mdl

A classification model. Mdl can be any of the following:

• A classification tree trained using fitctree
• A discriminant analysis classifier trained using fitcdiscr
• A naive Bayes classifier trained using fitcnb
• A nearest-neighbor classifier trained using fitcknn
• A support vector machine classifier trained using fitcsvm

Properties

BinEdges

Bin edges for numeric predictors, specified as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors.

The software bins numeric predictors only if you specify the 'NumBins' name-value pair argument as a positive integer scalar when training a model with tree learners. The BinEdges property is empty if the 'NumBins' value is empty (default).

You can reproduce the binned predictor data Xbinned by using the BinEdges property of the trained model mdl.

X = mdl.X; % Predictor data
Xbinned = zeros(size(X));
edges = mdl.BinEdges;
% Find indices of binned predictors.
idxNumeric = find(~cellfun(@isempty,edges));
if iscolumn(idxNumeric)
    idxNumeric = idxNumeric';
end
for j = idxNumeric
    x = X(:,j);
    % Convert x to array if x is a table.
    if istable(x)
        x = table2array(x);
    end
    % Group x into bins by using the discretize function.
    xbinned = discretize(x,[-inf; edges{j}; inf]);
    Xbinned(:,j) = xbinned;
end

Xbinned contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs.


CategoricalPredictors

Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values corresponding to the columns of the predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty ([]). If Mdl is a trained discriminant analysis classifier, then CategoricalPredictors is always empty ([]).

ClassNames

Unique class labels used in training the model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors.

Cost

Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (that is, the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response.

If CVModel is a cross-validated ClassificationDiscriminant, ClassificationKNN, or ClassificationNaiveBayes model, then you can change its cost matrix to, for example, CostMatrix, using dot notation.

CVModel.Cost = CostMatrix;

CrossValidatedModel

Name of the cross-validated model, which is a character vector.

KFold

Number of folds used in the cross-validated model, which is a positive integer.

ModelParameters

Object holding parameters of CVModel.

Partition

The partition of class cvpartition used in creating the cross-validated model.

PredictorNames

Predictor variable names, specified as a cell array of character vectors. The order of the elements of PredictorNames corresponds to the order in which the predictor names appear in the training data.

Prior

Numeric vector of prior probabilities for each class. The order of the elements of Prior corresponds to the order of the classes in ClassNames.

If CVModel is a cross-validated ClassificationDiscriminant or ClassificationNaiveBayes model, then you can change its vector of priors to, for example, priorVector, using dot notation.

CVModel.Prior = priorVector;


ResponseName

Response variable name, specified as a character vector.

ScoreTransform

Score transformation, specified as a character vector or function handle. ScoreTransform represents a built-in transformation function or a function handle for transforming predicted classification scores.

To change the score transformation function to function, for example, use dot notation.

• For a built-in function, enter a character vector.

Mdl.ScoreTransform = 'function';

This table describes the available built-in functions.

Value                  Description
'doublelogit'          1/(1 + e^(–2x))
'invlogit'             log(x / (1 – x))
'ismax'                Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
'logit'                1/(1 + e^(–x))
'none' or 'identity'   x (no transformation)
'sign'                 –1 for x < 0; 0 for x = 0; 1 for x > 0
'symmetric'            2x – 1
'symmetricismax'       Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
'symmetriclogit'       2/(1 + e^(–x)) – 1

• For a MATLAB function or a function that you define, enter its function handle.

Mdl.ScoreTransform = @function;

function should accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).

Trained

The trained learners, which is a cell array of compact classification models.

W

The scaled weights, which is a vector with length n, the number of rows in X.

X

A matrix or table of predictor values. Each column of X represents one variable, and each row represents one observation.


Y

Categorical or character array, logical or numeric vector, or cell array of character vectors specifying the class labels for each observation. Y has the same number of rows as X, and each entry of Y is the response to the data in the corresponding row of X.

Methods

kfoldEdge — Classification edge for observations not used for training
kfoldfun — Cross-validate function
kfoldLoss — Classification loss for observations not used for training
kfoldMargin — Classification margins for observations not used for training
kfoldPredict — Predict response for observations not used for training

Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects (MATLAB).

Examples

Evaluate the Classification Error of a Classification Tree Classifier

Evaluate the k-fold cross-validation error for a classification tree model.

Load Fisher's iris data set.

load fisheriris

Train a classification tree using the default options.

Mdl = fitctree(meas,species);

Cross-validate the classification tree model.

CVMdl = crossval(Mdl);

Estimate the 10-fold cross-validation loss.

L = kfoldLoss(CVMdl)

L = 0.0533

Estimate Posterior Probabilities for Test Samples

Estimate positive class posterior probabilities for the test set of an SVM algorithm.

Load the ionosphere data set.

load ionosphere


Train an SVM classifier. Specify a 20% holdout sample. It is good practice to standardize the predictors and specify the class order.

rng(1) % For reproducibility
CVSVMModel = fitcsvm(X,Y,'Holdout',0.2,'Standardize',true,...
    'ClassNames',{'b','g'});

CVSVMModel is a trained ClassificationPartitionedModel cross-validated classifier.

Estimate the optimal score function for mapping observation scores to posterior probabilities of an observation being classified as 'g'.

ScoreCVSVMModel = fitSVMPosterior(CVSVMModel);

ScoreSVMModel is a trained ClassificationPartitionedModel cross-validated classifier containing the optimal score transformation function estimated from the training data. Estimate the out-of-sample positive class posterior probabilities. Display the results for the first 10 out-of-sample observations. [~,OOSPostProbs] = kfoldPredict(ScoreCVSVMModel); indx = ~isnan(OOSPostProbs(:,2)); hoObs = find(indx); % Holdout observation numbers OOSPostProbs = [hoObs, OOSPostProbs(indx,2)]; table(OOSPostProbs(1:10,1),OOSPostProbs(1:10,2),... 'VariableNames',{'ObservationIndex','PosteriorProbability'}) ans=10×2 table ObservationIndex ________________ 6 7 8 9 16 22 23 24 38 41

PosteriorProbability ____________________ 0.17378 0.89637 0.0076609 0.91602 0.026718 4.6081e-06 0.9024 2.4129e-06 0.00042697 0.86427

Tips
To estimate posterior probabilities of trained, cross-validated SVM classifiers, use fitSVMPosterior.

See Also
ClassificationKNN | ClassificationNaiveBayes | CompactClassificationDiscriminant | CompactClassificationSVM | CompactClassificationTree | fitSVMPosterior | fitcdiscr | fitcknn | fitcnb | fitcsvm | fitctree


Topics
“Cross Validating a Discriminant Analysis Classifier” on page 20-17


ClassificationSVM
Support vector machine (SVM) for one-class and binary classification

Description
ClassificationSVM is a support vector machine (SVM) classifier on page 32-488 for one-class and two-class learning. Trained ClassificationSVM classifiers store training data, parameter values, prior probabilities, support vectors, and algorithmic implementation information. Use these classifiers to perform tasks such as fitting a score-to-posterior-probability transformation function (see fitPosterior) and predicting labels for new data (see predict).

Creation
Create a ClassificationSVM object by using fitcsvm.

Properties
SVM Properties
Alpha — Trained classifier coefficients
numeric vector
This property is read-only. Trained classifier coefficients, specified as an s-by-1 numeric vector. s is the number of support vectors in the trained classifier, sum(Mdl.IsSupportVector). Alpha contains the trained classifier coefficients from the dual problem, that is, the estimated Lagrange multipliers. If you remove duplicates by using the RemoveDuplicates name-value pair argument of fitcsvm, then for a given set of duplicate observations that are support vectors, Alpha contains one coefficient corresponding to the entire set. That is, MATLAB attributes a nonzero coefficient to one observation from the set of duplicates and a coefficient of 0 to all other duplicate observations in the set.
Data Types: single | double
Beta — Linear predictor coefficients
numeric vector
This property is read-only. Linear predictor coefficients, specified as a numeric vector. The length of Beta is equal to the number of predictors used to train the model. MATLAB expands categorical variables in the predictor data using full dummy encoding. That is, MATLAB creates one dummy variable for each level of each categorical variable. Beta stores one value for each predictor variable, including the dummy variables. For example, if there are three predictors, one of which is a categorical variable with three levels, then Beta is a numeric vector containing five values.


If KernelParameters.Function is 'linear', then the classification score for the observation x is f(x) = (x/s)′β + b. Mdl stores β, b, and s in the properties Beta, Bias, and KernelParameters.Scale, respectively. To estimate classification scores manually, you must first apply any transformations to the predictor data that were applied during training. Specifically, if you specify 'Standardize',true when using fitcsvm, then you must standardize the predictor data manually by using the mean Mdl.Mu and standard deviation Mdl.Sigma, and then divide the result by the kernel scale in Mdl.KernelParameters.Scale. All SVM functions, such as resubPredict and predict, apply any required transformation before estimation. If KernelParameters.Function is not 'linear', then Beta is empty ([]).
Data Types: single | double
Bias — Bias term
scalar
This property is read-only. Bias term, specified as a scalar.
Data Types: single | double
BoxConstraints — Box constraints
numeric vector
This property is read-only. Box constraints, specified as an n-by-1 numeric vector of box constraints on page 32-486. n is the number of observations in the training data (see the NumObservations property). If you remove duplicates by using the RemoveDuplicates name-value pair argument of fitcsvm, then for a given set of duplicate observations, MATLAB sums the box constraints and then attributes the sum to one observation. MATLAB attributes box constraints of 0 to all other observations in the set.
Data Types: single | double
CacheInfo — Caching information
structure array
This property is read-only. Caching information, specified as a structure array. The caching information contains the fields described below.

Size — The cache size (in MB) that the software reserves to train the SVM classifier. For details, see 'CacheSize'.
Algorithm — The caching algorithm that the software uses during optimization. Currently, the only available caching algorithm is Queue. You cannot set the caching algorithm.

Display the fields of CacheInfo by using dot notation. For example, Mdl.CacheInfo.Size displays the value of the cache size.
Data Types: struct
IsSupportVector — Support vector indicator
logical vector
This property is read-only. Support vector indicator, specified as an n-by-1 logical vector that flags whether a corresponding observation in the predictor data matrix is a “Support Vector” on page 32-488. n is the number of observations in the training data (see NumObservations). If you remove duplicates by using the RemoveDuplicates name-value pair argument of fitcsvm, then for a given set of duplicate observations that are support vectors, IsSupportVector flags only one observation as a support vector.
Data Types: logical
KernelParameters — Kernel parameters
structure array
This property is read-only. Kernel parameters, specified as a structure array containing the fields described below.
Function — Kernel function used to compute the elements of the Gram matrix on page 32-1689. For details, see 'KernelFunction'.
Scale — Kernel scale parameter used to scale all elements of the predictor data on which the model is trained. For details, see 'KernelScale'.
To display the values of KernelParameters, use dot notation. For example, Mdl.KernelParameters.Scale displays the kernel scale parameter value. The software accepts KernelParameters as inputs and does not modify them.
Data Types: struct
Nu — One-class learning parameter
positive scalar
This property is read-only. One-class learning on page 32-487 parameter ν, specified as a positive scalar.
Data Types: single | double


OutlierFraction — Proportion of outliers numeric scalar This property is read-only. Proportion of outliers in the training data, specified as a numeric scalar. Data Types: double Solver — Optimization routine 'ISDA' | 'L1QP' | 'SMO' This property is read-only. Optimization routine used to train the SVM classifier, specified as 'ISDA', 'L1QP', or 'SMO'. For more details, see 'Solver'. SupportVectorLabels — Support vector class labels s-by-1 numeric vector This property is read-only. Support vector class labels, specified as an s-by-1 numeric vector. s is the number of support vectors in the trained classifier, sum(Mdl.IsSupportVector). A value of +1 in SupportVectorLabels indicates that the corresponding support vector is in the positive class (ClassNames{2}). A value of –1 indicates that the corresponding support vector is in the negative class (ClassNames{1}). If you remove duplicates by using the RemoveDuplicates name-value pair argument of fitcsvm, then for a given set of duplicate observations that are support vectors, SupportVectorLabels contains one unique support vector label. Data Types: single | double SupportVectors — Support vectors s-by-p numeric matrix This property is read-only. Support vectors in the trained classifier, specified as an s-by-p numeric matrix. s is the number of support vectors in the trained classifier, sum(Mdl.IsSupportVector), and p is the number of predictor variables in the predictor data. SupportVectors contains rows of the predictor data X that MATLAB considers to be support vectors. If you specify 'Standardize',true when training the SVM classifier using fitcsvm, then SupportVectors contains the standardized rows of X. If you remove duplicates by using the RemoveDuplicates name-value pair argument of fitcsvm, then for a given set of duplicate observations that are support vectors, SupportVectors contains one unique support vector. Data Types: single | double


Other Classification Properties CategoricalPredictors — Categorical predictor indices vector of positive integers | [] This property is read-only. Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values corresponding to the columns of the predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty ([]). Data Types: single | double ClassNames — Unique class labels categorical array | character array | logical vector | numeric vector | cell array of character vectors This property is read-only. Unique class labels used in training the model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Data Types: single | double | logical | char | cell | categorical Cost — Misclassification cost numeric square matrix This property is read-only. Misclassification cost, specified as a numeric square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i. During training, the software updates the prior probabilities by incorporating the penalties described in the cost matrix. • For two-class learning, Cost always has this form: Cost(i,j) = 1 if i ~= j, and Cost(i,j) = 0 if i = j. The rows correspond to the true class and the columns correspond to the predicted class. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. • For one-class learning, Cost = 0. For more details, see Algorithms on page 32-489. Data Types: double ExpandedPredictorNames — Expanded predictor names cell array of character vectors This property is read-only. Expanded predictor names, specified as a cell array of character vectors. If the model uses dummy variable encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames. 
Data Types: cell


Gradient — Training data gradient values
numeric vector
This property is read-only. Training data gradient values, specified as a numeric vector. The length of Gradient is equal to the number of observations (see NumObservations).
Data Types: single | double
ModelParameters — Parameters used to train model
structure array
This property is read-only. Parameters used to train the ClassificationSVM model, specified as a structure array. ModelParameters contains parameter values such as the name-value pair argument values used to train the SVM classifier. ModelParameters does not contain estimated parameters. Access the fields of ModelParameters by using dot notation. For example, access the initial values for estimating Alpha by using Mdl.ModelParameters.Alpha.
Data Types: struct
Mu — Predictor means
numeric vector | []
This property is read-only. Predictor means, specified as a numeric vector. If you specify 'Standardize',1 or 'Standardize',true when you train an SVM classifier using fitcsvm, then the length of Mu is equal to the number of predictors. MATLAB expands categorical variables in the predictor data using full dummy encoding. That is, MATLAB creates one dummy variable for each level of each categorical variable. Mu stores one value for each predictor variable, including the dummy variables. However, MATLAB does not standardize the columns that contain categorical variables. If you set 'Standardize',false when you train the SVM classifier using fitcsvm, then Mu is an empty vector ([]).
Data Types: single | double
NumObservations — Number of observations
numeric scalar
This property is read-only. Number of observations in the training data stored in X and Y, specified as a numeric scalar.
Data Types: single | double
PredictorNames — Predictor variable names
cell array of character vectors
This property is read-only.


Predictor variable names, specified as a cell array of character vectors. The order of the elements of PredictorNames corresponds to the order in which the predictor names appear in the training data. Data Types: cell Prior — Prior probabilities numeric vector This property is read-only. Prior probabilities for each class, specified as a numeric vector. The order of the elements of Prior corresponds to the elements of Mdl.ClassNames. For two-class learning, if you specify a cost matrix, then the software updates the prior probabilities by incorporating the penalties described in the cost matrix. For more details, see Algorithms on page 32-489. Data Types: single | double ResponseName — Response variable name character vector This property is read-only. Response variable name, specified as a character vector. Data Types: char RowsUsed — Rows used in fitting [] | logical vector This property is read-only. Rows of the original data X used in fitting the ClassificationSVM model, specified as a logical vector. This property is empty if all rows are used. Data Types: logical ScoreTransform — Score transformation character vector | function handle Score transformation, specified as a character vector or function handle. ScoreTransform represents a built-in transformation function or a function handle for transforming predicted classification scores. To change the score transformation function to function, for example, use dot notation. • For a built-in function, enter a character vector. Mdl.ScoreTransform = 'function';

This table describes the available built-in functions.
'doublelogit' — 1/(1 + e^(–2x))
'invlogit' — log(x / (1 – x))
'ismax' — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
'logit' — 1/(1 + e^(–x))
'none' or 'identity' — x (no transformation)
'sign' — –1 for x < 0; 0 for x = 0; 1 for x > 0
'symmetric' — 2x – 1
'symmetricismax' — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
'symmetriclogit' — 2/(1 + e^(–x)) – 1

• For a MATLAB function or a function that you define, enter its function handle. Mdl.ScoreTransform = @function;

function should accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).
Data Types: char | function_handle
Sigma — Predictor standard deviations
[] (default) | numeric vector
This property is read-only. Predictor standard deviations, specified as a numeric vector. If you specify 'Standardize',true when you train the SVM classifier using fitcsvm, then the length of Sigma is equal to the number of predictor variables. MATLAB expands categorical variables in the predictor data using full dummy encoding. That is, MATLAB creates one dummy variable for each level of each categorical variable. Sigma stores one value for each predictor variable, including the dummy variables. However, MATLAB does not standardize the columns that contain categorical variables. If you set 'Standardize',false when you train the SVM classifier using fitcsvm, then Sigma is an empty vector ([]).
Data Types: single | double
W — Observation weights
ones(n,1) (default) | numeric vector
This property is read-only. Observation weights used to train the SVM classifier, specified as an n-by-1 numeric vector. n is the number of observations (see NumObservations). fitcsvm normalizes the observation weights specified in the 'Weights' name-value pair argument so that the elements of W within a particular class sum up to the prior probability of that class.


Data Types: single | double
X — Unstandardized predictors
numeric matrix | table
This property is read-only. Unstandardized predictors used to train the SVM classifier, specified as a numeric matrix or table. Each row of X corresponds to one observation, and each column corresponds to one variable. MATLAB excludes observations containing at least one missing value, and removes corresponding elements from Y.
Data Types: single | double
Y — Class labels
categorical array | character array | logical vector | numeric vector | cell array of character vectors
This property is read-only. Class labels used to train the SVM classifier, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Y is the same data type as the input argument Y of fitcsvm. (The software treats string arrays as cell arrays of character vectors.) Each row of Y represents the observed classification of the corresponding row of X. MATLAB excludes elements containing missing values, and removes corresponding observations from X.
Data Types: single | double | logical | char | cell | categorical
Convergence Control Properties
ConvergenceInfo — Convergence information
structure array
This property is read-only. Convergence information, specified as a structure array with the fields described below.
Converged — Logical flag indicating whether the algorithm converged (1 indicates convergence).
ReasonForConvergence — Character vector indicating the criterion the software uses to detect convergence.
Gap — Scalar feasibility gap between the dual and primal objective functions.
GapTolerance — Scalar feasibility gap tolerance. Set this tolerance, for example to 1e-2, by using the name-value pair argument 'GapTolerance',1e-2 of fitcsvm.
DeltaGradient — Scalar-attained gradient difference between upper and lower violators.
DeltaGradientTolerance — Scalar tolerance for the gradient difference between upper and lower violators. Set this tolerance, for example to 1e-2, by using the name-value pair argument 'DeltaGradientTolerance',1e-2 of fitcsvm.
LargestKKTViolation — Maximal scalar Karush-Kuhn-Tucker (KKT) violation value.
KKTTolerance — Scalar tolerance for the largest KKT violation. Set this tolerance, for example, to 1e-3, by using the name-value pair argument 'KKTTolerance',1e-3 of fitcsvm.
History — Structure array containing convergence information at set optimization iterations. The fields are:
• NumIterations: numeric vector of iteration indices for which the software records convergence information
• Gap: numeric vector of Gap values at the iterations
• DeltaGradient: numeric vector of DeltaGradient values at the iterations
• LargestKKTViolation: numeric vector of LargestKKTViolation values at the iterations
• NumSupportVectors: numeric vector indicating the number of support vectors at the iterations
• Objective: numeric vector of Objective values at the iterations
Objective — Scalar value of the dual objective function.

Data Types: struct
NumIterations — Number of iterations
positive integer
This property is read-only. Number of iterations required by the optimization routine to attain convergence, specified as a positive integer. To set the limit on the number of iterations to 1000, for example, specify 'IterationLimit',1000 when you train the SVM classifier using fitcsvm.
Data Types: double
ShrinkagePeriod — Number of iterations between reductions of active set
nonnegative integer
This property is read-only. Number of iterations between reductions of the active set, specified as a nonnegative integer. To set the shrinkage period to 1000, for example, specify 'ShrinkagePeriod',1000 when you train the SVM classifier using fitcsvm.
Data Types: single | double
Hyperparameter Optimization Properties
HyperparameterOptimizationResults — Description of cross-validation optimization of hyperparameters
BayesianOptimization object | table
This property is read-only. Description of the cross-validation optimization of hyperparameters, specified as a BayesianOptimization object or a table of hyperparameters and associated values. This property is nonempty when the 'OptimizeHyperparameters' name-value pair argument of fitcsvm is nonempty at creation. The value of HyperparameterOptimizationResults depends on the setting of the Optimizer field in the HyperparameterOptimizationOptions structure of fitcsvm at creation:
'bayesopt' (default) — Object of class BayesianOptimization
'gridsearch' or 'randomsearch' — Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst)
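For instance, the following sketch shows one way this property gets populated. The option values are illustrative rather than required, and the optimization can take several minutes; its exact results vary from run to run.

```matlab
% Sketch: run Bayesian hyperparameter optimization so that the
% HyperparameterOptimizationResults property is nonempty.
load ionosphere
rng default % For reproducibility
Mdl = fitcsvm(X,Y,'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions', ...
    struct('ShowPlots',false,'Verbose',0));

% With the default 'bayesopt' optimizer, this is a BayesianOptimization
% object describing the search.
results = Mdl.HyperparameterOptimizationResults;
```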

Object Functions
compact — Reduce size of support vector machine (SVM) classifier
compareHoldout — Compare accuracies of two classification models using new data
crossval — Cross-validate support vector machine (SVM) classifier
discardSupportVectors — Discard support vectors for linear support vector machine (SVM) classifier
edge — Find classification edge for support vector machine (SVM) classifier
fitPosterior — Fit posterior probabilities for support vector machine (SVM) classifier
loss — Find classification error for support vector machine (SVM) classifier
margin — Find classification margins for support vector machine (SVM) classifier
predict — Classify observations using support vector machine (SVM) classifier
resubEdge — Find classification edge for support vector machine (SVM) classifier by resubstitution
resubLoss — Find classification loss for support vector machine (SVM) classifier by resubstitution
resubMargin — Find classification margins for support vector machine (SVM) classifier by resubstitution
resubPredict — Classify observations in support vector machine (SVM) classifier
resume — Resume training support vector machine (SVM) classifier

Examples
Train SVM Classifier
Load Fisher's iris data set. Remove the sepal lengths and widths and all observed setosa irises.
load fisheriris
inds = ~strcmp(species,'setosa');
X = meas(inds,3:4);
y = species(inds);

Train an SVM classifier using the processed data set.
SVMModel = fitcsvm(X,y)

SVMModel = 
  ClassificationSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'versicolor'  'virginica'}
           ScoreTransform: 'none'
          NumObservations: 100
                    Alpha: [24x1 double]
                     Bias: -14.4149
         KernelParameters: [1x1 struct]
           BoxConstraints: [100x1 double]
          ConvergenceInfo: [1x1 struct]
          IsSupportVector: [100x1 logical]
                   Solver: 'SMO'

  Properties, Methods

SVMModel is a trained ClassificationSVM classifier. Display the properties of SVMModel. For example, to determine the class order, use dot notation.
classOrder = SVMModel.ClassNames

classOrder = 2x1 cell
    {'versicolor'}
    {'virginica' }

The first class ('versicolor') is the negative class, and the second ('virginica') is the positive class. You can change the class order during training by using the 'ClassNames' name-value pair argument.
Plot a scatter diagram of the data and circle the support vectors.
sv = SVMModel.SupportVectors;
figure
gscatter(X(:,1),X(:,2),y)
hold on

plot(sv(:,1),sv(:,2),'ko','MarkerSize',10)
legend('versicolor','virginica','Support Vector')
hold off

The support vectors are observations that occur on or beyond their estimated class boundaries. You can adjust the boundaries (and, therefore, the number of support vectors) by setting a box constraint during training using the 'BoxConstraint' name-value pair argument.

Train and Cross-Validate SVM Classifier
Load the ionosphere data set.
load ionosphere

Train and cross-validate an SVM classifier. Standardize the predictor data and specify the order of the classes.
rng(1); % For reproducibility
CVSVMModel = fitcsvm(X,Y,'Standardize',true,...
    'ClassNames',{'b','g'},'CrossVal','on')

CVSVMModel = 
  classreg.learning.partition.ClassificationPartitionedModel
    CrossValidatedModel: 'SVM'

         PredictorNames: {1x34 cell}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'

  Properties, Methods

CVSVMModel is a ClassificationPartitionedModel cross-validated SVM classifier. By default, the software implements 10-fold cross-validation. Alternatively, you can cross-validate a trained ClassificationSVM classifier by passing it to crossval.
Inspect one of the trained folds using dot notation.
CVSVMModel.Trained{1}

ans = 
  classreg.learning.classif.CompactClassificationSVM
           ResponseName: 'Y'
  CategoricalPredictors: []
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'
                  Alpha: [78x1 double]
                   Bias: -0.2209
       KernelParameters: [1x1 struct]
                     Mu: [1x34 double]
                  Sigma: [1x34 double]
         SupportVectors: [78x34 double]
    SupportVectorLabels: [78x1 double]

  Properties, Methods

Each fold is a CompactClassificationSVM classifier trained on 90% of the data.
Estimate the generalization error.
genError = kfoldLoss(CVSVMModel)

genError = 0.1168

On average, the generalization error is approximately 12%.

More About
Box Constraint
A box constraint is a parameter that controls the maximum penalty imposed on margin-violating observations, which helps to prevent overfitting (regularization).

If you increase the box constraint, then the SVM classifier assigns fewer support vectors. However, increasing the box constraint can lead to longer training times.
Gram Matrix
The Gram matrix of a set of n vectors {x1,...,xn; xj ∊ Rp} is an n-by-n matrix with element (j,k) defined as G(xj,xk) = <ϕ(xj),ϕ(xk)>, an inner product of the transformed predictors using the kernel function ϕ. For nonlinear SVM, the algorithm forms a Gram matrix using the rows of the predictor data X. The dual formalization replaces the inner product of the observations in X with corresponding elements of the resulting Gram matrix (called the “kernel trick”). Consequently, nonlinear SVM operates in the transformed predictor space to find a separating hyperplane.
Karush-Kuhn-Tucker Complementarity Conditions
KKT complementarity conditions are optimization constraints required for optimal nonlinear programming solutions. In SVM, the KKT complementarity conditions are

αj[yj f(xj) − 1 + ξj] = 0
ξj(C − αj) = 0

for all j = 1,...,n, where f(xj) = ϕ(xj)′β + b, ϕ is a kernel function (see Gram matrix on page 32-1689), and ξj is a slack variable. If the classes are perfectly separable, then ξj = 0 for all j = 1,...,n.
One-Class Learning
One-class learning, or unsupervised SVM, aims to separate data from the origin in the high-dimensional predictor space (not the original predictor space), and is an algorithm used for outlier detection. The algorithm resembles that of SVM for binary classification on page 32-1690. The objective is to minimize the dual expression

0.5 Σ_{j,k} αj αk G(xj,xk)

with respect to α1,...,αn, subject to

Σ_j αj = nν and 0 ≤ αj ≤ 1 for all j = 1,...,n.

The value of G(xj,xk) is in element (j,k) of the Gram matrix on page 32-1689. A small value of ν leads to fewer support vectors and, therefore, a smooth, crude decision boundary. A large value of ν leads to more support vectors and, therefore, a curvy, flexible decision boundary. The optimal value of ν should be large enough to capture the data complexity and small enough to avoid overtraining. Also, 0 < ν ≤ 1. For more details, see [5].
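A minimal one-class training call might look like the following sketch; the single-valued dummy label vector and the ν value are illustrative choices, not requirements.

```matlab
% Sketch: one-class SVM for outlier detection. Passing a label vector
% with a single unique value triggers one-class learning in fitcsvm.
load fisheriris
X = meas(:,1:2);
Mdl = fitcsvm(X,ones(size(X,1),1),'KernelScale','auto','Nu',0.05);

% Observations with negative scores lie outside the learned boundary
% and can be treated as outliers.
[~,score] = resubPredict(Mdl);
outliers = score < 0;
```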

32

Functions

Support Vector
Support vectors are observations corresponding to strictly positive estimates of α1,...,αn. SVM classifiers that yield fewer support vectors for a given training set are preferred.
Support Vector Machines for Binary Classification
The SVM binary classification algorithm searches for an optimal hyperplane that separates the data into two classes. For separable classes, the optimal hyperplane maximizes a margin (space that does not contain any observations) surrounding itself, which creates boundaries for the positive and negative classes. For inseparable classes, the objective is the same, but the algorithm imposes a penalty on the length of the margin for every observation that is on the wrong side of its class boundary.
The linear SVM score function is

f(x) = x′β + b,

where:
• x is an observation (corresponding to a row of X).
• The vector β contains the coefficients that define an orthogonal vector to the hyperplane (corresponding to Mdl.Beta). For separable data, the optimal margin length is 2/‖β‖.
• b is the bias term (corresponding to Mdl.Bias).
The root of f(x) for particular coefficients defines a hyperplane. For a particular hyperplane, f(z) is the distance from point z to the hyperplane.
The algorithm searches for the maximum margin length, while keeping observations in the positive (y = 1) and negative (y = –1) classes separate.
• For separable classes, the objective is to minimize ‖β‖ with respect to β and b subject to yj f(xj) ≥ 1, for all j = 1,...,n. This is the primal formalization for separable classes.
• For inseparable classes, the algorithm uses slack variables (ξj) to penalize the objective function for observations that cross the margin boundary for their class. ξj = 0 for observations that do not cross the margin boundary for their class, otherwise ξj ≥ 0. The objective is to minimize

0.5‖β‖² + C Σ_j ξj

with respect to β, b, and ξj subject to yj f(xj) ≥ 1 − ξj and ξj ≥ 0 for all j = 1,...,n, and for a positive scalar box constraint on page 32-1689 C. This is the primal formalization for inseparable classes.
The algorithm uses the Lagrange multipliers method to optimize the objective, which introduces n coefficients α1,...,αn (corresponding to Mdl.Alpha). The dual formalizations for linear SVM are as follows:
• For separable classes, minimize

0.5 Σ_{j=1}^n Σ_{k=1}^n αj αk yj yk xj′xk − Σ_{j=1}^n αj

with respect to α1,...,αn, subject to Σ αj yj = 0, αj ≥ 0 for all j = 1,...,n, and Karush-Kuhn-Tucker (KKT) complementarity conditions on page 32-1690.
• For inseparable classes, the objective is the same as for separable classes, except for the additional condition 0 ≤ αj ≤ C for all j = 1,...,n.
The resulting score function is

f(x) = Σ_{j=1}^n α̂j yj x′xj + b̂.

b̂ is the estimate of the bias and α̂j is the jth estimate of the vector α̂, j = 1,...,n. Written this way, the score function is free of the estimate of β as a result of the primal formalization.
The SVM algorithm classifies a new observation z using sign(f(z)).
In some cases, a nonlinear boundary separates the classes. Nonlinear SVM works in a transformed predictor space to find an optimal, separating hyperplane. The dual formalization for nonlinear SVM is

0.5 Σ_{j=1}^n Σ_{k=1}^n αj αk yj yk G(xj,xk) − Σ_{j=1}^n αj

with respect to α1,...,αn, subject to Σ αj yj = 0, 0 ≤ αj ≤ C for all j = 1,...,n, and the KKT complementarity conditions. G(xk,xj) are elements of the Gram matrix on page 32-1689. The resulting score function is

f(x) = Σ_{j=1}^n α̂j yj G(x,xj) + b̂.

For more details, see Understanding Support Vector Machines on page 18-145, [1], and [3].
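The dual score representation above can be checked numerically. The following is a sketch, not a definitive recipe: it assumes a Gaussian-kernel model trained with standardization, and the manual scores are expected to agree with resubPredict only up to numerical tolerance.

```matlab
% Sketch: recompute scores from f(x) = sum_j alpha_j*y_j*G(x,x_j) + b
% for an RBF-kernel SVM. SupportVectors holds standardized rows because
% 'Standardize',true is used, so new data must be standardized the same way.
load fisheriris
inds = ~strcmp(species,'setosa');
X = meas(inds,3:4);
y = species(inds);
Mdl = fitcsvm(X,y,'KernelFunction','rbf','Standardize',true);

s  = Mdl.KernelParameters.Scale;
Xs = (X - Mdl.Mu)./Mdl.Sigma;                 % standardize as in training
G  = exp(-pdist2(Xs./s, Mdl.SupportVectors./s).^2);
manualScore = G*(Mdl.Alpha.*Mdl.SupportVectorLabels) + Mdl.Bias;

[~,score] = resubPredict(Mdl);                % column 2: positive class
maxGap = max(abs(manualScore - score(:,2)));  % expected to be small
```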

Algorithms • For the mathematical formulation of the SVM binary classification algorithm, see “Support Vector Machines for Binary Classification” on page 32-1690 and “Understanding Support Vector Machines” on page 18-145. • NaN, , empty character vector (''), empty string (""), and values indicate missing values. fitcsvm removes entire rows of data corresponding to a missing response. When computing total weights (see the next bullets), fitcsvm ignores any weight corresponding to an observation with at least one missing predictor. This action can lead to unbalanced prior probabilities in balanced-class problems. Consequently, observation box constraints might not equal BoxConstraint. • fitcsvm removes observations that have zero weight or prior probability. • For two-class learning, if you specify the cost matrix C (see Cost), then the software updates the class prior probabilities p (see Prior) to pc by incorporating the penalties described in C. Specifically, fitcsvm completes these steps: 32-489

1. Compute pc∗ = p′C.
2. Normalize pc∗ so that the updated prior probabilities sum to 1:

   pc = (1 / ∑_{j=1}^{K} pc,j∗) pc∗.

   K is the number of classes.
3. Reset the cost matrix to the default:

   C = [0 1; 1 0].

4. Remove observations from the training data corresponding to classes with zero prior probability.

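The prior-update steps above amount to a vector-matrix product followed by a normalization. A small Python sketch (illustrative priors and costs, not toolbox code):

```python
# Sketch of the prior-update steps, with made-up priors and costs.
p = [0.4, 0.6]                    # original class priors
C = [[0.0, 2.0],
     [1.0, 0.0]]                  # user-specified cost matrix
# Step 1: pc* = p'C (row vector times matrix)
pc = [sum(p[i] * C[i][j] for i in range(2)) for j in range(2)]
# Step 2: normalize so the updated priors sum to 1
total = sum(pc)
pc = [v / total for v in pc]
# Step 3: reset the cost matrix to the default [0 1; 1 0]
C = [[0, 1], [1, 0]]
# Step 4 would then drop observations from classes with zero prior.
```

Here the penalty of 2 for misclassifying class 1 inflates the updated prior of class 2 relative to class 1, which is the intended effect of folding the cost into the priors.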
• For two-class learning, fitcsvm normalizes all observation weights (see Weights) to sum to 1. The function then renormalizes the normalized weights to sum up to the updated prior probability of the class to which the observation belongs. That is, the total weight for observation j in class k is

w∗j = (wj / ∑_{∀ j ∈ Class k} wj) pc,k.
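The two-stage weight normalization above can be sketched in Python (illustrative weights, classes, and priors; not toolbox code):

```python
# Sketch of the two-stage weight normalization: first all weights sum
# to 1, then the weights within class k are rescaled to sum to pc[k].
# The raw weights, class labels, and updated priors are made up.
w = [1.0, 1.0, 2.0, 1.0]          # raw observation weights
cls = [0, 0, 1, 1]                # class of each observation
pc = [0.5, 0.5]                   # updated class prior probabilities
total = sum(w)
w = [wj / total for wj in w]      # first pass: weights sum to 1
class_sum = {k: sum(wj for wj, c in zip(w, cls) if c == k)
             for k in set(cls)}
# second pass: total weights w*_j in class k sum to pc[k]
w_star = [wj / class_sum[c] * pc[c] for wj, c in zip(w, cls)]
```

After the second pass, the total weights sum to 1 overall and to the class prior within each class, which is the property the box-constraint formula below relies on.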

wj is the normalized weight for observation j; pc,k is the updated prior probability of class k (see previous bullet).
• For two-class learning, fitcsvm assigns a box constraint to each observation in the training data. The formula for the box constraint of observation j is

Cj = n C0 w∗j.

n is the training sample size, C0 is the initial box constraint (see the 'BoxConstraint' name-value pair argument), and w∗j is the total weight of observation j (see previous bullet).
• If you set 'Standardize',true and the 'Cost', 'Prior', or 'Weights' name-value pair argument, then fitcsvm standardizes the predictors using their corresponding weighted means and weighted standard deviations. That is, fitcsvm standardizes predictor j (xj) using

x∗j = (xj − μ∗j) / σ∗j.

μ∗j = (1 / ∑k w∗k) ∑k w∗k xjk.

xjk is observation k (row) of predictor j (column).

(σ∗j)² = (v1 / (v1² − v2)) ∑k w∗k (xjk − μ∗j)².

v1 = ∑j w∗j.

v2 = ∑j (w∗j)².
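The weighted standardization formulas above can be checked with a small Python sketch (made-up weights and data; not toolbox code):

```python
# Sketch of the weighted standardization of one predictor column.
# Weights w*_k and predictor values x_jk are illustrative.
w = [0.2, 0.3, 0.5]               # total observation weights w*_k
x = [1.0, 2.0, 4.0]               # one predictor column, observations k
mu = sum(wk * xk for wk, xk in zip(w, x)) / sum(w)   # weighted mean mu*_j
v1 = sum(w)                       # v1 = sum of total weights
v2 = sum(wk ** 2 for wk in w)     # v2 = sum of squared total weights
sigma2 = v1 / (v1 ** 2 - v2) * sum(wk * (xk - mu) ** 2
                                   for wk, xk in zip(w, x))
x_std = [(xk - mu) / sigma2 ** 0.5 for xk in x]      # standardized column
```

The v1/(v1² − v2) factor plays the role of the usual n/(n − 1) bias correction when the weights are unequal; with equal weights 1/n it reduces to the familiar unbiased sample variance.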

• Assume that p is the proportion of outliers that you expect in the training data, and that you set 'OutlierFraction',p.
• For one-class learning, the software trains the bias term such that 100p% of the observations in the training data have negative scores.
• The software implements robust learning for two-class learning. In other words, the software attempts to remove 100p% of the observations when the optimization algorithm converges. The removed observations correspond to gradients that are large in magnitude.
• If your predictor data contains categorical variables, then the software generally uses full dummy encoding for these variables. The software creates one dummy variable for each level of each categorical variable.
• The PredictorNames property stores one element for each of the original predictor variable names. For example, assume that there are three predictors, one of which is a categorical variable with three levels. Then PredictorNames is a 1-by-3 cell array of character vectors containing the original names of the predictor variables.
• The ExpandedPredictorNames property stores one element for each of the predictor variables, including the dummy variables. For example, assume that there are three predictors, one of which is a categorical variable with three levels. Then ExpandedPredictorNames is a 1-by-5 cell array of character vectors containing the names of the predictor variables and the new dummy variables.
• Similarly, the Beta property stores one beta coefficient for each predictor, including the dummy variables.
• The SupportVectors property stores the predictor values for the support vectors, including the dummy variables. For example, assume that there are m support vectors and three predictors, one of which is a categorical variable with three levels. Then SupportVectors is an m-by-5 matrix.
• The X property stores the training data as originally input and does not include the dummy variables. When the input is a table, X contains only the columns used as predictors.
• For predictors specified in a table, if any of the variables contain ordered (ordinal) categories, the software uses ordinal encoding for these variables.
• For a variable with k ordered levels, the software creates k – 1 dummy variables. The jth dummy variable is –1 for levels up to j, and +1 for levels j + 1 through k.
• The names of the dummy variables stored in the ExpandedPredictorNames property indicate the first level with the value +1. The software stores k – 1 additional predictor names for the dummy variables, including the names of levels 2, 3, ..., k.
• All solvers implement L1 soft-margin minimization.
• For one-class learning, the software estimates the Lagrange multipliers, α1,...,αn, such that

∑_{j=1}^{n} αj = nν.
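The ordinal encoding rule above (k − 1 dummies, each −1 for levels up to j and +1 above j) can be sketched in Python; the function name is illustrative, not a toolbox API:

```python
# Sketch of the ordinal encoding rule: for a predictor with k ordered
# levels, dummy variable j is -1 for levels 1..j and +1 for levels
# j+1..k. This is an illustration, not the toolbox implementation.
def ordinal_dummies(level, k):
    """Return the k - 1 dummy values for a 1-based ordinal level."""
    return [-1 if level <= j else 1 for j in range(1, k)]

# A variable with k = 4 levels gets 3 dummy variables per observation.
encoded = [ordinal_dummies(lev, 4) for lev in (1, 2, 3, 4)]
```

Each step up in level flips exactly one dummy from −1 to +1, so consecutive levels differ in a single coordinate, which preserves the ordering information.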

References
[1] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, Second Edition. NY: Springer, 2008.


[2] Scholkopf, B., J. C. Platt, J. C. Shawe-Taylor, A. J. Smola, and R. C. Williamson. "Estimating the Support of a High-Dimensional Distribution." Neural Comput., Vol. 13, Number 7, 2001, pp. 1443–1471.
[3] Cristianini, N., and J. C. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, UK: Cambridge University Press, 2000.
[4] Scholkopf, B., and A. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond, Adaptive Computation and Machine Learning. Cambridge, MA: The MIT Press, 2002.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
Usage notes and limitations:
• The predict and update functions support code generation.
• When you train an SVM model by using fitcsvm, the following restrictions apply.
• The class labels input argument value (Y) cannot be a categorical array.
• Code generation does not support categorical predictors (logical, categorical, char, string, or cell). If you supply training data in a table, the predictors must be numeric (double or single). Also, you cannot use the 'CategoricalPredictors' name-value pair argument. To include categorical predictors in a model, preprocess the categorical predictors by using dummyvar before fitting the model.
• The value of the 'ClassNames' name-value pair argument cannot be a categorical array.
• The value of the 'ScoreTransform' name-value pair argument cannot be an anonymous function. For generating code that predicts posterior probabilities given new observations, pass a trained SVM model to fitPosterior or fitSVMPosterior. The ScoreTransform property of the returned model contains an anonymous function that represents the score-to-posterior-probability function and is configured for code generation.
• For fixed-point code generation, the value of the 'ScoreTransform' name-value pair argument cannot be 'invlogit'.
For more information, see "Introduction to Code Generation" on page 31-2.

See Also
ClassificationPartitionedModel | CompactClassificationSVM | fitcsvm

Topics
"Using Support Vector Machines" on page 18-149
"Understanding Support Vector Machines" on page 18-145

Introduced in R2014a



ClassificationSVMCoderConfigurer Coder configurer for support vector machine (SVM) for one-class and binary classification

Description A ClassificationSVMCoderConfigurer object is a coder configurer of an SVM classification model (ClassificationSVM or CompactClassificationSVM). A coder configurer offers convenient features to configure code generation options, generate C/C++ code, and update model parameters in the generated code. • Configure code generation options and specify the coder attributes of SVM model parameters by using object properties. • Generate C/C++ code for the predict and update functions of the SVM classification model by using generateCode. Generating C/C++ code requires MATLAB Coder. • Update model parameters in the generated C/C++ code without having to regenerate the code. This feature reduces the effort required to regenerate, redeploy, and reverify C/C++ code when you retrain the SVM model with new data or settings. Before updating model parameters, use validatedUpdateInputs to validate and extract the model parameters to update. This flow chart shows the code generation workflow using a coder configurer.

For the code generation usage notes and limitations of an SVM classification model, see the Code Generation sections of CompactClassificationSVM, predict, and update.

Creation After training an SVM classification model by using fitcsvm, create a coder configurer for the model by using learnerCoderConfigurer. Use the properties of a coder configurer to specify the coder attributes of predict and update arguments. Then, use generateCode to generate C/C++ code based on the specified coder attributes.

Properties

predict Arguments

The properties listed in this section specify the coder attributes of the predict function arguments in the generated code.


X — Coder attributes of predictor data
LearnerCoderInput object

Coder attributes of predictor data to pass to the generated C/C++ code for the predict function of the SVM classification model, specified as a LearnerCoderInput on page 32-506 object.
When you create a coder configurer by using the learnerCoderConfigurer function, the input argument X determines the default values of the LearnerCoderInput coder attributes:
• SizeVector — The default value is the array size of the input X.
• VariableDimensions — This value is [0 0] (default) or [1 0].
• [0 0] indicates that the array size is fixed as specified in SizeVector.
• [1 0] indicates that the array has variable-size rows and fixed-size columns. In this case, the first value of SizeVector is the upper bound for the number of rows, and the second value of SizeVector is the number of columns.
• DataType — This value is single or double. The default data type depends on the data type of the input X.
• Tunability — This value must be true, meaning that predict in the generated C/C++ code always includes predictor data as an input.
You can modify the coder attributes by using dot notation. For example, to generate C/C++ code that accepts predictor data with 100 observations of three predictor variables, specify these coder attributes of X for the coder configurer configurer:

configurer.X.SizeVector = [100 3];
configurer.X.DataType = 'double';
configurer.X.VariableDimensions = [0 0];

[0 0] indicates that the first and second dimensions of X (number of observations and number of predictor variables, respectively) have fixed sizes. To allow the generated C/C++ code to accept predictor data with up to 100 observations, specify these coder attributes of X:

configurer.X.SizeVector = [100 3];
configurer.X.DataType = 'double';
configurer.X.VariableDimensions = [1 0];

[1 0] indicates that the first dimension of X (number of observations) has a variable size and the second dimension of X (number of predictor variables) has a fixed size. The specified number of observations, 100 in this example, becomes the maximum allowed number of observations in the generated C/C++ code. To allow any number of observations, specify the bound as Inf.

NumOutputs — Number of outputs in predict
1 (default) | 2

Number of output arguments to return from the generated C/C++ code for the predict function of the SVM classification model, specified as 1 or 2. The output arguments of predict are label (predicted class labels) and score (scores or posterior probabilities), in the listed order. predict in the generated C/C++ code returns the first n outputs of the predict function, where n is the NumOutputs value.


After creating the coder configurer configurer, you can specify the number of outputs by using dot notation. configurer.NumOutputs = 2;

The NumOutputs property is equivalent to the '-nargout' compiler option of codegen. This option specifies the number of output arguments in the entry-point function of code generation. The object function generateCode generates two entry-point functions—predict.m and update.m for the predict and update functions of an SVM classification model, respectively—and generates C/C++ code for the two entry-point functions. The specified value for the NumOutputs property corresponds to the number of output arguments in the entry-point function predict.m.
Data Types: double

update Arguments

The properties listed in this section specify the coder attributes of the update function arguments in the generated code. The update function takes a trained model and new model parameters as input arguments, and returns an updated version of the model that contains the new parameters. To enable updating the parameters in the generated code, you need to specify the coder attributes of the parameters before generating code. Use a LearnerCoderInput on page 32-506 object to specify the coder attributes of each parameter. The default attribute values are based on the model parameters in the input argument Mdl of learnerCoderConfigurer.

Alpha — Coder attributes of trained classifier coefficients
LearnerCoderInput object

Coder attributes of the trained classifier coefficients (Alpha of an SVM classification model), specified as a LearnerCoderInput on page 32-506 object. The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer:
• SizeVector — The default value is [s,1], where s is the number of support vectors in Mdl.
• VariableDimensions — This value is [0 0] (default) or [1 0].
• [0 0] indicates that the array size is fixed as specified in SizeVector.
• [1 0] indicates that the array has variable-size rows and fixed-size columns.
In this case, the first value of SizeVector is the upper bound for the number of rows, and the second value of SizeVector is the number of columns.
• DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl.
• Tunability — If you train a model with a linear kernel function, and the model stores the linear predictor coefficients (Beta) without the support vectors and related values, then this value must be false. Otherwise, this value must be true.

Beta — Coder attributes of linear predictor coefficients
LearnerCoderInput object

Coder attributes of the linear predictor coefficients (Beta of an SVM classification model), specified as a LearnerCoderInput on page 32-506 object. The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer:


• SizeVector — This value must be [p 1], where p is the number of predictors in Mdl. • VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector. • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl. • Tunability — If you train a model with a linear kernel function, and the model stores the linear predictor coefficients (Beta) without the support vectors and related values, then this value must be true. Otherwise, this value must be false. Bias — Coder attributes of bias term LearnerCoderInput object Coder attributes of the bias term (Bias of an SVM classification model), specified as a LearnerCoderInput on page 32-506 object. The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer: • SizeVector — This value must be [1 1]. • VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector. • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl. • Tunability — This value must be true. Cost — Coder attributes of misclassification cost LearnerCoderInput object Coder attributes of the misclassification cost (Cost of an SVM classification model), specified as a LearnerCoderInput on page 32-506 object. The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer: • SizeVector — For binary classification, this value must be [2 2]. For one-class classification, this value must be [1 1]. • VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector. • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl. 
• Tunability — For binary classification, the default value is true. For one-class classification, this value must be false.

Mu — Coder attributes of predictor means
LearnerCoderInput object

Coder attributes of the predictor means (Mu of an SVM classification model), specified as a LearnerCoderInput on page 32-506 object. The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer:


• SizeVector — If you train Mdl using standardized predictor data by specifying 'Standardize',true, this value must be [1,p], where p is the number of predictors in Mdl. Otherwise, this value must be [0,0]. • VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector. • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl. • Tunability — If you train Mdl using standardized predictor data by specifying 'Standardize',true, the default value is true. Otherwise, this value must be false. Prior — Coder attributes of prior probabilities LearnerCoderInput object Coder attributes of the prior probabilities (Prior of an SVM classification model), specified as a LearnerCoderInput on page 32-506 object. The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer: • SizeVector — For binary classification, this value must be [1 2]. For one-class classification, this value must be [1 1]. • VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector. • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl. • Tunability — For binary classification, the default value is true. For one-class classification, this value must be false. Scale — Coder attributes of kernel scale parameter LearnerCoderInput object Coder attributes of the kernel scale parameter (KernelParameters.Scale of an SVM classification model), specified as a LearnerCoderInput on page 32-506 object. The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer: • SizeVector — This value must be [1 1]. 
• VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector.
• DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl.
• Tunability — The default value is true.

Sigma — Coder attributes of predictor standard deviations
LearnerCoderInput object

Coder attributes of the predictor standard deviations (Sigma of an SVM classification model), specified as a LearnerCoderInput on page 32-506 object. The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer:


• SizeVector — If you train Mdl using standardized predictor data by specifying 'Standardize',true, this value must be [1,p], where p is the number of predictors in Mdl. Otherwise, this value must be [0,0]. • VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector. • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl. • Tunability — If you train Mdl using standardized predictor data by specifying 'Standardize',true, the default value is true. Otherwise, this value must be false. SupportVectorLabels — Coder attributes of support vector class labels LearnerCoderInput object Coder attributes of the support vector class labels (SupportVectorLabels of an SVM classification model), specified as a LearnerCoderInput on page 32-506 object. The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer: • SizeVector — The default value is [s,1], where s is the number of support vectors in Mdl. • VariableDimensions — This value is [0 0](default) or [1 0]. • [0 0] indicates that the array size is fixed as specified in SizeVector. • [1 0] indicates that the array has variable-size rows and fixed-size columns. In this case, the first value of SizeVector is the upper bound for the number of rows, and the second value of SizeVector is the number of columns. • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl. • Tunability — If you train a model with a linear kernel function, and the model stores the linear predictor coefficients (Beta) without the support vectors and related values, then this value must be false. Otherwise, this value must be true. 
SupportVectors — Coder attributes of support vectors
LearnerCoderInput object

Coder attributes of the support vectors (SupportVectors of an SVM classification model), specified as a LearnerCoderInput on page 32-506 object. The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer:
• SizeVector — The default value is [s,p], where s is the number of support vectors, and p is the number of predictors in Mdl.
• VariableDimensions — This value is [0 0] (default) or [1 0].
• [0 0] indicates that the array size is fixed as specified in SizeVector.
• [1 0] indicates that the array has variable-size rows and fixed-size columns. In this case, the first value of SizeVector is the upper bound for the number of rows, and the second value of SizeVector is the number of columns.
• DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl.


• Tunability — If you train a model with a linear kernel function, and the model stores the linear predictor coefficients (Beta) without the support vectors and related values, then this value must be false. Otherwise, this value must be true. Other Configurer Options OutputFileName — File name of generated C/C++ code 'ClassificationSVMModel' (default) | character vector File name of the generated C/C++ code, specified as a character vector. The object function generateCode of ClassificationSVMCoderConfigurer generates C/C++ code using this file name. The file name must not contain spaces because they can lead to code generation failures in certain operating system configurations. Also, the name must be a valid MATLAB function name. After creating the coder configurer configurer, you can specify the file name by using dot notation. configurer.OutputFileName = 'myModel';

Data Types: char

Verbose — Verbosity level
true (logical 1) (default) | false (logical 0)

Verbosity level, specified as true (logical 1) or false (logical 0). The verbosity level controls the display of notification messages at the command line.
• true (logical 1) — The software displays notification messages when your changes to the coder attributes of a parameter result in changes for other dependent parameters.
• false (logical 0) — The software does not display notification messages.

To enable updating machine learning model parameters in the generated code, you need to configure the coder attributes of the parameters before generating code. The coder attributes of parameters are dependent on each other, so the software stores the dependencies as configuration constraints. If you modify the coder attributes of a parameter by using a coder configurer, and the modification requires subsequent changes to other dependent parameters to satisfy configuration constraints, then the software changes the coder attributes of the dependent parameters. The verbosity level determines whether or not the software displays notification messages for these subsequent changes. After creating the coder configurer configurer, you can modify the verbosity level by using dot notation. configurer.Verbose = false;

Data Types: logical

Options for Code Generation Customization

To customize the code generation workflow, use the generateFiles function and the following three properties with codegen, instead of using the generateCode function.


After generating the two entry-point function files (predict.m and update.m) by using the generateFiles function, you can modify these files according to your code generation workflow. For example, you can modify the predict.m file to include data preprocessing, or you can add these entry-point functions to another code generation project. Then, you can generate C/C++ code by using the codegen function and the codegen arguments appropriate for the modified entry-point functions or code generation project. Use the three properties described in this section as a starting point to set the codegen arguments. CodeGenerationArguments — codegen arguments cell array This property is read-only. codegen arguments, specified as a cell array. This property enables you to customize the code generation workflow. Use the generateCode function if you do not need to customize your workflow. Instead of using generateCode with the coder configurer configurer, you can generate C/C++ code as follows: generateFiles(configurer) cgArgs = configurer.CodeGenerationArguments; codegen(cgArgs{:})

If you customize the code generation workflow, modify cgArgs accordingly before calling codegen. If you modify other properties of configurer, the software updates the CodeGenerationArguments property accordingly.
Data Types: cell

PredictInputs — Input argument of predict
cell array of a coder.PrimitiveType object

This property is read-only.
Input argument of the entry-point function predict.m for code generation, specified as a cell array of a coder.PrimitiveType object. The coder.PrimitiveType object includes the coder attributes of the predictor data stored in the X property. If you modify the coder attributes of the predictor data, then the software updates the coder.PrimitiveType object accordingly.
The coder.PrimitiveType object in PredictInputs is equivalent to configurer.CodeGenerationArguments{6} for the coder configurer configurer.
Data Types: cell

UpdateInputs — List of tunable input arguments of update
cell array of a structure including coder.PrimitiveType objects

This property is read-only.
List of the tunable input arguments of the entry-point function update.m for code generation, specified as a cell array of a structure including coder.PrimitiveType objects. Each


coder.PrimitiveType object includes the coder attributes of a tunable machine learning model parameter. If you modify the coder attributes of a model parameter by using the coder configurer properties (update Arguments on page 32-495 properties), then the software updates the corresponding coder.PrimitiveType object accordingly. If you specify the Tunability attribute of a machine learning model parameter as false, then the software removes the corresponding coder.PrimitiveType object from the UpdateInputs list. The structure in UpdateInputs is equivalent to configurer.CodeGenerationArguments{3} for the coder configurer configurer. Data Types: cell

Object Functions
generateCode — Generate C/C++ code using coder configurer
generateFiles — Generate MATLAB files for code generation using coder configurer
validatedUpdateInputs — Validate and extract machine learning model parameters to update

Examples

Generate Code Using Coder Configurer

Train a machine learning model, and then generate code for the predict and update functions of the model by using a coder configurer.
Load the ionosphere data set and train a binary SVM classification model.

load ionosphere
Mdl = fitcsvm(X,Y);

Mdl is a ClassificationSVM object.
Create a coder configurer for the ClassificationSVM model by using learnerCoderConfigurer. Specify the predictor data X. The learnerCoderConfigurer function uses the input X to configure the coder attributes of the predict function input.

configurer = learnerCoderConfigurer(Mdl,X)

configurer =
  ClassificationSVMCoderConfigurer with properties:

   Update Inputs:
                  Alpha: [1x1 LearnerCoderInput]
         SupportVectors: [1x1 LearnerCoderInput]
    SupportVectorLabels: [1x1 LearnerCoderInput]
                  Scale: [1x1 LearnerCoderInput]
                   Bias: [1x1 LearnerCoderInput]
                  Prior: [1x1 LearnerCoderInput]
                   Cost: [1x1 LearnerCoderInput]

   Predict Inputs:
                      X: [1x1 LearnerCoderInput]



   Code Generation Parameters:
             NumOutputs: 1
         OutputFileName: 'ClassificationSVMModel'

  Properties, Methods

configurer is a ClassificationSVMCoderConfigurer object, which is a coder configurer of a ClassificationSVM object. To generate C/C++ code, you must have access to a C/C++ compiler that is configured properly. MATLAB Coder locates and uses a supported, installed compiler. You can use mex -setup to view and change the default compiler. For more details, see “Change Default Compiler” (MATLAB). Generate code for the predict and update functions of the SVM classification model (Mdl) with default settings. generateCode(configurer) generateCode creates these files in output folder: 'initialize.m', 'predict.m', 'update.m', 'ClassificationSVMModel.mat'

The generateCode function completes these actions:
• Generate the MATLAB files required to generate code, including the two entry-point functions predict.m and update.m for the predict and update functions of Mdl, respectively.
• Create a MEX function named ClassificationSVMModel for the two entry-point functions.
• Create the code for the MEX function in the codegen\mex\ClassificationSVMModel folder.
• Copy the MEX function to the current folder.

Display the contents of the predict.m, update.m, and initialize.m files by using the type function.

type predict.m

function varargout = predict(X,varargin) %#codegen
% Autogenerated by MATLAB, 29-Feb-2020 10:17:24
[varargout{1:nargout}] = initialize('predict',X,varargin{:});
end

type update.m

function update(varargin) %#codegen
% Autogenerated by MATLAB, 29-Feb-2020 10:17:24
initialize('update',varargin{:});
end

type initialize.m

function [varargout] = initialize(command,varargin) %#codegen
% Autogenerated by MATLAB, 29-Feb-2020 10:17:24
coder.inline('always')
persistent model
if isempty(model)
    model = loadLearnerForCoder('ClassificationSVMModel.mat');
end
switch(command)
    case 'update'
        % Update struct fields: Alpha
        %                       SupportVectors
        %                       SupportVectorLabels
        %                       Scale
        %                       Bias
        %                       Prior
        %                       Cost
        model = update(model,varargin{:});
    case 'predict'
        % Predict Inputs: X
        X = varargin{1};
        if nargin == 2
            [varargout{1:nargout}] = predict(model,X);
        else
            PVPairs = cell(1,nargin-2);
            for i = 1:nargin-2
                PVPairs{1,i} = varargin{i+1};
            end
            [varargout{1:nargout}] = predict(model,X,PVPairs{:});
        end
end
end

Update Parameters of SVM Classification Model in Generated Code

Train an SVM model using a partial data set and create a coder configurer for the model. Use the properties of the coder configurer to specify coder attributes of the SVM model parameters. Use the object function of the coder configurer to generate C code that predicts labels for new predictor data. Then retrain the model using the whole data set and update parameters in the generated code without regenerating the code.

Train Model

Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g'). Train a binary SVM classification model using the first 50 observations.

load ionosphere
Mdl = fitcsvm(X(1:50,:),Y(1:50));

Mdl is a ClassificationSVM object.

Create Coder Configurer

Create a coder configurer for the ClassificationSVM model by using learnerCoderConfigurer. Specify the predictor data X. The learnerCoderConfigurer function uses the input X to configure the coder attributes of the predict function input. Also, set the number of outputs to 2 so that the generated code returns predicted labels and scores.

configurer = learnerCoderConfigurer(Mdl,X(1:50,:),'NumOutputs',2);

configurer is a ClassificationSVMCoderConfigurer object, which is a coder configurer of a ClassificationSVM object.

32

Functions

Specify Coder Attributes of Parameters

Specify the coder attributes of the SVM classification model parameters so that you can update the parameters in the generated code after retraining the model. This example specifies the coder attributes of predictor data that you want to pass to the generated code and the coder attributes of the support vectors of the SVM model.

First, specify the coder attributes of X so that the generated code accepts any number of observations. Modify the SizeVector and VariableDimensions attributes. The SizeVector attribute specifies the upper bound of the predictor data size, and the VariableDimensions attribute specifies whether each dimension of the predictor data has a variable size or fixed size.

configurer.X.SizeVector = [Inf 34];
configurer.X.VariableDimensions = [true false];

The size of the first dimension is the number of observations. In this case, the code specifies that the upper bound of the size is Inf and the size is variable, meaning that X can have any number of observations. This specification is convenient if you do not know the number of observations when generating code.

The size of the second dimension is the number of predictor variables. This value must be fixed for a machine learning model. X contains 34 predictors, so the value of the SizeVector attribute must be 34 and the value of the VariableDimensions attribute must be false.

If you retrain the SVM model using new data or different settings, the number of support vectors can vary. Therefore, specify the coder attributes of SupportVectors so that you can update the support vectors in the generated code.

configurer.SupportVectors.SizeVector = [250 34];

SizeVector attribute for Alpha has been modified to satisfy configuration constraints.
SizeVector attribute for SupportVectorLabels has been modified to satisfy configuration constraints.

configurer.SupportVectors.VariableDimensions = [true false];

VariableDimensions attribute for Alpha has been modified to satisfy configuration constraints.
VariableDimensions attribute for SupportVectorLabels has been modified to satisfy configuration constraints.

If you modify the coder attributes of SupportVectors, then the software modifies the coder attributes of Alpha and SupportVectorLabels to satisfy configuration constraints. If the modification of the coder attributes of one parameter requires subsequent changes to other dependent parameters to satisfy configuration constraints, then the software changes the coder attributes of the dependent parameters.

Generate Code

To generate C/C++ code, you must have access to a C/C++ compiler that is configured properly. MATLAB Coder locates and uses a supported, installed compiler. You can use mex -setup to view and change the default compiler. For more details, see “Change Default Compiler” (MATLAB).

Use generateCode to generate code for the predict and update functions of the SVM classification model (Mdl) with default settings.

generateCode(configurer)

generateCode creates these files in output folder:
'initialize.m', 'predict.m', 'update.m', 'ClassificationSVMModel.mat'

32-504

ClassificationSVMCoderConfigurer

generateCode generates the MATLAB files required to generate code, including the two entry-point functions predict.m and update.m for the predict and update functions of Mdl, respectively. Then generateCode creates a MEX function named ClassificationSVMModel for the two entry-point functions in the codegen\mex\ClassificationSVMModel folder and copies the MEX function to the current folder.

Verify Generated Code

Pass some predictor data to verify whether the predict function of Mdl and the predict function in the MEX function return the same labels. To call an entry-point function in a MEX function that has more than one entry point, specify the function name as the first input argument.

[label,score] = predict(Mdl,X);
[label_mex,score_mex] = ClassificationSVMModel('predict',X);

Compare label and label_mex by using isequal.

isequal(label,label_mex)

ans = logical
   1

isequal returns logical 1 (true) if all the inputs are equal. The comparison confirms that the predict function of Mdl and the predict function in the MEX function return the same labels. score_mex might include round-off differences compared with score. In this case, compare score_mex and score, allowing a small tolerance.

find(abs(score-score_mex) > 1e-8)

ans = 0x1 empty double column vector

The comparison confirms that score and score_mex are equal within the tolerance 1e–8.

Retrain Model and Update Parameters in Generated Code

Retrain the model using the entire data set.

retrainedMdl = fitcsvm(X,Y);

Extract parameters to update by using validatedUpdateInputs. This function detects the modified model parameters in retrainedMdl and validates whether the modified parameter values satisfy the coder attributes of the parameters.

params = validatedUpdateInputs(configurer,retrainedMdl);

Update parameters in the generated code.

ClassificationSVMModel('update',params)

Verify Generated Code

Compare the outputs from the predict function of retrainedMdl and the predict function in the updated MEX function.
[label,score] = predict(retrainedMdl,X);
[label_mex,score_mex] = ClassificationSVMModel('predict',X);
isequal(label,label_mex)

ans = logical
   1

find(abs(score-score_mex) > 1e-8)

ans = 0x1 empty double column vector

The comparison confirms that label and label_mex are equal, and the score values are equal within the tolerance.

More About

LearnerCoderInput Object

A coder configurer uses a LearnerCoderInput object to specify the coder attributes of predict and update input arguments. A LearnerCoderInput object has the following attributes to specify the properties of an input argument array in the generated code.

SizeVector — Array size if the corresponding VariableDimensions value is false. Upper bound of the array size if the corresponding VariableDimensions value is true. To allow an unbounded array, specify the bound as Inf.

VariableDimensions — Indicator specifying whether each dimension of the array has a variable size or fixed size, specified as true (logical 1) or false (logical 0):
• A value of true (logical 1) means that the corresponding dimension has a variable size.
• A value of false (logical 0) means that the corresponding dimension has a fixed size.

DataType — Data type of the array.

Tunability — Indicator specifying whether or not predict or update includes the argument as an input in the generated code, specified as true (logical 1) or false (logical 0). If you specify other attribute values when Tunability is false, the software sets Tunability to true.

After creating a coder configurer, you can modify the coder attributes by using dot notation. For example, specify the coder attributes of the coefficients Alpha of the coder configurer configurer as follows:

configurer.Alpha.SizeVector = [100 1];
configurer.Alpha.VariableDimensions = [1 0];
configurer.Alpha.DataType = 'double';

If you specify the verbosity level (Verbose) as true (default), then the software displays notification messages when you modify the coder attributes of a machine learning model parameter and the modification changes the coder attributes of other dependent parameters.
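For instance, you could suppress these notification messages by turning off the verbosity flag before modifying coder attributes (a minimal sketch; the Verbose property and the configurer name follow this example, and the specific attribute values shown are illustrative):

```matlab
% Suppress notification messages about dependent-parameter changes
configurer.Verbose = false;
configurer.SupportVectors.SizeVector = [250 34];  % no messages displayed
configurer.Verbose = true;                        % restore the default behavior
```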

See Also
ClassificationECOCCoderConfigurer | ClassificationSVM | CompactClassificationSVM | learnerCoderConfigurer | predict | update

Topics
“Introduction to Code Generation” on page 31-2
“Code Generation for Prediction and Update Using Coder Configurer” on page 31-68

Introduced in R2018b

ClassificationTree class

Superclasses: CompactClassificationTree

Binary decision tree for multiclass classification

Description

A ClassificationTree object represents a decision tree with binary splits for classification. An object of this class can predict responses for new data using the predict method. The object contains the data used for training, so it can also compute resubstitution predictions.

Construction

Create a ClassificationTree object by using fitctree.
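As a quick illustration (a sketch using the Fisher iris data shipped with the toolbox; the new measurement vector is hypothetical), you might grow a tree and classify a new observation as follows:

```matlab
% Grow a classification tree from the Fisher iris measurements
load fisheriris
tree = fitctree(meas,species);

% Predict the class of a new flower measurement
label = predict(tree,[5.8 2.8 5.1 1.5]);
```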

Properties

BinEdges

Bin edges for numeric predictors, specified as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors.

The software bins numeric predictors only if you specify the 'NumBins' name-value pair argument as a positive integer scalar when training a model with tree learners. The BinEdges property is empty if the 'NumBins' value is empty (default).

You can reproduce the binned predictor data Xbinned by using the BinEdges property of the trained model mdl.

X = mdl.X; % Predictor data
Xbinned = zeros(size(X));
edges = mdl.BinEdges;
% Find indices of binned predictors.
idxNumeric = find(~cellfun(@isempty,edges));
if iscolumn(idxNumeric)
    idxNumeric = idxNumeric';
end
for j = idxNumeric
    x = X(:,j);
    % Convert x to array if x is a table.
    if istable(x)
        x = table2array(x);
    end
    % Group x into bins by using the discretize function.
    xbinned = discretize(x,[-inf; edges{j}; inf]);
    Xbinned(:,j) = xbinned;
end

Xbinned contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs.

CategoricalPredictors

Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values corresponding to the columns of the predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty ([]).

CategoricalSplits

An n-by-2 cell array, where n is the number of categorical splits in tree. Each row in CategoricalSplits gives left and right values for a categorical split. For each branch node with categorical split j based on a categorical predictor variable z, the left child is chosen if z is in CategoricalSplits(j,1) and the right child is chosen if z is in CategoricalSplits(j,2). The splits are in the same order as nodes of the tree. Find the nodes for these splits by selecting 'categorical' cuts from top to bottom in the CutType property.

Children

An n-by-2 array containing the numbers of the child nodes for each node in tree, where n is the number of nodes. Leaf nodes have child node 0.

ClassCount

An n-by-k array of class counts for the nodes in tree, where n is the number of nodes and k is the number of classes. For any node number i, the class counts ClassCount(i,:) are counts of observations (from the data used in fitting the tree) from each class satisfying the conditions for node i.

ClassNames

List of the elements in Y with duplicates removed. ClassNames can be a categorical array, cell array of character vectors, character array, logical vector, or a numeric vector. ClassNames has the same data type as the data in the argument Y. (The software treats string arrays as cell arrays of character vectors.)

ClassProbability

An n-by-k array of class probabilities for the nodes in tree, where n is the number of nodes and k is the number of classes.
For any node number i, the class probabilities ClassProbability(i,:) are the estimated probabilities for each class for a point satisfying the conditions for node i.

Cost

Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response. This property is read-only.

CutCategories

An n-by-2 cell array of the categories used at branches in tree, where n is the number of nodes. For each branch node i based on a categorical predictor variable X, the left child is chosen if X is among


the categories listed in CutCategories{i,1}, and the right child is chosen if X is among those listed in CutCategories{i,2}. Both columns of CutCategories are empty for branch nodes based on continuous predictors and for leaf nodes. CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories.

CutPoint

An n-element vector of the values used as cut points in tree, where n is the number of nodes. For each branch node i based on a continuous predictor variable X, the left child is chosen if X < CutPoint(i) and the right child is chosen if X >= CutPoint(i). CutPoint is NaN for branch nodes based on categorical predictors and for leaf nodes. CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories.

CutType

An n-element cell array indicating the type of cut at each node in tree, where n is the number of nodes. For each node i, CutType{i} is:

• 'continuous' — If the cut is defined in the form X < v for a variable X and cut point v.
• 'categorical' — If the cut is defined by whether a variable X takes a value in a set of categories.
• '' — If i is a leaf node.

CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories.

CutPredictor

An n-element cell array of the names of the variables used for branching in each node in tree, where n is the number of nodes. These variables are sometimes known as cut variables. For leaf nodes, CutPredictor contains an empty character vector. CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories.

CutPredictorIndex

An n-element array of numeric indices for the variables used for branching in each node in tree, where n is the number of nodes. For more information, see CutPredictor.

ExpandedPredictorNames

Expanded predictor names, stored as a cell array of character vectors. If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables.
Otherwise, ExpandedPredictorNames is the same as PredictorNames.

HyperparameterOptimizationResults

Description of the cross-validation optimization of hyperparameters, stored as a BayesianOptimization object or a table of hyperparameters and associated values. Nonempty


when the OptimizeHyperparameters name-value pair is nonempty at creation. Value depends on the setting of the HyperparameterOptimizationOptions name-value pair at creation:

• 'bayesopt' (default) — Object of class BayesianOptimization
• 'gridsearch' or 'randomsearch' — Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst)

IsBranchNode

An n-element logical vector that is true for each branch node and false for each leaf node of tree.

ModelParameters

Parameters used in training tree. To display all parameter values, enter tree.ModelParameters. To access a particular parameter, use dot notation.

NumObservations

Number of observations in the training data, a numeric scalar. NumObservations can be less than the number of rows of input data X when there are missing values in X or response Y.

NodeClass

An n-element cell array with the names of the most probable classes in each node of tree, where n is the number of nodes in the tree. Every element of this array is a character vector equal to one of the class names in ClassNames.

NodeError

An n-element vector of the errors of the nodes in tree, where n is the number of nodes. NodeError(i) is the misclassification probability for node i.

NodeProbability

An n-element vector of the probabilities of the nodes in tree, where n is the number of nodes. The probability of a node is computed as the proportion of observations from the original data that satisfy the conditions for the node. This proportion is adjusted for any prior probabilities assigned to each class.

NodeRisk

An n-element vector of the risk of the nodes in the tree, where n is the number of nodes. The risk for each node is the measure of impurity (Gini index or deviance) for this node weighted by the node probability. If the tree is grown by twoing, the risk for each node is zero.
NodeSize

An n-element vector of the sizes of the nodes in tree, where n is the number of nodes. The size of a node is defined as the number of observations from the data used to create the tree that satisfy the conditions for the node.

NumNodes

The number of nodes in tree.


Parent

An n-element vector containing the number of the parent node for each node in tree, where n is the number of nodes. The parent of the root node is 0.

PredictorNames

Cell array of character vectors containing the predictor names, in the order in which they appear in X.

Prior

Numeric vector of prior probabilities for each class. The order of the elements of Prior corresponds to the order of the classes in ClassNames. The number of elements of Prior is the number of unique classes in the response. This property is read-only.

PruneAlpha

Numeric vector with one element per pruning level. If the pruning level ranges from 0 to M, then PruneAlpha has M + 1 elements sorted in ascending order. PruneAlpha(1) is for pruning level 0 (no pruning), PruneAlpha(2) is for pruning level 1, and so on.

PruneList

An n-element numeric vector with the pruning levels in each node of tree, where n is the number of nodes. The pruning levels range from 0 (no pruning) to M, where M is the distance between the deepest leaf and the root node.

ResponseName

A character vector that specifies the name of the response variable (Y).

RowsUsed

An n-element logical vector indicating which rows of the original predictor data (X) were used in fitting. If the software uses all rows of X, then RowsUsed is an empty array ([]).

ScoreTransform

Function handle for transforming predicted classification scores, or character vector representing a built-in transformation function. 'none' means no transformation; equivalently, 'none' means @(x)x. To change the score transformation function, use dot notation.

• For available functions (see fitctree), enter

Mdl.ScoreTransform = 'function';

• You can set a function handle for an available function, or a function you define yourself, by entering

tree.ScoreTransform = @function;

SurrogateCutCategories

An n-element cell array of the categories used for surrogate splits in tree, where n is the number of nodes in tree. For each node k, SurrogateCutCategories{k} is a cell array. The length of


SurrogateCutCategories{k} is equal to the number of surrogate predictors found at this node. Every element of SurrogateCutCategories{k} is either an empty character vector for a continuous surrogate predictor, or is a two-element cell array with categories for a categorical surrogate predictor. The first element of this two-element cell array lists categories assigned to the left child by this surrogate split, and the second element of this two-element cell array lists categories assigned to the right child by this surrogate split. The order of the surrogate split variables at each node is matched to the order of variables in SurrogateCutPredictor. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogateCutCategories contains an empty cell.

SurrogateCutFlip

An n-element cell array of the numeric cut assignments used for surrogate splits in tree, where n is the number of nodes in tree. For each node k, SurrogateCutFlip{k} is a numeric vector. The length of SurrogateCutFlip{k} is equal to the number of surrogate predictors found at this node. Every element of SurrogateCutFlip{k} is either zero for a categorical surrogate predictor, or a numeric cut assignment for a continuous surrogate predictor. The numeric cut assignment can be either –1 or +1. For every surrogate split with a numeric cut C based on a continuous predictor variable Z, the left child is chosen if Z < C and the cut assignment is +1, or if Z >= C and the cut assignment is –1.

'symmetric' — 2x – 1
'symmetricismax' — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
'symmetriclogit' — 2/(1 + e^(–x)) – 1

• For a MATLAB function or a function that you define, enter its function handle.

Mdl.ScoreTransform = @function;

function should accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).

Data Types: char | function_handle

CompactClassificationSVM

Sigma — Predictor standard deviations
[] (default) | numeric vector

This property is read-only.

Predictor standard deviations, specified as a numeric vector.

If you specify 'Standardize',true when you train the SVM classifier using fitcsvm, then the length of Sigma is equal to the number of predictor variables.

MATLAB expands categorical variables in the predictor data using full dummy encoding. That is, MATLAB creates one dummy variable for each level of each categorical variable. Sigma stores one value for each predictor variable, including the dummy variables. However, MATLAB does not standardize the columns that contain categorical variables.

If you set 'Standardize',false when you train the SVM classifier using fitcsvm, then Sigma is an empty vector ([]).

Data Types: single | double
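For illustration, the centering and scaling that a standardized model applies before prediction can be reproduced from Mu and Sigma (a sketch, not from the original page; it assumes the model was trained with 'Standardize',true and that all predictors are numeric):

```matlab
load ionosphere
SVMModel = fitcsvm(X,Y,'Standardize',true);

% Reproduce the standardized predictor data the model uses internally:
% subtract the per-column means (Mu) and divide by the per-column
% standard deviations (Sigma)
Xstd = (X - SVMModel.Mu)./SVMModel.Sigma;
```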

Object Functions

compareHoldout — Compare accuracies of two classification models using new data
discardSupportVectors — Discard support vectors for linear support vector machine (SVM) classifier
edge — Find classification edge for support vector machine (SVM) classifier
fitPosterior — Fit posterior probabilities for compact support vector machine (SVM) classifier
loss — Find classification error for support vector machine (SVM) classifier
margin — Find classification margins for support vector machine (SVM) classifier
predict — Classify observations using support vector machine (SVM) classifier
update — Update model parameters for code generation

Examples

Reduce Size of SVM Classifiers

Reduce the size of a full SVM classifier by removing the training data. Full SVM classifiers (that is, ClassificationSVM classifiers) hold the training data. To improve efficiency, use a smaller classifier.

Load the ionosphere data set.

load ionosphere

Train an SVM classifier. Standardize the predictor data and specify the order of the classes.

SVMModel = fitcsvm(X,Y,'Standardize',true,...
    'ClassNames',{'b','g'})

SVMModel = 
  ClassificationSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b'  'g'}
           ScoreTransform: 'none'
          NumObservations: 351
                    Alpha: [90x1 double]
                     Bias: -0.1343
         KernelParameters: [1x1 struct]
                       Mu: [1x34 double]
                    Sigma: [1x34 double]
           BoxConstraints: [351x1 double]
          ConvergenceInfo: [1x1 struct]
          IsSupportVector: [351x1 logical]
                   Solver: 'SMO'

Properties, Methods

SVMModel is a ClassificationSVM classifier.

Reduce the size of the SVM classifier.

CompactSVMModel = compact(SVMModel)

CompactSVMModel = 
  classreg.learning.classif.CompactClassificationSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b'  'g'}
           ScoreTransform: 'none'
                    Alpha: [90x1 double]
                     Bias: -0.1343
         KernelParameters: [1x1 struct]
                       Mu: [1x34 double]
                    Sigma: [1x34 double]
           SupportVectors: [90x34 double]
      SupportVectorLabels: [90x1 double]

Properties, Methods

CompactSVMModel is a CompactClassificationSVM classifier.

Display the amount of memory each classifier uses.

whos('SVMModel','CompactSVMModel')

  Name                 Size      Bytes  Class
  CompactSVMModel      1x1       30890  classreg.learning.classif.CompactClassificationSVM
  SVMModel             1x1      140980  ClassificationSVM

The full SVM classifier (SVMModel) is more than four times larger than the compact SVM classifier (CompactSVMModel).

To label new observations efficiently, you can remove SVMModel from the MATLAB® Workspace, and then pass CompactSVMModel and new predictor values to predict.

To further reduce the size of your compact SVM classifier, use the discardSupportVectors function to discard support vectors.


Train and Cross-Validate SVM Classifier

Load the ionosphere data set.

load ionosphere

Train and cross-validate an SVM classifier. Standardize the predictor data and specify the order of the classes.

rng(1); % For reproducibility
CVSVMModel = fitcsvm(X,Y,'Standardize',true,...
    'ClassNames',{'b','g'},'CrossVal','on')

CVSVMModel = 
  classreg.learning.partition.ClassificationPartitionedModel
    CrossValidatedModel: 'SVM'
         PredictorNames: {1x34 cell}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'

Properties, Methods

CVSVMModel is a ClassificationPartitionedModel cross-validated SVM classifier. By default, the software implements 10-fold cross-validation. Alternatively, you can cross-validate a trained ClassificationSVM classifier by passing it to crossval.

Inspect one of the trained folds using dot notation.

CVSVMModel.Trained{1}

ans = 
  classreg.learning.classif.CompactClassificationSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b'  'g'}
           ScoreTransform: 'none'
                    Alpha: [78x1 double]
                     Bias: -0.2209
         KernelParameters: [1x1 struct]
                       Mu: [1x34 double]
                    Sigma: [1x34 double]
           SupportVectors: [78x34 double]
      SupportVectorLabels: [78x1 double]

Properties, Methods


Each fold is a CompactClassificationSVM classifier trained on 90% of the data.

Estimate the generalization error.

genError = kfoldLoss(CVSVMModel)

genError = 0.1168

On average, the generalization error is approximately 12%.

References

[1] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, Second Edition. NY: Springer, 2008.

[2] Scholkopf, B., J. C. Platt, J. C. Shawe-Taylor, A. J. Smola, and R. C. Williamson. “Estimating the Support of a High-Dimensional Distribution.” Neural Computation. Vol. 13, Number 7, 2001, pp. 1443–1471.

[3] Cristianini, N., and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, UK: Cambridge University Press, 2000.

[4] Scholkopf, B., and A. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond, Adaptive Computation and Machine Learning. Cambridge, MA: The MIT Press, 2002.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

• The predict and update functions support code generation.
• When you train an SVM model by using fitcsvm, the following restrictions apply.
  • The class labels input argument value (Y) cannot be a categorical array.
  • Code generation does not support categorical predictors (logical, categorical, char, string, or cell). If you supply training data in a table, the predictors must be numeric (double or single). Also, you cannot use the 'CategoricalPredictors' name-value pair argument. To include categorical predictors in a model, preprocess the categorical predictors by using dummyvar before fitting the model.
  • The value of the 'ClassNames' name-value pair argument cannot be a categorical array.
  • The value of the 'ScoreTransform' name-value pair argument cannot be an anonymous function. For generating code that predicts posterior probabilities given new observations, pass a trained SVM model to fitPosterior or fitSVMPosterior. The ScoreTransform property of the returned model contains an anonymous function that represents the score-to-posterior-probability function and is configured for code generation.
  • For fixed-point code generation, the value of the 'ScoreTransform' name-value pair argument cannot be 'invlogit'.

For more information, see “Introduction to Code Generation” on page 31-2.
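The dummyvar preprocessing mentioned in the notes above can be sketched as follows (a hypothetical example; the grouping vector g, the numeric predictors, and the labels are illustrative data, not from the original page):

```matlab
% Suppose the third predictor is categorical with integer levels 1..3
g = [1;2;3;1;2;3];            % categorical predictor as group indices
Xnum = randn(6,2);            % two numeric predictors
D = dummyvar(g);              % one indicator column per category level

% Fit the SVM on the numeric predictors plus the dummy-encoded columns,
% so the trained model contains only numeric predictors and is
% compatible with code generation
Mdl = fitcsvm([Xnum D],[1;1;1;2;2;2]);
```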


See Also
ClassificationSVM | compact | discardSupportVectors | fitcsvm

Topics
“Using Support Vector Machines” on page 18-149
“Understanding Support Vector Machines” on page 18-145

Introduced in R2014a


CompactClassificationTree

Package: classreg.learning.classif

Compact classification tree

Description

Compact version of a classification tree (of class ClassificationTree). The compact version does not include the data for training the classification tree. Therefore, you cannot perform some tasks with a compact classification tree, such as cross validation. Use a compact classification tree for making predictions (classifications) of new data.

Construction

ctree = compact(tree) constructs a compact decision tree from a full decision tree.

Input Arguments

tree

A decision tree constructed using fitctree.
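A minimal sketch of the construction, using the Fisher iris data shipped with the toolbox:

```matlab
% Grow a full tree, then keep only the compact version for prediction
load fisheriris
tree = fitctree(meas,species);
ctree = compact(tree);            % drops the stored training data
label = predict(ctree,meas(1,:)); % compact trees still support predict
```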

Properties

CategoricalPredictors

Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values corresponding to the columns of the predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty ([]).

CategoricalSplits

An n-by-2 cell array, where n is the number of categorical splits in tree. Each row in CategoricalSplits gives left and right values for a categorical split. For each branch node with categorical split j based on a categorical predictor variable z, the left child is chosen if z is in CategoricalSplits(j,1) and the right child is chosen if z is in CategoricalSplits(j,2). The splits are in the same order as nodes of the tree. Find the nodes for these splits by selecting 'categorical' cuts from top to bottom in the CutType property.

Children

An n-by-2 array containing the numbers of the child nodes for each node in tree, where n is the number of nodes. Leaf nodes have child node 0.

ClassCount

An n-by-k array of class counts for the nodes in tree, where n is the number of nodes and k is the number of classes. For any node number i, the class counts ClassCount(i,:) are counts of observations (from the data used in fitting the tree) from each class satisfying the conditions for node i.


ClassNames

List of the elements in Y with duplicates removed. ClassNames can be a numeric vector, vector of categorical variables, logical vector, character array, or cell array of character vectors. ClassNames has the same data type as the data in the argument Y. (The software treats string arrays as cell arrays of character vectors.)

If the value of a property has at least one dimension of length k, then ClassNames indicates the order of the elements along that dimension (e.g., Cost and Prior).

ClassProbability

An n-by-k array of class probabilities for the nodes in tree, where n is the number of nodes and k is the number of classes. For any node number i, the class probabilities ClassProbability(i,:) are the estimated probabilities for each class for a point satisfying the conditions for node i.

Cost

Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response. This property is read-only.

CutCategories

An n-by-2 cell array of the categories used at branches in tree, where n is the number of nodes. For each branch node i based on a categorical predictor variable x, the left child is chosen if x is among the categories listed in CutCategories{i,1}, and the right child is chosen if x is among those listed in CutCategories{i,2}. Both columns of CutCategories are empty for branch nodes based on continuous predictors and for leaf nodes.

CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories.

CutPoint

An n-element vector of the values used as cut points in tree, where n is the number of nodes. For each branch node i based on a continuous predictor variable x, the left child is chosen if x < CutPoint(i) and the right child is chosen if x >= CutPoint(i).
CutPoint is NaN for branch nodes based on categorical predictors and for leaf nodes. CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories. CutType An n-element cell array indicating the type of cut at each node in tree, where n is the number of nodes. For each node i, CutType{i} is: • 'continuous' — If the cut is defined in the form x < v for a variable x and cut point v. • 'categorical' — If the cut is defined by whether a variable x takes a value in a set of categories. • '' — If i is a leaf node. 32-703


CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories.

CutPredictor — An n-element cell array of the names of the variables used for branching in each node in tree, where n is the number of nodes. These variables are sometimes known as cut variables. For leaf nodes, CutPredictor contains an empty character vector. CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories.

CutPredictorIndex — An n-element array of numeric indices for the variables used for branching in each node in tree, where n is the number of nodes. For more information, see CutPredictor.

ExpandedPredictorNames — Expanded predictor names, stored as a cell array of character vectors. If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.

IsBranchNode — An n-element logical vector that is true for each branch node and false for each leaf node of tree.

NodeClass — An n-element cell array with the names of the most probable classes in each node of tree, where n is the number of nodes in the tree. Every element of this array is a character vector equal to one of the class names in ClassNames.

NodeError — An n-element vector of the errors of the nodes in tree, where n is the number of nodes. NodeError(i) is the misclassification probability for node i.

NodeProbability — An n-element vector of the probabilities of the nodes in tree, where n is the number of nodes. The probability of a node is computed as the proportion of observations from the original data that satisfy the conditions for the node. This proportion is adjusted for any prior probabilities assigned to each class.

NodeRisk — An n-element vector of the risk of the nodes in the tree, where n is the number of nodes. The risk for each node is the measure of impurity (Gini index or deviance) for this node weighted by the node probability. If the tree is grown by twoing, the risk for each node is zero.


NodeSize — An n-element vector of the sizes of the nodes in tree, where n is the number of nodes. The size of a node is defined as the number of observations from the data used to create the tree that satisfy the conditions for the node.

NumNodes — The number of nodes in tree.

Parent — An n-element vector containing the number of the parent node for each node in tree, where n is the number of nodes. The parent of the root node is 0.

PredictorNames — A cell array of names for the predictor variables, in the order in which they appear in X.

Prior — Numeric vector of prior probabilities for each class. The order of the elements of Prior corresponds to the order of the classes in ClassNames. The number of elements of Prior is the number of unique classes in the response. This property is read-only.

PruneAlpha — Numeric vector with one element per pruning level. If the pruning level ranges from 0 to M, then PruneAlpha has M + 1 elements sorted in ascending order. PruneAlpha(1) is for pruning level 0 (no pruning), PruneAlpha(2) is for pruning level 1, and so on.

PruneList — An n-element numeric vector with the pruning levels in each node of tree, where n is the number of nodes. The pruning levels range from 0 (no pruning) to M, where M is the distance between the deepest leaf and the root node.

ResponseName — Character vector describing the response variable Y.

ScoreTransform — Function handle for transforming scores, or character vector representing a built-in transformation function. 'none' means no transformation; equivalently, 'none' means @(x)x. For a list of built-in transformation functions and the syntax of custom transformation functions, see fitctree. Add or change a ScoreTransform function using dot notation: ctree.ScoreTransform = 'function' or ctree.ScoreTransform = @function

SurrogateCutCategories — An n-element cell array of the categories used for surrogate splits in tree, where n is the number of nodes in tree. For each node k, SurrogateCutCategories{k} is a cell array. The length of SurrogateCutCategories{k} is equal to the number of surrogate predictors found at this node. Every element of SurrogateCutCategories{k} is either an empty character vector for a continuous surrogate predictor, or is a two-element cell array with categories for a categorical surrogate predictor. The first element of this two-element cell array lists categories assigned to the left child by this surrogate split, and the second element lists categories assigned to the right child. The order of the surrogate split variables at each node is matched to the order of variables in SurrogateCutVar. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogateCutCategories contains an empty cell.

SurrogateCutFlip — An n-element cell array of the numeric cut assignments used for surrogate splits in tree, where n is the number of nodes in tree. For each node k, SurrogateCutFlip{k} is a numeric vector. The length of SurrogateCutFlip{k} is equal to the number of surrogate predictors found at this node. Every element of SurrogateCutFlip{k} is either zero for a categorical surrogate predictor, or a numeric cut assignment for a continuous surrogate predictor. The numeric cut assignment can be either –1 or +1. For every surrogate split with a numeric cut C based on a continuous predictor variable Z, the left child is chosen if Z < C and the cut assignment for this surrogate split is +1, or if Z ≥ C and the cut assignment is –1.
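As a brief sketch of how these node-level properties can be inspected in practice (assuming the Fisher iris data that ships with the toolbox), you can train a tree and examine the cut information for its root node:

```matlab
load fisheriris
tree = fitctree(meas,species);   % full classification tree
ctree = compact(tree);           % CompactClassificationTree
ctree.CutPredictor{1}            % predictor used for the root split
ctree.CutPoint(1)                % cut point (NaN for categorical splits)
ctree.CutType{1}                 % 'continuous' here, since meas is numeric
ctree.Parent(1)                  % 0, because the root node has no parent
```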


Determine Number of Clusters Using Cross-Validation

For a given number of clusters, compute the cross-validated sum of squared distances between observations and their nearest cluster center. Compare the results for one through ten clusters.

Load the fisheriris data set. X is the matrix meas, which contains flower measurements for 150 different flowers.

load fisheriris
X = meas;

Create the custom function clustf (shown at the end of this example). This function performs the following steps:

1. Standardize the training data.
2. Separate the training data into k clusters.
3. Transform the test data using the training data mean and standard deviation.
4. Compute the distance from each test data point to the nearest cluster center, or centroid.
5. Compute the sum of the squares of the distances.

Note: If you use the live script file for this example, the clustf function is already included at the end of the file. Otherwise, you need to create the function at the end of your .m file or add it as a file on the MATLAB® path.

Create a for loop that specifies the number of clusters k for each iteration. For each fixed number of clusters, pass the corresponding clustf function to crossval. Because crossval performs 10-fold cross-validation by default, the software computes 10 sums of squared distances, one for each partition of training and test data. Take the sum of those values; the result is the cross-validated sum of squared distances for the given number of clusters.

rng('default') % For reproducibility
cvdist = zeros(10,1);
for k = 1:10
    fun = @(Xtrain,Xtest)clustf(Xtrain,Xtest,k);
    distances = crossval(fun,X);
    cvdist(k) = sum(distances);
end

Plot the cross-validated sum of squared distances for each number of clusters.

plot(cvdist)
xlabel('Number of Clusters')
ylabel('CV Sum of Squared Distances')


In general, when determining how many clusters to use, consider the greatest number of clusters that corresponds to a significant decrease in the cross-validated sum of squared distances. For this example, using two or three clusters seems appropriate, but using more than three clusters does not.

This code creates the function clustf.

function distances = clustf(Xtrain,Xtest,k)
[Ztrain,Zmean,Zstd] = zscore(Xtrain);
[~,C] = kmeans(Ztrain,k); % Creates k clusters
Ztest = (Xtest-Zmean)./Zstd;
d = pdist2(C,Ztest,'euclidean','Smallest',1);
distances = sum(d.^2);
end

Compute Mean Absolute Error Using Cross-Validation

Compute the mean absolute error of a regression model by using 10-fold cross-validation.

Load the carsmall data set. Specify the Acceleration and Displacement variables as predictors and the Weight variable as the response.

load carsmall
X1 = Acceleration;
X2 = Displacement;
y = Weight;


Create the custom function regf (shown at the end of this example). This function fits a regression model to training data and then computes predicted car weights on a test set. The function compares the predicted car weight values to the true values, and then computes the mean absolute error (MAE) and the MAE adjusted to the range of the test set car weights.

Note: If you use the live script file for this example, the regf function is already included at the end of the file. Otherwise, you need to create this function at the end of your .m file or add it as a file on the MATLAB® path.

By default, crossval performs 10-fold cross-validation. For each of the 10 training and test set partitions of the data in X1, X2, and y, compute the MAE and adjusted MAE values using the regf function. Find the mean MAE and mean adjusted MAE.

rng('default') % For reproducibility
values = crossval(@regf,X1,X2,y)

values = 10×2

  319.2261    0.1132
  342.3722    0.1240
  214.3735    0.0902
  174.7247    0.1128
  189.4835    0.0832
  249.4359    0.1003
  194.4210    0.0845
  348.7437    0.1700
  283.1761    0.1187
  210.7444    0.1325

mean(values)

ans = 1×2

  252.6701    0.1129

This code creates the function regf.

function errors = regf(X1train,X2train,ytrain,X1test,X2test,ytest)
tbltrain = table(X1train,X2train,ytrain, ...
    'VariableNames',{'Acceleration','Displacement','Weight'});
tbltest = table(X1test,X2test,ytest, ...
    'VariableNames',{'Acceleration','Displacement','Weight'});
mdl = fitlm(tbltrain,'Weight ~ Acceleration + Displacement');
yfit = predict(mdl,tbltest);
MAE = mean(abs(yfit-tbltest.Weight));
adjMAE = MAE/range(tbltest.Weight);
errors = [MAE adjMAE];
end

Compute Misclassification Error Using PCA and Cross-Validation

Compute the misclassification error of a classification tree by using principal component analysis (PCA) and 5-fold cross-validation.


Load the fisheriris data set. The meas matrix contains flower measurements for 150 different flowers. The species variable lists the species for each flower.

load fisheriris

Create the custom function classf (shown at the end of this example). This function fits a classification tree to training data and then classifies test data. Use PCA inside the function to reduce the number of predictors used to create the tree model.

Note: If you use the live script file for this example, the classf function is already included at the end of the file. Otherwise, you need to create this function at the end of your .m file or add it as a file on the MATLAB® path.

Create a cvpartition object for stratified 5-fold cross-validation. By default, cvpartition ensures that training and test sets have roughly the same proportions of flower species.

rng('default') % For reproducibility
cvp = cvpartition(species,'KFold',5);

Compute the 5-fold cross-validation misclassification error for the classification tree with predictor data meas and response variable species.

cvError = crossval('mcr',meas,species,'Predfun',@classf,'Partition',cvp)

cvError = 0.1067

This code creates the function classf.

function yfit = classf(Xtrain,ytrain,Xtest)
% Standardize the training predictor data. Then, find the
% principal components for the standardized training predictor
% data.
[Ztrain,Zmean,Zstd] = zscore(Xtrain);
[coeff,scoreTrain,~,~,explained,mu] = pca(Ztrain);
% Find the lowest number of principal components that account
% for at least 95% of the variability.
n = find(cumsum(explained)>=95,1);
% Find the n principal component scores for the standardized
% training predictor data. Train a classification tree model
% using only these scores.
scoreTrain95 = scoreTrain(:,1:n);
mdl = fitctree(scoreTrain95,ytrain);
% Find the n principal component scores for the transformed
% test data. Classify the test data.
Ztest = (Xtest-Zmean)./Zstd;
scoreTest95 = (Ztest-mu)*coeff(:,1:n);
yfit = predict(mdl,scoreTest95);
end


Create Confusion Matrix Using Cross-Validation

Create a confusion matrix from the 10-fold cross-validation results of a discriminant analysis model.

Note: Use classify when training speed is a concern. Otherwise, use fitcdiscr to create a discriminant analysis model. For an example that shows the same workflow as this example, but uses fitcdiscr, see "Create Confusion Matrix Using Cross-Validation Predictions" on page 32-2870.

Load the fisheriris data set. X contains flower measurements for 150 different flowers, and y lists the species for each flower. Create a variable order that specifies the order of the flower species.

load fisheriris
X = meas;
y = species;
order = unique(y)

order = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }

Create a function handle named func for a function that completes the following steps:
• Take in training data (Xtrain and ytrain) and test data (Xtest and ytest).
• Use the training data to create a discriminant analysis model that classifies new data (Xtest). Create this model and classify new data by using the classify function.
• Compare the true test data classes (ytest) to the predicted test data values, and create a confusion matrix of the results by using the confusionmat function. Specify the class order by using 'Order',order.

func = @(Xtrain,ytrain,Xtest,ytest)confusionmat(ytest, ...
    classify(Xtest,Xtrain,ytrain),'Order',order);

Create a cvpartition object for stratified 10-fold cross-validation. By default, cvpartition ensures that training and test sets have roughly the same proportions of flower species.

rng('default') % For reproducibility
cvp = cvpartition(y,'Kfold',10);

Compute the 10 test set confusion matrices for each partition of the predictor data X and response variable y. Each row of confMat corresponds to the confusion matrix results for one test set. Aggregate the results and create the final confusion matrix cvMat.

confMat = crossval(func,X,y,'Partition',cvp);
cvMat = reshape(sum(confMat),3,3)

cvMat = 3×3

    50     0     0
     0    48     2
     0     1    49

Plot the confusion matrix as a confusion matrix chart by using confusionchart.

confusionchart(cvMat,order)


Input Arguments

criterion — Type of error estimate
'mse' | 'mcr'
Type of error estimate, specified as either 'mse' or 'mcr'.
• 'mse' — Mean squared error (MSE). Appropriate for regression algorithms only.
• 'mcr' — Misclassification rate, or proportion of misclassified observations. Appropriate for classification algorithms only.

X — Data set
column vector | matrix | array
Data set, specified as a column vector, matrix, or array. The rows of X correspond to observations, and the columns of X generally correspond to variables. If you pass multiple data sets X1,...,XN to crossval, then all data sets must have the same number of rows.
Data Types: single | double | logical | char | string | cell | categorical


y — Response data
column vector | character array
Response data, specified as a column vector or character array. The rows of y correspond to observations, and y must have the same number of rows as the predictor data X or X1,...,XN.
Data Types: single | double | logical | char | string | cell | categorical

predfun — Prediction function
function handle
Prediction function, specified as a function handle. You must create this function as an anonymous function, a function defined at the end of the .m or .mlx file containing the rest of your code, or a file on the MATLAB path. The required function syntax depends on the type of predictor data passed to crossval.

If the predictor data is a single data set X, predfun must have the syntax

function yfit = myfunction(Xtrain,ytrain,Xtest)
% Calculate predicted response
...
end

• Xtrain — Subset of the observations in X used as training predictor data. The function uses Xtrain and ytrain to construct a classification or regression model.
• ytrain — Subset of the responses in y used as training response data. The rows of ytrain correspond to the same observations in the rows of Xtrain. The function uses Xtrain and ytrain to construct a classification or regression model.
• Xtest — Subset of the observations in X used as test predictor data. The function uses Xtest and the model trained on Xtrain and ytrain to compute the predicted values yfit.
• yfit — Set of predicted values for observations in Xtest. The yfit values form a column vector with the same number of rows as Xtest.


If the predictor data consists of multiple data sets X1,...,XN, predfun must have the syntax

function yfit = myfunction(X1train,...,XNtrain,ytrain,X1test,...,XNtest)
% Calculate predicted response
...
end

• X1train,...,XNtrain — Subsets of the predictor data in X1,...,XN, respectively, that are used as training predictor data. The rows of X1train,...,XNtrain correspond to the same observations. The function uses X1train,...,XNtrain and ytrain to construct a classification or regression model.
• ytrain — Subset of the responses in y used as training response data. The rows of ytrain correspond to the same observations in the rows of X1train,...,XNtrain. The function uses X1train,...,XNtrain and ytrain to construct a classification or regression model.
• X1test,...,XNtest — Subsets of the observations in X1,...,XN, respectively, that are used as test predictor data. The rows of X1test,...,XNtest correspond to the same observations. The function uses X1test,...,XNtest and the model trained on X1train,...,XNtrain and ytrain to compute the predicted values yfit.
• yfit — Set of predicted values for observations in X1test,...,XNtest. The yfit values form a column vector with the same number of rows as X1test,...,XNtest.

Example: @(Xtrain,ytrain,Xtest)(Xtest*regress(ytrain,Xtrain));
Data Types: function_handle

fun — Function to cross-validate
function handle
Function to cross-validate, specified as a function handle. You must create this function as an anonymous function, a function defined at the end of the .m or .mlx file containing the rest of your code, or a file on the MATLAB path. The required function syntax depends on the type of data passed to crossval.


If the data is a single data set X, fun must have the syntax

function value = myfunction(Xtrain,Xtest)
% Calculation of value
...
end

• Xtrain — Subset of the observations in X used as training data. The function uses Xtrain to construct a model.
• Xtest — Subset of the observations in X used as test data. The function uses Xtest and the model trained on Xtrain to compute value.
• value — Quantity or variable. In most cases, value is a numeric scalar representing a loss estimate. value can also be an array, provided that the array size is the same for each partition of training and test data. If you want to return a variable output that can change size depending on the data partition, set value to be the cell scalar {output} instead.

If the data consists of multiple data sets X1,...,XN, fun must have the syntax

function value = myfunction(X1train,...,XNtrain,X1test,...,XNtest)
% Calculation of value
...
end

• X1train,...,XNtrain — Subsets of the data in X1,...,XN, respectively, that are used as training data. The rows of X1train,...,XNtrain correspond to the same observations. The function uses X1train,...,XNtrain to construct a model.
• X1test,...,XNtest — Subsets of the data in X1,...,XN, respectively, that are used as test data. The rows of X1test,...,XNtest correspond to the same observations. The function uses X1test,...,XNtest and the model trained on X1train,...,XNtrain to compute value.
• value — Quantity or variable. In most cases, value is a numeric scalar representing a loss estimate. value can also be an array, provided that the array size is the same for each partition of training and test data. If you want to return a variable output that can change size depending on the data partition, set value to be the cell scalar {output} instead.

Data Types: function_handle


Name-Value Pair Arguments Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN. Example: crossval('mcr',meas,species,'Predfun',@classf,'KFold',5,'Stratify',species) specifies to compute the stratified 5-fold cross-validation misclassification rate for the classf function with predictor data meas and response variable species. Holdout — Fraction or number of observations used for holdout validation [] (default) | scalar value in the range (0,1) | positive integer scalar Fraction or number of observations used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range (0,1) or a positive integer scalar. • If the Holdout value p is a scalar in the range (0,1), then crossval randomly selects and reserves approximately p*100% of the observations as test data. • If the Holdout value p is a positive integer scalar, then crossval randomly selects and reserves p observations as test data. In either case, crossval then trains the model specified by either fun or predfun using the rest of the data. Finally, the function uses the test data along with the trained model to compute either values or err. You can use only one of these four name-value pair arguments: Holdout, KFold, Leaveout, and Partition. Example: 'Holdout',0.3 Example: 'Holdout',50 Data Types: single | double KFold — Number of folds 10 (default) | positive integer scalar greater than 1 Number of folds for k-fold cross-validation, specified as the comma-separated pair consisting of 'KFold' and a positive integer scalar greater than 1. If you specify 'KFold',k, then crossval randomly partitions the data into k sets. 
For each set, the function reserves the set as test data, and trains the model specified by either fun or predfun using the other k – 1 sets. crossval then uses the test data along with the trained model to compute either values or err. You can use only one of these four name-value pair arguments: Holdout, KFold, Leaveout, and Partition.
Example: 'KFold',5
Data Types: single | double

Leaveout — Leave-one-out cross-validation
[] (default) | 1


Leave-one-out cross-validation, specified as the comma-separated pair consisting of 'Leaveout' and 1. If you specify 'Leaveout',1, then for each observation, crossval reserves the observation as test data, and trains the model specified by either fun or predfun using the other observations. The function then uses the test observation along with the trained model to compute either values or err. You can use only one of these four name-value pair arguments: Holdout, KFold, Leaveout, and Partition. Example: 'Leaveout',1 Data Types: single | double MCReps — Number of Monte Carlo repetitions 1 (default) | positive integer scalar Number of Monte Carlo repetitions for validation, specified as the comma-separated pair consisting of 'MCReps' and a positive integer scalar. If the first input of crossval is 'mse' or 'mcr' (see criterion), then crossval returns the mean MSE or misclassification rate across all Monte Carlo repetitions. Otherwise, crossval concatenates the values from all Monte Carlo repetitions along the first dimension. If you specify both Partition and MCReps, then the first Monte Carlo repetition uses the partition information in the cvpartition object, and the software calls the repartition object function to generate new partitions for each of the remaining repetitions. Example: 'MCReps',5 Data Types: single | double Partition — Cross-validation partition [] (default) | cvpartition partition object Cross-validation partition, specified as the comma-separated pair consisting of 'Partition' and a cvpartition partition object created by cvpartition. The partition object specifies the type of cross-validation and the indexing for the training and test sets. When you use crossval, you cannot specify both Partition and Stratify. Instead, directly specify a stratified partition when you create the cvpartition partition object. You can use only one of these four name-value pair arguments: Holdout, KFold, Leaveout, and Partition. 
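As a brief sketch of how MCReps combines with a validation scheme, the following repeats a 30% holdout split five times using the example regression handle from the predfun description. The data here are hypothetical, generated only for illustration; any numeric predictor matrix X and response vector y with matching rows work the same way.

```matlab
% Hypothetical data for illustration: n-by-p predictors, n-by-1 response.
rng('default')
X = randn(100,3);
y = X*[1;2;3] + randn(100,1);

% Prediction function from the predfun example above.
predfun = @(Xtrain,ytrain,Xtest) Xtest*regress(ytrain,Xtrain);

% 30% holdout, repeated 5 times. Because the first input is 'mse',
% err is the mean MSE across the 5 Monte Carlo repetitions.
err = crossval('mse',X,y,'Predfun',predfun,'Holdout',0.3,'MCReps',5);
```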
Stratify — Variable specifying groups used for stratification
column vector
Variable specifying the groups used for stratification, specified as the comma-separated pair consisting of 'Stratify' and a column vector with the same number of rows as the data X or X1,...,XN. When you specify Stratify, both the training and test sets have roughly the same class proportions as in the Stratify vector. The software treats NaNs, empty character vectors, empty strings, <missing> values, and <undefined> values in Stratify as missing data values, and ignores the corresponding rows of the data.


A good practice is to use stratification when you use cross-validation with classification algorithms. Otherwise, some test sets might not include observations for all classes. When you use crossval, you cannot specify both Partition and Stratify. Instead, directly specify a stratified partition when you create the cvpartition partition object.
Data Types: single | double | logical | string | cell | categorical

Options — Options for running in parallel and setting random streams
structure
Options for running computations in parallel and setting random streams, specified as the comma-separated pair consisting of 'Options' and a structure. Create the Options structure with statset. The option fields and their values are:
• UseParallel — Set this value to true to run computations in parallel. The default is false.
• UseSubstreams — Set this value to true to run computations in parallel in a reproducible manner. The default is false. To compute reproducibly, set Streams to a type that allows substreams: 'mlfg6331_64' or 'mrg32k3a'.
• Streams — Specify this value as a RandStream object or a cell array consisting of one such object. If you do not specify Streams, then crossval uses the default stream.

Note You need Parallel Computing Toolbox to run computations in parallel. Example: 'Options',statset('UseParallel',true) Data Types: struct

Output Arguments

err — Mean squared error or misclassification rate
numeric scalar
Mean squared error or misclassification rate, returned as a numeric scalar. The type of error depends on the criterion value.

values — Loss values
column vector | matrix
Loss values, returned as a column vector or matrix. Each row of values corresponds to the output of fun for one partition of training and test data.


If the output returned by fun is multidimensional, then crossval reshapes the output and fits it into one row of values. For an example, see “Create Confusion Matrix Using Cross-Validation” on page 32-921.

Tips • A good practice is to use stratification (see Stratify) when you use cross-validation with classification algorithms. Otherwise, some test sets might not include observations for all classes.

Alternative Functionality

Many classification and regression functions allow you to perform cross-validation directly.
• When you use fit functions such as fitcsvm, fitctree, and fitrtree, you can specify cross-validation options by using name-value pair arguments. Alternatively, you can first create models with these fit functions and then create a partitioned object by using the crossval object function. Use the kfoldLoss and kfoldPredict object functions to compute the loss and predicted values for the partitioned object. For more information, see ClassificationPartitionedModel and RegressionPartitionedModel.
• You can also specify cross-validation options when you perform lasso or elastic net regularization using lasso and lassoglm.
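As a sketch of the first alternative above, you can cross-validate directly in the fit function and then evaluate with kfoldLoss (Fisher iris data assumed):

```matlab
load fisheriris
% Cross-validate while fitting; 10-fold cross-validation by default.
cvtree = fitctree(meas,species,'CrossVal','on');
% Cross-validated classification error of the partitioned model.
L = kfoldLoss(cvtree);
```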

Extended Capabilities Automatic Parallel Support Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™. To run in parallel, set the 'UseParallel' option to true. Set the 'UseParallel' field of the options structure to true using statset and specify the 'Options' name-value pair argument in the call to this function. For example: 'Options',statset('UseParallel',true) For more information, see the 'Options' name-value pair argument. For more general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).

See Also classify | confusionmat | cvpartition | kmeans | pca | regress Topics “Selecting Features for Classifying High-dimensional Data” Introduced in R2008a


crossval Class: ClassificationDiscriminant Cross-validated discriminant analysis classifier

Syntax cvmodel = crossval(obj) cvmodel = crossval(obj,Name,Value)

Description cvmodel = crossval(obj) creates a partitioned model from obj, a fitted discriminant analysis classifier. By default, crossval uses 10-fold cross validation on the training data to create cvmodel. cvmodel = crossval(obj,Name,Value) creates a partitioned model with additional options specified by one or more Name,Value pair arguments.

Input Arguments obj Discriminant analysis classifier, produced using fitcdiscr. Name-Value Pair Arguments Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN. CVPartition Object of class cvpartition, created by the cvpartition function. crossval splits the data into subsets with cvpartition. Use only one of these options at a time: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'. Default: [] Holdout Holdout validation tests the specified fraction of the data, and uses the rest of the data for training. Specify a numeric scalar from 0 to 1. Use only one of these options at a time: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'. KFold Number of folds to use in a cross-validated classifier, a positive integer value greater than 1. Use only one of these options at a time: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.


Default: 10 Leaveout Set to 'on' for leave-one-out cross validation. Use only one of these options at a time: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.

Examples

Create a classification model for the Fisher iris data, and then create a cross-validation model. Evaluate the quality of the model using kfoldLoss.

load fisheriris
obj = fitcdiscr(meas,species);
cvmodel = crossval(obj);
L = kfoldLoss(cvmodel)

L = 0.0200

Tips • Assess the predictive performance of obj on cross-validated data using the “kfold” methods and properties of cvmodel, such as kfoldLoss.

Alternatives You can create a cross-validation classifier directly from the data, instead of creating a discriminant analysis classifier followed by a cross-validation classifier. To do so, include one of these options in fitcdiscr: 'CrossVal', 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.

See Also crossval | fitcdiscr | kfoldEdge | kfoldLoss | kfoldMargin | kfoldPredict | kfoldfun Topics “Discriminant Analysis Classification” on page 20-2


crossval
Cross-validate multiclass error-correcting output codes (ECOC) model

Syntax
CVMdl = crossval(Mdl)
CVMdl = crossval(Mdl,Name,Value)

Description
CVMdl = crossval(Mdl) returns a cross-validated (partitioned) multiclass error-correcting output codes (ECOC) model (CVMdl) from a trained ECOC model (Mdl). By default, crossval uses 10-fold cross-validation on the training data to create CVMdl, a ClassificationPartitionedECOC model.
CVMdl = crossval(Mdl,Name,Value) returns a partitioned ECOC model with additional options specified by one or more name-value pair arguments. For example, you can specify the number of folds or a holdout sample proportion.

Examples
Cross-Validate ECOC Classifier
Cross-validate an ECOC classifier with SVM binary learners, and estimate the generalized classification error.
Load Fisher's iris data set. Specify the predictor data X and the response data Y.

load fisheriris
X = meas;
Y = species;
rng(1); % For reproducibility

Create an SVM template, and standardize the predictors.

t = templateSVM('Standardize',true)

t =
Fit template for classification SVM.
                         Alpha: [0x1 double]
                 BoxConstraint: []
                     CacheSize: []
                 CachingMethod: ''
                    ClipAlphas: []
        DeltaGradientTolerance: []
                       Epsilon: []
                  GapTolerance: []
                  KKTTolerance: []
                IterationLimit: []
                KernelFunction: ''
                   KernelScale: []
                  KernelOffset: []
         KernelPolynomialOrder: []
                      NumPrint: []
                            Nu: []
               OutlierFraction: []
              RemoveDuplicates: []
               ShrinkagePeriod: []
                        Solver: ''
               StandardizeData: 1
            SaveSupportVectors: []
                VerbosityLevel: []
                       Version: 2
                        Method: 'SVM'
                          Type: 'classification'

t is an SVM template. Most of the template object properties are empty. When training the ECOC classifier, the software sets the applicable properties to their default values.
Train the ECOC classifier, and specify the class order.

Mdl = fitcecoc(X,Y,'Learners',t,...
    'ClassNames',{'setosa','versicolor','virginica'});

Mdl is a ClassificationECOC classifier. You can access its properties using dot notation.
Cross-validate Mdl using 10-fold cross-validation.

CVMdl = crossval(Mdl);

CVMdl is a ClassificationPartitionedECOC cross-validated ECOC classifier. Estimate the generalized classification error.

genError = kfoldLoss(CVMdl)

genError = 0.0400

The generalized classification error is 4%, which indicates that the ECOC classifier generalizes fairly well.

Cross-Validate ECOC Classifier Using Parallel Computing
Consider the arrhythmia data set. This data set contains 16 classes, 13 of which are represented in the data. The first class indicates that the subject does not have arrhythmia, and the last class indicates that the arrhythmia state of the subject is not recorded. The other classes are ordinal levels indicating the severity of arrhythmia. Train an ECOC classifier with a custom coding design specified by the description of the classes.
Load the arrhythmia data set. Convert Y to a categorical variable, and determine the number of classes.


load arrhythmia
Y = categorical(Y);
K = numel(unique(Y)); % Number of distinct classes

Construct a coding matrix that describes the nature of the classes.

OrdMat = designecoc(11,'ordinal');
nOrdMat = size(OrdMat);
class1VSOrd = [1; -ones(11,1); 0];
class1VSClass16 = [1; zeros(11,1); -1];
OrdVSClass16 = [0; ones(11,1); -1];
Coding = [class1VSOrd class1VSClass16 OrdVSClass16,...
    [zeros(1,nOrdMat(2)); OrdMat; zeros(1,nOrdMat(2))]];

Train an ECOC classifier using the custom coding design (Coding) and parallel computing. Specify an ensemble of 50 classification trees boosted using GentleBoost.

t = templateEnsemble('GentleBoost',50,'Tree');
options = statset('UseParallel',true);
Mdl = fitcecoc(X,Y,'Coding',Coding,'Learners',t,'Options',options);

Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).

Mdl is a ClassificationECOC model. You can access its properties using dot notation.
Cross-validate Mdl using 8-fold cross-validation and parallel computing.

rng(1); % For reproducibility
CVMdl = crossval(Mdl,'Options',options,'KFold',8);

Warning: One or more folds do not contain points from all the groups.

Because some classes have low relative frequency, some folds do not train using observations from those classes. CVMdl is a ClassificationPartitionedECOC cross-validated ECOC model.
Estimate the generalization error using parallel computing.

error = kfoldLoss(CVMdl,'Options',options)

error = 0.3208

The cross-validated classification error is 32%, which indicates that this model does not generalize well. To improve the model, try training using a different boosting method, such as RobustBoost, or a different algorithm, such as SVM.

Input Arguments

Mdl — Full, trained multiclass ECOC model
ClassificationECOC model
Full, trained multiclass ECOC model, specified as a ClassificationECOC model trained with fitcecoc.


Name-Value Pair Arguments
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: crossval(Mdl,'KFold',3) specifies using three folds in a cross-validated model.

CVPartition — Cross-validation partition
[] (default) | cvpartition partition object
Cross-validation partition, specified as the comma-separated pair consisting of 'CVPartition' and a cvpartition partition object created by cvpartition. The partition object specifies the type of cross-validation and the indexing for the training and validation sets.
To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.
Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,'KFold',5). Then, you can specify the cross-validated model by using 'CVPartition',cvp.

Holdout — Fraction of data for holdout validation
scalar value in the range (0,1)
Fraction of the data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range (0,1). If you specify 'Holdout',p, then the software completes these steps:
1  Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.
2  Store the compact, trained model in the Trained property of the cross-validated model.
To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.
Example: 'Holdout',0.1
Data Types: double | single

KFold — Number of folds
10 (default) | positive integer value greater than 1
Number of folds to use in a cross-validated model, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1. If you specify 'KFold',k, then the software completes these steps:
1  Randomly partition the data into k sets.
2  For each set, reserve the set as validation data, and train the model using the other k – 1 sets.
3  Store the k compact, trained models in the cells of a k-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.
Example: 'KFold',5
Data Types: single | double

Leaveout — Leave-one-out cross-validation flag
'off' (default) | 'on'
Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of 'Leaveout' and 'on' or 'off'. If you specify 'Leaveout','on', then, for each of the n observations (where n is the number of observations excluding missing observations, specified in the NumObservations property of the model), the software completes these steps:
1  Reserve the observation as validation data, and train the model using the other n – 1 observations.
2  Store the n compact, trained models in the cells of an n-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.
Example: 'Leaveout','on'

Options — Estimation options
[] (default) | structure array returned by statset
Estimation options, specified as the comma-separated pair consisting of 'Options' and a structure array returned by statset.
To invoke parallel computing:
• You need a Parallel Computing Toolbox license.
• Specify 'Options',statset('UseParallel',true).
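As a sketch of the CVPartition argument, you can build one cvpartition object and reuse it, so that different models are evaluated on identical folds; the fold count here is illustrative:

```matlab
% Reuse a single cvpartition object so every model sees the same folds.
load fisheriris
rng(1)                                    % for reproducibility
cvp = cvpartition(species,'KFold',5);     % stratified 5-fold partition
Mdl = fitcecoc(meas,species);
CVMdl = crossval(Mdl,'CVPartition',cvp);  % folds defined by cvp
L = kfoldLoss(CVMdl)                      % generalization error estimate
```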

Tips • Assess the predictive performance of Mdl on cross-validated data using the "kfold" methods and properties of CVMdl, such as kfoldLoss.

Alternative Functionality Instead of training an ECOC model and then cross-validating it, you can create a cross-validated ECOC model directly by using fitcecoc and specifying one of these name-value pair arguments: 'CrossVal', 'CVPartition', 'Holdout', 'Leaveout', or 'KFold'.

Extended Capabilities
Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.
To run in parallel, set the 'UseParallel' option to true. Set the 'UseParallel' field of the options structure to true using statset and specify the 'Options' name-value pair argument in the call to this function. For example: 'Options',statset('UseParallel',true)


For more information, see the 'Options' name-value pair argument. For more general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).

See Also ClassificationECOC | ClassificationPartitionedECOC | CompactClassificationECOC | cvpartition | fitcecoc | statset Topics “Quick Start Parallel Computing for Statistics and Machine Learning Toolbox” on page 30-2 “Reproducibility in Parallel Statistical Computations” on page 30-13 “Concepts of Parallel Computing in Statistics and Machine Learning Toolbox” on page 30-8 Introduced in R2014b


crossval
Cross-validate ensemble

Syntax
cvens = crossval(ens)
cvens = crossval(ens,Name,Value)

Description
cvens = crossval(ens) creates a cross-validated ensemble from ens, a classification ensemble. By default, crossval uses 10-fold cross validation.
cvens = crossval(ens,Name,Value) creates a cross-validated ensemble with additional options specified by one or more Name,Value pair arguments. You can specify several name-value pair arguments in any order as Name1,Value1,…,NameN,ValueN.

Input Arguments

ens
A classification ensemble created with fitcensemble.

Name-Value Pair Arguments
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

cvpartition
A partition of class cvpartition. Sets the partition for cross validation. Use no more than one of the name-value pairs cvpartition, holdout, kfold, or leaveout.

holdout
Holdout validation tests the specified fraction of the data, and uses the rest of the data for training. Specify a numeric scalar from 0 to 1. You can only use one of these four options at a time for creating a cross-validated ensemble: 'kfold', 'holdout', 'leaveout', or 'cvpartition'.

kfold
Number of folds for cross validation, a numeric positive scalar greater than 1. Use no more than one of the name-value pairs 'kfold', 'holdout', 'leaveout', or 'cvpartition'.

leaveout
If 'on', use leave-one-out cross validation. Use no more than one of the name-value pairs 'kfold', 'holdout', 'leaveout', or 'cvpartition'.

nprint
Printout frequency, a positive integer scalar. Use this parameter to observe the training of cross-validation folds.
Default: 'off', meaning no printout
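As a sketch, nprint can be combined with one of the fold options to monitor training; the values here are illustrative:

```matlab
% Illustrative: print a progress message as each of 5 folds trains.
load fisheriris
ens = fitcensemble(meas,species,'Method','AdaBoostM2');
cvens = crossval(ens,'kfold',5,'nprint',1);  % report after every fold
L = kfoldLoss(cvens);                        % averaged classification error
```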

Output Arguments

cvens
A cross-validated classification ensemble of class ClassificationPartitionedEnsemble.

Examples
Cross-Validate Classification Ensemble
Create a cross-validated classification model for the Fisher iris data, and assess its quality using the kfoldLoss method.
Load the Fisher iris data set.

load fisheriris

Train an ensemble of 100 boosted classification trees using AdaBoostM2.

t = templateTree('MaxNumSplits',1); % Weak learner template tree object
ens = fitcensemble(meas,species,'Method','AdaBoostM2','Learners',t);

Create a cross-validated ensemble from ens and find the classification error averaged over all folds.

rng(10,'twister') % For reproducibility
cvens = crossval(ens);
L = kfoldLoss(cvens)

L = 0.0467

Alternatives
You can create a cross-validation ensemble directly from the data, instead of creating an ensemble followed by a cross-validation ensemble. To do so, include one of these five options in fitcensemble: 'crossval', 'kfold', 'holdout', 'leaveout', or 'cvpartition'.
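For example, a minimal sketch of the one-step approach using the same iris data:

```matlab
% Cross-validate while fitting the ensemble, instead of calling crossval
% afterward. 'CrossVal','on' uses 10-fold cross validation by default.
load fisheriris
t = templateTree('MaxNumSplits',1);
cvens = fitcensemble(meas,species,'Method','AdaBoostM2', ...
    'Learners',t,'CrossVal','on');
L = kfoldLoss(cvens)   % averaged cross-validated classification error
```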

See Also ClassificationPartitionedEnsemble | cvpartition


crossval
Cross-validated k-nearest neighbor classifier

Syntax
cvmodel = crossval(mdl)
cvmodel = crossval(mdl,Name,Value)

Description
cvmodel = crossval(mdl) creates a cross-validated (partitioned) model from mdl, a fitted KNN classification model. By default, crossval uses 10-fold cross-validation on the training data to create cvmodel, a ClassificationPartitionedModel object.
cvmodel = crossval(mdl,Name,Value) creates a partitioned model with additional options specified by one or more name-value pair arguments. For example, specify 'Leaveout','on' for leave-one-out cross-validation.

Examples
Cross-Validated k-Nearest Neighbor Model
Create a cross-validated k-nearest neighbor model, and assess classification performance using the model.
Load the Fisher iris data set.

load fisheriris
X = meas;
Y = species;

Create a classifier for nearest neighbors.

mdl = fitcknn(X,Y);

Create a cross-validated classifier.

cvmdl = crossval(mdl)

cvmdl =
  classreg.learning.partition.ClassificationPartitionedModel
    CrossValidatedModel: 'KNN'
         PredictorNames: {'x1' 'x2' 'x3' 'x4'}
           ResponseName: 'Y'
        NumObservations: 150
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'setosa' 'versicolor' 'virginica'}
         ScoreTransform: 'none'


Properties, Methods

Find the cross-validated loss of the classifier.

cvmdlloss = kfoldLoss(cvmdl)

cvmdlloss = 0.0467

The cross-validated loss is less than 5%. You can expect mdl to have a similar error rate.

Input Arguments

mdl — k-nearest neighbor classifier model
ClassificationKNN object
k-nearest neighbor classifier model, specified as a ClassificationKNN object.

Name-Value Pair Arguments
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: crossval(mdl,'KFold',5) creates a partitioned model with 5-fold cross-validation.

CVPartition — Cross-validation partition
[] (default) | cvpartition object
Cross-validation partition, specified as the comma-separated pair consisting of 'CVPartition' and a cvpartition object created by the cvpartition function. crossval splits the data into subsets with cvpartition.
Use only one of these four options at a time: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.

Holdout — Fraction of data for holdout validation
scalar value in the range (0,1)
Fraction of the data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range (0,1).
Use only one of these four options at a time: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.
Example: 'Holdout',0.3
Data Types: single | double

KFold — Number of folds
10 (default) | positive integer value greater than 1
Number of folds to use in a cross-validated model, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1.
Use only one of these four options at a time: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.
Example: 'KFold',3
Data Types: single | double

Leaveout — Leave-one-out cross-validation flag
'off' (default) | 'on'
Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of 'Leaveout' and 'on' or 'off'. Leave-one-out is a special case of 'KFold' in which the number of folds equals the number of observations.
Use only one of these four options at a time: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.
Example: 'Leaveout','on'

Tips • Assess the predictive performance of mdl on cross-validated data by using the “kfold” methods and properties of cvmodel, such as kfoldLoss.

Alternative Functionality
You can create a cross-validated model directly from the data instead of creating a model followed by a cross-validated model. To do so, specify one of these options in fitcknn: 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'.
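For instance, a minimal sketch of the one-step approach; the holdout fraction is illustrative:

```matlab
% Hold out 20% of the data while fitting the KNN model in one call.
load fisheriris
cvmdl = fitcknn(meas,species,'Holdout',0.2);  % 0.2 chosen for illustration
L = kfoldLoss(cvmdl)                          % loss on the held-out 20%
```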

See Also ClassificationKNN | ClassificationPartitionedModel | crossval | fitcknn | kfoldEdge | kfoldLoss | kfoldMargin | kfoldPredict | kfoldfun Topics “Examine Quality of KNN Classifier” on page 18-27 “Modify KNN Classifier” on page 18-28 “Classification Using Nearest Neighbors” on page 18-11 Introduced in R2012a


crossval
Class: ClassificationNaiveBayes
Cross-validated naive Bayes classifier

Syntax
CVMdl = crossval(Mdl)
CVMdl = crossval(Mdl,Name,Value)

Description
CVMdl = crossval(Mdl) returns a partitioned naive Bayes classifier (CVMdl) from a trained naive Bayes classifier (Mdl). By default, crossval uses 10-fold cross validation on the training data to create CVMdl.
CVMdl = crossval(Mdl,Name,Value) returns a partitioned naive Bayes classifier with additional options specified by one or more Name,Value pair arguments. For example, you can specify a holdout sample proportion.

Input Arguments

Mdl — Fully trained naive Bayes classifier
ClassificationNaiveBayes model
A fully trained naive Bayes classifier, specified as a ClassificationNaiveBayes model trained by fitcnb.

Name-Value Pair Arguments
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

CVPartition — Cross-validation partition
[] (default) | cvpartition partition object
Cross-validation partition, specified as the comma-separated pair consisting of 'CVPartition' and a cvpartition partition object created by cvpartition. The partition object specifies the type of cross-validation and the indexing for the training and validation sets.
To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.
Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,'KFold',5). Then, you can specify the cross-validated model by using 'CVPartition',cvp.


Holdout — Fraction of data for holdout validation
scalar value in the range (0,1)
Fraction of the data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range (0,1). If you specify 'Holdout',p, then the software completes these steps:
1  Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.
2  Store the compact, trained model in the Trained property of the cross-validated model.
To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.
Example: 'Holdout',0.1
Data Types: double | single

KFold — Number of folds
10 (default) | positive integer value greater than 1
Number of folds to use in a cross-validated model, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1. If you specify 'KFold',k, then the software completes these steps:
1  Randomly partition the data into k sets.
2  For each set, reserve the set as validation data, and train the model using the other k – 1 sets.
3  Store the k compact, trained models in the cells of a k-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.
Example: 'KFold',5
Data Types: single | double

Leaveout — Leave-one-out cross-validation flag
'off' (default) | 'on'
Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of 'Leaveout' and 'on' or 'off'. If you specify 'Leaveout','on', then, for each of the n observations (where n is the number of observations excluding missing observations, specified in the NumObservations property of the model), the software completes these steps:
1  Reserve the observation as validation data, and train the model using the other n – 1 observations.
2  Store the n compact, trained models in the cells of an n-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.
Example: 'Leaveout','on'


Output Arguments

CVMdl — Cross-validated naive Bayes classifier
ClassificationPartitionedModel model
Cross-validated naive Bayes classifier, returned as a ClassificationPartitionedModel model.

Examples
Cross Validate a Naive Bayes Classifier Using crossval
Load the ionosphere data set.

load ionosphere
X = X(:,3:end); % Remove first two predictors for stability
rng(1); % For reproducibility

Train a naive Bayes classifier. It is good practice to define the class order. Assume that each predictor is conditionally normally distributed given its label.

Mdl = fitcnb(X,Y,'ClassNames',{'b','g'});

Mdl is a trained ClassificationNaiveBayes classifier. 'b' is the negative class and 'g' is the positive class.
Cross validate the classifier using 10-fold cross validation.

CVMdl = crossval(Mdl)

CVMdl =
  classreg.learning.partition.ClassificationPartitionedModel
    CrossValidatedModel: 'NaiveBayes'
         PredictorNames: {1x32 cell}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b' 'g'}
         ScoreTransform: 'none'

  Properties, Methods

FirstModel = CVMdl.Trained{1}

FirstModel =
  classreg.learning.classif.CompactClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'b' 'g'}
            ScoreTransform: 'none'
         DistributionNames: {1x32 cell}
    DistributionParameters: {2x32 cell}


Properties, Methods

CVMdl is a ClassificationPartitionedModel cross-validated classifier. The software:
1  Randomly partitions the data into 10, equally sized sets.
2  Trains a naive Bayes classifier on nine of the sets.
3  Repeats steps 1 and 2 k = 10 times. It excludes one partition each time, and trains on the other nine partitions.
4  Combines generalization statistics for each fold.

FirstModel is the first of the 10 trained classifiers. It is a CompactClassificationNaiveBayes model. You can estimate the generalization error by passing CVMdl to kfoldLoss.

Specify a Holdout-Sample Proportion for Naive Bayes Cross Validation
By default, crossval uses 10-fold cross validation to cross validate a naive Bayes classifier. You have several other options, such as specifying a different number of folds or holdout-sample proportion. This example shows how to specify a holdout-sample proportion.
Load the ionosphere data set.

load ionosphere
X = X(:,3:end); % Remove first two predictors for stability
rng(1); % For reproducibility

Train a naive Bayes classifier. Assume that each predictor is conditionally normally distributed given its label. It is good practice to define the class order.

Mdl = fitcnb(X,Y,'ClassNames',{'b','g'});

Mdl is a trained ClassificationNaiveBayes classifier. 'b' is the negative class and 'g' is the positive class.
Cross validate the classifier by specifying a 30% holdout sample.

CVMdl = crossval(Mdl,'Holdout',0.30)

CVMdl =
  classreg.learning.partition.ClassificationPartitionedModel
    CrossValidatedModel: 'NaiveBayes'
         PredictorNames: {1x32 cell}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 1
              Partition: [1x1 cvpartition]
             ClassNames: {'b' 'g'}
         ScoreTransform: 'none'

  Properties, Methods


TrainedModel = CVMdl.Trained{1}

TrainedModel =
  classreg.learning.classif.CompactClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'b' 'g'}
            ScoreTransform: 'none'
         DistributionNames: {1x32 cell}
    DistributionParameters: {2x32 cell}

  Properties, Methods

CVMdl is a ClassificationPartitionedModel. TrainedModel is a CompactClassificationNaiveBayes classifier trained using 70% of the data.
Estimate the generalization error.

kfoldLoss(CVMdl)

ans = 0.2571

The out-of-sample misclassification error is approximately 26%.

Tips
Assess the predictive performance of Mdl on cross-validated data using the “kfold” methods and properties of CVMdl, such as kfoldLoss.

Alternatives
Instead of creating a naive Bayes classifier followed by a cross-validation classifier, create a cross-validated classifier directly using fitcnb and by specifying any of these name-value pair arguments: 'CrossVal', 'CVPartition', 'Holdout', 'Leaveout', or 'KFold'.
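For instance, a minimal sketch of the one-step approach using the same ionosphere data:

```matlab
% Cross-validate the naive Bayes model while fitting it.
% 'CrossVal','on' uses 10-fold cross validation by default.
load ionosphere
X = X(:,3:end);   % remove first two predictors, as in the examples above
CVMdl = fitcnb(X,Y,'ClassNames',{'b','g'},'CrossVal','on');
L = kfoldLoss(CVMdl)   % estimated generalization error
```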

See Also ClassificationNaiveBayes | ClassificationPartitionedModel | CompactClassificationNaiveBayes | fitcnb | kfoldLoss


crossval
Cross-validate support vector machine (SVM) classifier

Syntax
CVSVMModel = crossval(SVMModel)
CVSVMModel = crossval(SVMModel,Name,Value)

Description
CVSVMModel = crossval(SVMModel) returns a cross-validated (partitioned) support vector machine (SVM) classifier (CVSVMModel) from a trained SVM classifier (SVMModel). By default, crossval uses 10-fold cross-validation on the training data to create CVSVMModel, a ClassificationPartitionedModel classifier.
CVSVMModel = crossval(SVMModel,Name,Value) returns a partitioned SVM classifier with additional options specified by one or more name-value pair arguments. For example, you can specify the number of folds or holdout sample proportion.

Examples
Cross-Validate SVM Classifier
Load the ionosphere data set.

load ionosphere
rng(1); % For reproducibility

Train an SVM classifier. Standardize the predictor data and specify the order of the classes.

SVMModel = fitcsvm(X,Y,'Standardize',true,'ClassNames',{'b','g'});

SVMModel is a trained ClassificationSVM classifier. 'b' is the negative class and 'g' is the positive class.
Cross-validate the classifier using 10-fold cross-validation.

CVSVMModel = crossval(SVMModel)

CVSVMModel =
  classreg.learning.partition.ClassificationPartitionedModel
    CrossValidatedModel: 'SVM'
         PredictorNames: {1x34 cell}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b' 'g'}
         ScoreTransform: 'none'


Properties, Methods

FirstModel = CVSVMModel.Trained{1}

FirstModel =
  classreg.learning.classif.CompactClassificationSVM
           ResponseName: 'Y'
  CategoricalPredictors: []
             ClassNames: {'b' 'g'}
         ScoreTransform: 'none'
                  Alpha: [78x1 double]
                   Bias: -0.2209
       KernelParameters: [1x1 struct]
                     Mu: [1x34 double]
                  Sigma: [1x34 double]
         SupportVectors: [78x34 double]
    SupportVectorLabels: [78x1 double]

  Properties, Methods

CVSVMModel is a ClassificationPartitionedModel cross-validated classifier. During cross-validation, the software completes these steps:
1  Randomly partition the data into 10 sets of equal size.
2  Train an SVM classifier on nine of the sets.
3  Repeat steps 1 and 2 k = 10 times. The software leaves out one partition each time and trains on the other nine partitions.
4  Combine generalization statistics for each fold.

FirstModel is the first of the 10 trained classifiers. It is a CompactClassificationSVM classifier. You can estimate the generalization error by passing CVSVMModel to kfoldLoss.

Specify Holdout Sample Proportion for SVM Cross-Validation
Specify a holdout sample proportion for cross-validation. By default, crossval uses 10-fold cross-validation to cross-validate an SVM classifier. However, you have several other options for cross-validation. For example, you can specify a different number of folds or holdout sample proportion.
Load the ionosphere data set.

load ionosphere
rng(1); % For reproducibility

Train an SVM classifier. Standardize the data and specify that 'g' is the positive class.

SVMModel = fitcsvm(X,Y,'Standardize',true,'ClassNames',{'b','g'});

SVMModel is a trained ClassificationSVM classifier.
Cross-validate the classifier by specifying a 15% holdout sample.


CVSVMModel = crossval(SVMModel,'Holdout',0.15)

CVSVMModel =
  classreg.learning.partition.ClassificationPartitionedModel
    CrossValidatedModel: 'SVM'
         PredictorNames: {1x34 cell}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 1
              Partition: [1x1 cvpartition]
             ClassNames: {'b' 'g'}
         ScoreTransform: 'none'

  Properties, Methods

CVSVMModel is a ClassificationPartitionedModel. Display properties of the classifier trained using 85% of the data.

TrainedModel = CVSVMModel.Trained{1}

TrainedModel =
  classreg.learning.classif.CompactClassificationSVM
           ResponseName: 'Y'
  CategoricalPredictors: []
             ClassNames: {'b' 'g'}
         ScoreTransform: 'none'
                  Alpha: [74x1 double]
                   Bias: -0.2952
       KernelParameters: [1x1 struct]
                     Mu: [1x34 double]
                  Sigma: [1x34 double]
         SupportVectors: [74x34 double]
    SupportVectorLabels: [74x1 double]

  Properties, Methods

TrainedModel is a CompactClassificationSVM classifier trained using 85% of the data.
Estimate the generalization error.

kfoldLoss(CVSVMModel)

ans = 0.0769

The out-of-sample misclassification error is approximately 8%.

Input Arguments

SVMModel — Full, trained SVM classifier
ClassificationSVM classifier
Full, trained SVM classifier, specified as a ClassificationSVM model trained with fitcsvm.


Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: crossval(SVMModel,'KFold',5) specifies using five folds in a cross-validated model.

CVPartition — Cross-validation partition
[] (default) | cvpartition partition object

Cross-validation partition, specified as the comma-separated pair consisting of 'CVPartition' and a cvpartition partition object created by cvpartition. The partition object specifies the type of cross-validation and the indexing for the training and validation sets.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,'KFold',5). Then, you can specify the cross-validated model by using 'CVPartition',cvp.

Holdout — Fraction of data for holdout validation
scalar value in the range (0,1)

Fraction of the data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range (0,1). If you specify 'Holdout',p, then the software completes these steps:

1  Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.
2  Store the compact, trained model in the Trained property of the cross-validated model.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Example: 'Holdout',0.1
Data Types: double | single

KFold — Number of folds
10 (default) | positive integer value greater than 1

Number of folds to use in a cross-validated model, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1. If you specify 'KFold',k, then the software completes these steps:

1  Randomly partition the data into k sets.
2  For each set, reserve the set as validation data, and train the model using the other k – 1 sets.
3  Store the k compact, trained models in the cells of a k-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Example: 'KFold',5
Data Types: single | double

Leaveout — Leave-one-out cross-validation flag
'off' (default) | 'on'

Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of 'Leaveout' and 'on' or 'off'. If you specify 'Leaveout','on', then, for each of the n observations (where n is the number of observations excluding missing observations, specified in the NumObservations property of the model), the software completes these steps:

1  Reserve the observation as validation data, and train the model using the other n – 1 observations.
2  Store the n compact, trained models in the cells of an n-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Example: 'Leaveout','on'
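The four mutually exclusive cross-validation options described above can be sketched together in one script. This is a minimal illustration, not from this page: the data set (fisheriris) and the two-class subset are chosen only so that fitcsvm has a binary problem to train on.

```matlab
% Sketch: the four mutually exclusive cross-validation options of crossval.
% Data choice is illustrative (two classes of the Fisher iris data).
load fisheriris
inds = ~strcmp(species,'setosa');       % keep two classes for a binary SVM
X = meas(inds,:);
Y = species(inds);
SVMModel = fitcsvm(X,Y);

cvp = cvpartition(numel(Y),'KFold',5);  % explicit partition object
CVMdl1 = crossval(SVMModel,'CVPartition',cvp);
CVMdl2 = crossval(SVMModel,'Holdout',0.2);    % 20% holdout validation
CVMdl3 = crossval(SVMModel,'KFold',5);        % 5 folds
CVMdl4 = crossval(SVMModel,'Leaveout','on');  % n folds, one per observation

L = kfoldLoss(CVMdl3)   % cross-validated estimate of generalization error
```

Only one of the four arguments can appear in a single crossval call; combining any two of them is an error.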

Tips

Assess the predictive performance of SVMModel on cross-validated data by using the “kfold” methods and properties of CVSVMModel, such as kfoldLoss.

Alternative Functionality

Instead of training an SVM classifier and then cross-validating it, you can create a cross-validated classifier directly by using fitcsvm and specifying any of these name-value pair arguments: 'CrossVal', 'CVPartition', 'Holdout', 'Leaveout', or 'KFold'.

See Also

ClassificationPartitionedModel | ClassificationSVM | CompactClassificationSVM | cvpartition | fitcsvm

Introduced in R2014a


crossval
Class: ClassificationTree

Cross-validated decision tree

Syntax

cvmodel = crossval(model)
cvmodel = crossval(model,Name,Value)

Description

cvmodel = crossval(model) creates a partitioned model from model, a fitted classification tree. By default, crossval uses 10-fold cross validation on the training data to create cvmodel.

cvmodel = crossval(model,Name,Value) creates a partitioned model with additional options specified by one or more Name,Value pair arguments.

Input Arguments

model
A classification model, produced using fitctree.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

CVPartition — Cross-validation partition
[] (default) | cvpartition object

Cross-validation partition, specified as the comma-separated pair consisting of 'CVPartition' and a cvpartition object created by the cvpartition function. crossval splits the data into subsets with cvpartition.

Use only one of these four options at a time: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.

Holdout — Fraction of data for holdout validation
scalar value in the range (0,1)

Fraction of the data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range (0,1).

Use only one of these four options at a time: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.

Example: 'Holdout',0.3
Data Types: single | double

KFold — Number of folds
10 (default) | positive integer value greater than 1

Number of folds to use in a cross-validated model, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1.

Use only one of these four options at a time: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.

Example: 'KFold',3
Data Types: single | double

Leaveout — Leave-one-out cross-validation flag
'off' (default) | 'on'

Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of 'Leaveout' and 'on' or 'off'. Leave-one-out is a special case of 'KFold' in which the number of folds equals the number of observations.

Use only one of these four options at a time: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.

Example: 'Leaveout','on'

Output Arguments

cvmodel — Partitioned model
ClassificationPartitionedModel object

Partitioned model, returned as a ClassificationPartitionedModel object.

Examples

Create a Cross-Validation Model

Create a classification model for the ionosphere data, then create a cross-validation model. Evaluate the quality of the model using kfoldLoss.

load ionosphere
tree = fitctree(X,Y);
cvmodel = crossval(tree);
L = kfoldLoss(cvmodel)

L = 0.1111

Tips

• Assess the predictive performance of model on cross-validated data using the “kfold” methods and properties of cvmodel, such as kfoldLoss.


Alternatives You can create a cross-validation tree directly from the data, instead of creating a decision tree followed by a cross-validation tree. To do so, include one of these five options in fitctree: 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'.
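The alternative above can be sketched in a couple of lines. This minimal example reuses the ionosphere data from the example earlier on this page:

```matlab
% Sketch: fit and 10-fold cross-validate in a single fitctree call,
% instead of fitctree followed by crossval.
load ionosphere
cvmodel = fitctree(X,Y,'CrossVal','on');  % returns a partitioned model
L = kfoldLoss(cvmodel);                   % cross-validated loss
```

The one-call form skips the intermediate ClassificationTree object; use the two-step form when you also need the full tree itself.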

See Also crossval | fitctree


crossval

Cross validate ensemble

Syntax

cvens = crossval(ens)
cvens = crossval(ens,Name,Value)

Description

cvens = crossval(ens) creates a cross-validated ensemble from ens, a regression ensemble. Default is 10-fold cross validation.

cvens = crossval(ens,Name,Value) creates a cross-validated ensemble with additional options specified by one or more Name,Value pair arguments. You can specify several name-value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Input Arguments

ens
A regression ensemble created with fitrensemble.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

cvpartition
A partition of class cvpartition. Sets the partition for cross validation.
Use no more than one of the name-value pairs cvpartition, holdout, kfold, and leaveout.

holdout
Holdout validation tests the specified fraction of the data, and uses the rest of the data for training. Specify a numeric scalar from 0 to 1.
Use no more than one of the name-value pairs cvpartition, holdout, kfold, and leaveout.

kfold
Number of folds for cross validation, a positive integer value greater than 1.
Use no more than one of the name-value pairs cvpartition, holdout, kfold, and leaveout.

leaveout
If 'on', use leave-one-out cross-validation.
Use no more than one of the name-value pairs cvpartition, holdout, kfold, and leaveout.

nprint
Printout frequency, a positive integer scalar. Use this parameter to observe the training of cross-validation folds.
Default: 'off', meaning no printout
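The nprint option is easiest to see in a short script. A minimal sketch, assuming the carsmall data used in the example below; the fold count and printout frequency are illustrative:

```matlab
% Sketch: watch fold-by-fold training during cross-validation with nprint.
load carsmall
X = [Acceleration Displacement Horsepower Weight];
ens = fitrensemble(X,MPG);
cvens = crossval(ens,'kfold',5,'nprint',1);  % prints a message as each fold trains
```

Printing once per fold ('nprint',1) is mainly useful for long-running ensembles, where it confirms that training is progressing.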

Output Arguments

cvens
A cross-validated regression ensemble of class RegressionPartitionedEnsemble.

Examples

Create Cross-Validated Regression Model

Create a cross-validated regression model for the carsmall data, and evaluate its quality using the kfoldLoss method.

Load the carsmall data set and select acceleration, displacement, horsepower, and vehicle weight as predictors.

load carsmall;
X = [Acceleration Displacement Horsepower Weight];

Train a regression ensemble.

rens = fitrensemble(X,MPG);

Create a cross-validated ensemble from rens and find the cross-validation loss.

rng(10,'twister') % For reproducibility
cvens = crossval(rens);
L = kfoldLoss(cvens)

L = 30.8569

Alternatives You can create a cross-validation ensemble directly from the data, instead of creating an ensemble followed by a cross-validation ensemble. To do so, include one of these five options in fitrensemble: 'crossval', 'kfold', 'holdout', 'leaveout', or 'cvpartition'.
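As a brief sketch of the one-call alternative, reusing the carsmall data from the example above:

```matlab
% Sketch: create the cross-validated ensemble directly in fitrensemble.
load carsmall
X = [Acceleration Displacement Horsepower Weight];
cvens = fitrensemble(X,MPG,'CrossVal','on');  % 10-fold by default
L = kfoldLoss(cvens);                         % cross-validated loss
```

This avoids training the full ensemble first when only the cross-validated version is needed.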

See Also RegressionPartitionedEnsemble | cvpartition | kfoldLoss


crossval
Class: RegressionGP

Cross-validate Gaussian process regression model

Syntax

cvMdl = crossval(gprMdl)
cvmdl = crossval(gprMdl,Name,Value)

Description

cvMdl = crossval(gprMdl) returns the partitioned model, cvMdl, built from the Gaussian process regression (GPR) model, gprMdl, using 10-fold cross validation. cvMdl is a RegressionPartitionedModel object, and gprMdl is a RegressionGP (full) object.

cvmdl = crossval(gprMdl,Name,Value) returns the partitioned model, cvmdl, with additional options specified by one or more Name,Value pair arguments. For example, you can specify the number of folds or the fraction of the data to use for testing.

Input Arguments

gprMdl — Gaussian process regression model
RegressionGP object

Gaussian process regression model, specified as a RegressionGP (full) object. You cannot call crossval on a compact regression object.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

CVPartition — Random partition for a k-fold cross validation
cvpartition object

Random partition for a k-fold cross validation, specified as the comma-separated pair consisting of 'CVPartition' and a cvpartition object.

Example: 'CVPartition',cvp uses the random partition defined by cvp.

If you specify CVPartition, then you cannot specify Holdout, KFold, or Leaveout.

Holdout — Fraction of data to use for testing
scalar value in the range from 0 to 1

Fraction of the data to use for testing in holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range from 0 to 1. If you specify 'Holdout',p, then crossval:

1. Randomly reserves p*100% of the data as validation data, and trains the model using the rest of the data.
2. Stores the compact, trained model in cvgprMdl.Trained.

Example: 'Holdout',0.3 uses 30% of the data for testing and 70% of the data for training.

If you specify Holdout, then you cannot specify CVPartition, KFold, or Leaveout.

Data Types: single | double

KFold — Number of folds
10 (default) | positive integer value greater than 1

Number of folds to use in the cross-validated GPR model, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1. If you specify 'KFold',k, then crossval:

1. Randomly partitions the data into k sets.
2. For each set, reserves the set as test data, and trains the model using the other k – 1 sets.
3. Stores the k compact, trained models in the cells of a k-by-1 cell array in cvgprMdl.Trained.

Example: 'KFold',5 uses 5 folds in cross-validation. That is, for each fold, it uses that fold as test data, and trains the model on the remaining 4 folds.

If you specify KFold, then you cannot specify CVPartition, Holdout, or Leaveout.

Data Types: single | double

Leaveout — Indicator for leave-one-out cross-validation
'off' (default) | 'on'

Indicator for leave-one-out cross-validation, specified as the comma-separated pair consisting of 'Leaveout' and either 'on' or 'off'. If you specify 'Leaveout','on', then, for each of the n observations, crossval:

1. Reserves the observation as test data, and trains the model using the other n – 1 observations.
2. Stores the n compact, trained models in the cells of an n-by-1 cell array in cvgprMdl.Trained.

Example: 'Leaveout','on'

If you specify Leaveout, then you cannot specify CVPartition, Holdout, or KFold.

Output Arguments

cvgprMdl — Partitioned Gaussian process regression model
RegressionPartitionedModel object

Partitioned Gaussian process regression model, returned as a RegressionPartitionedModel object.

Examples

Partition Data for Cross-Validation

Download the housing data [1] from the UCI Machine Learning Repository [4].


The data set has 506 observations. The first 13 columns contain the predictor values and the last column contains the response values. The goal is to predict the median value of owner-occupied homes in suburban Boston as a function of 13 predictors.

Load the data and define the response vector and the predictor matrix.

load('housing.data');
X = housing(:,1:13);
y = housing(:,end);

Fit a GPR model using the squared exponential kernel function with a separate length scale for each predictor. Standardize the predictor variables.

gprMdl = fitrgp(X,y,'KernelFunction','ardsquaredexponential','Standardize',1);

Create a cross-validation partition for the data using predictor 4 as a grouping variable.

rng('default') % For reproducibility
cvp = cvpartition(X(:,4),'kfold',10);

Create a 10-fold cross-validated model using the partitioned data in cvp.

cvgprMdl = crossval(gprMdl,'CVPartition',cvp);

Compute the regression loss for in-fold observations using models trained on out-of-fold observations.

L = kfoldLoss(cvgprMdl)

L = 9.5299

Predict the response for in-fold observations, that is, observations not used for training.

ypred = kfoldPredict(cvgprMdl);

For every fold, kfoldPredict predicts responses for observations in that fold using the models trained on out-of-fold observations. Plot the actual responses and prediction data.

plot(y,'r.');
hold on;
plot(ypred,'b--.');
axis([0 510 -15 65]);
legend('True response','GPR prediction','Location','Best');
hold off;


Train GPR Model Using 4-Fold Cross Validation

Download the abalone data [2], [3] from the UCI Machine Learning Repository [4] and save it in your current directory with the name abalone.data. Read the data into a table.

tbl = readtable('abalone.data','Filetype','text','ReadVariableNames',false);

The data set has 4177 observations. The goal is to predict the age of abalone from 8 physical measurements.

Fit a GPR model using the subset of regressors (sr) method for parameter estimation and the fully independent conditional (fic) method for prediction. Standardize the predictors and use a squared exponential kernel function with a separate length scale for each predictor.

gprMdl = fitrgp(tbl,tbl(:,end),'KernelFunction','ardsquaredexponential',...
    'FitMethod','sr','PredictMethod','fic','Standardize',1);

Cross-validate the model using 4-fold cross validation. This partitions the data into 4 sets. For each set, crossval uses that set (25% of the data) as the test data, and trains the model on the remaining 3 sets (75% of the data).

rng('default') % For reproducibility
cvgprMdl = crossval(gprMdl,'KFold',4);

Compute the loss over individual folds.

L = kfoldLoss(cvgprMdl,'mode','individual')

L =
    4.3669
    4.6896
    4.0565
    4.3162

Compute the average cross-validated loss over all folds. The default is the mean squared error.

L2 = kfoldLoss(cvgprMdl)

L2 = 4.3573

This is equal to the mean loss over individual folds.

mse = mean(L)

mse = 4.3573

Tips

• You can use only one of the four cross-validation name-value pair arguments ('CVPartition', 'Holdout', 'KFold', or 'Leaveout') at a time.
• You cannot compute the prediction intervals for a cross-validated model.

Alternatives Alternatively, you can train a cross-validated model using the related name-value pair arguments in fitrgp. If you supply a custom 'ActiveSet' in the call to fitrgp, then you cannot cross validate the GPR model.
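A brief sketch of this alternative, assuming fitrgp accepts the standard cross-validation name-value pairs as the text states, and reusing the housing data from the first example:

```matlab
% Sketch: cross-validate directly in fitrgp rather than calling crossval
% on a full RegressionGP model afterward.
load('housing.data');
X = housing(:,1:13);
y = housing(:,end);
cvgprMdl = fitrgp(X,y,'Standardize',1,'KFold',4);  % partitioned model
L = kfoldLoss(cvgprMdl);                           % cross-validated loss
```

The one-call form never materializes the full RegressionGP object, which matters for large training sets.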

References

[1] Harrison, D., and D. L. Rubinfeld. "Hedonic Prices and the Demand for Clean Air." J. Environ. Economics & Management. Vol. 5, 1978, pp. 81–102.

[2] Warwick, J. N., T. L. Sellers, S. R. Talbot, A. J. Cawthorn, and W. B. Ford. "The Population Biology of Abalone (Haliotis species) in Tasmania. I. Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait." Sea Fisheries Division, Technical Report No. 48 (ISSN 1034-3288), 1994.

[3] Waugh, S. "Extending and Benchmarking Cascade-Correlation." PhD thesis, Computer Science Department, University of Tasmania, 1995.

[4] Lichman, M. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science, 2013. http://archive.ics.uci.edu/ml.

See Also

RegressionGP | RegressionPartitionedModel | fitrgp | kfoldLoss | kfoldPredict

Topics
“Train GPR Model Using Cross-Validation” on page 32-1917


Introduced in R2015b


crossval
Class: RegressionSVM

Cross-validated support vector machine regression model

Syntax

CVMdl = crossval(mdl)
CVMdl = crossval(mdl,Name,Value)

Description

CVMdl = crossval(mdl) returns a cross-validated (partitioned) support vector machine regression model, CVMdl, from a trained SVM regression model, mdl.

CVMdl = crossval(mdl,Name,Value) returns a cross-validated model with additional options specified by one or more Name,Value pair arguments.

Input Arguments

mdl — Full, trained SVM regression model
RegressionSVM model

Full, trained SVM regression model, specified as a RegressionSVM model returned by fitrsvm.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

CVPartition — Cross-validation partition
[] (default) | cvpartition partition object

Cross-validation partition, specified as the comma-separated pair consisting of 'CVPartition' and a cvpartition partition object created by cvpartition. The partition object specifies the type of cross-validation and the indexing for the training and validation sets.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,'KFold',5). Then, you can specify the cross-validated model by using 'CVPartition',cvp.

Holdout — Fraction of data for holdout validation
scalar value in the range (0,1)

Fraction of the data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range (0,1). If you specify 'Holdout',p, then the software completes these steps:

1  Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.
2  Store the compact, trained model in the Trained property of the cross-validated model.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Example: 'Holdout',0.1
Data Types: double | single

KFold — Number of folds
10 (default) | positive integer value greater than 1

Number of folds to use in a cross-validated model, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1. If you specify 'KFold',k, then the software completes these steps:

1  Randomly partition the data into k sets.
2  For each set, reserve the set as validation data, and train the model using the other k – 1 sets.
3  Store the k compact, trained models in the cells of a k-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Example: 'KFold',5
Data Types: single | double

Leaveout — Leave-one-out cross-validation flag
'off' (default) | 'on'

Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of 'Leaveout' and 'on' or 'off'. If you specify 'Leaveout','on', then, for each of the n observations (where n is the number of observations excluding missing observations, specified in the NumObservations property of the model), the software completes these steps:

1  Reserve the observation as validation data, and train the model using the other n – 1 observations.
2  Store the n compact, trained models in the cells of an n-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Example: 'Leaveout','on'

Output Arguments

CVMdl — Cross-validated SVM regression model
RegressionPartitionedSVM model

Cross-validated SVM regression model, returned as a RegressionPartitionedSVM model.

Examples

Train Cross-Validated SVM Regression Model Using crossval

This example shows how to train a cross-validated SVM regression model using crossval.

This example uses the abalone data from the UCI Machine Learning Repository. Download the data and save it in your current folder with the name 'abalone.data'. Read the data into a table.

tbl = readtable('abalone.data','Filetype','text','ReadVariableNames',false);
rng default % for reproducibility

The sample data contains 4177 observations. All the predictor variables are continuous except for sex, which is a categorical variable with possible values 'M' (for males), 'F' (for females), and 'I' (for infants). The goal is to predict the number of rings on the abalone, and thereby determine its age, using physical measurements.

Train an SVM regression model, using a Gaussian kernel function with a kernel scale equal to 2.2. Standardize the data.

mdl = fitrsvm(tbl,'Var9','KernelFunction','gaussian','KernelScale',2.2,'Standardize',true);

mdl is a trained RegressionSVM regression model.

Cross-validate the model using 10-fold cross validation.

CVMdl = crossval(mdl)

CVMdl = 
  classreg.learning.partition.RegressionPartitionedSVM
      CrossValidatedModel: 'SVM'
           PredictorNames: {1x8 cell}
    CategoricalPredictors: 1
             ResponseName: 'Var9'
          NumObservations: 4177
                    KFold: 10
                Partition: [1x1 cvpartition]
        ResponseTransform: 'none'

  Properties, Methods

CVMdl is a RegressionPartitionedSVM cross-validated regression model. The software:

1. Randomly partitions the data into 10 equally sized sets.
2. Trains an SVM regression model on nine of the 10 sets.
3. Repeats step 2 k = 10 times, leaving out one of the partitions each time and training on the other nine partitions.
4. Combines generalization statistics for each fold.

Calculate the cross-validation loss for the cross-validated model.

loss = kfoldLoss(CVMdl)


loss = 4.5712

Specify Cross-Validation Holdout Proportion for SVM Regression

This example shows how to specify a holdout proportion for training a cross-validated SVM regression model.

This example uses the abalone data from the UCI Machine Learning Repository. Download the data and save it in your current folder with the name 'abalone.data'. Read the data into a table.

tbl = readtable('abalone.data','Filetype','text','ReadVariableNames',false);
rng default % for reproducibility

The sample data contains 4177 observations. All the predictor variables are continuous except for sex, which is a categorical variable with possible values 'M' (for males), 'F' (for females), and 'I' (for infants). The goal is to predict the number of rings on the abalone, and thereby determine its age, using physical measurements.

Train an SVM regression model, using a Gaussian kernel function with an automatic kernel scale. Standardize the data.

mdl = fitrsvm(tbl,'Var9','KernelFunction','gaussian','KernelScale','auto','Standardize',true);

mdl is a trained RegressionSVM regression model.

Cross-validate the regression model by specifying a 10% holdout sample.

CVMdl = crossval(mdl,'Holdout',0.1)

CVMdl = 
  classreg.learning.partition.RegressionPartitionedSVM
      CrossValidatedModel: 'SVM'
           PredictorNames: {1x8 cell}
    CategoricalPredictors: 1
             ResponseName: 'Var9'
          NumObservations: 4177
                    KFold: 1
                Partition: [1x1 cvpartition]
        ResponseTransform: 'none'

  Properties, Methods

CVMdl is a RegressionPartitionedSVM model object.

Calculate the cross-validation loss for the cross-validated model.

loss = kfoldLoss(CVMdl)


loss = 5.2499

Alternatives

Instead of training an SVM regression model and then cross-validating it, you can create a cross-validated model directly by using fitrsvm and specifying any of these name-value pair arguments: 'CrossVal', 'CVPartition', 'Holdout', 'Leaveout', or 'KFold'.
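A brief sketch of the one-call alternative; the carsmall data and the two predictors are illustrative, not taken from the abalone examples on this page:

```matlab
% Sketch: cross-validated SVM regression in a single fitrsvm call.
load carsmall
X = [Horsepower Weight];
CVMdl = fitrsvm(X,MPG,'Standardize',true,'KFold',10);  % partitioned model
L = kfoldLoss(CVMdl);                                  % cross-validated loss
```

As with the two-step crossval workflow, only one of the cross-validation name-value pairs can be specified per call.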

References

[1] Nash, W. J., T. L. Sellers, S. R. Talbot, A. J. Cawthorn, and W. B. Ford. "The Population Biology of Abalone (Haliotis species) in Tasmania. I. Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait." Sea Fisheries Division, Technical Report No. 48, 1994.

[2] Waugh, S. "Extending and Benchmarking Cascade-Correlation." PhD thesis, Computer Science Department, University of Tasmania, 1995.

[3] Clark, D., Z. Schreter, and A. Adams. "A Quantitative Comparison of Dystal and Backpropagation." Submitted to the Australian Conference on Neural Networks, 1996.

[4] Lichman, M. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science. http://archive.ics.uci.edu/ml.

See Also

CompactRegressionSVM | RegressionPartitionedSVM | RegressionSVM | fitrsvm | kfoldLoss | kfoldPredict

Introduced in R2015b


crossval
Class: RegressionTree

Cross-validated decision tree

Syntax

cvmodel = crossval(model)
cvmodel = crossval(model,Name,Value)

Description

cvmodel = crossval(model) creates a partitioned model from model, a fitted regression tree. By default, crossval uses 10-fold cross validation on the training data to create cvmodel.

cvmodel = crossval(model,Name,Value) creates a partitioned model with additional options specified by one or more Name,Value pair arguments.

Input Arguments

model
A regression model, produced using fitrtree.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

CVPartition
Object of class cvpartition, created by the cvpartition function. crossval splits the data into subsets with cvpartition.
Use only one of these four options at a time: 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'.
Default: []

Holdout
Holdout validation tests the specified fraction of the data, and uses the rest of the data for training. Specify a numeric scalar from 0 to 1.
Use only one of these four options at a time: 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'.

KFold
Number of folds to use in a cross-validated tree, a positive integer value greater than 1.
Use only one of these four options at a time: 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'.
Default: 10

Leaveout
Set to 'on' for leave-one-out cross-validation.

Output Arguments

cvmodel
A partitioned model of class RegressionPartitionedModel.

Examples

Cross-Validate Regression Tree

Load the carsmall data set. Consider Acceleration, Displacement, Horsepower, and Weight as predictor variables.

load carsmall
X = [Acceleration Displacement Horsepower Weight];

Grow a regression tree using the entire data set.

Mdl = fitrtree(X,MPG);

Mdl is a RegressionTree model.

Cross-validate the regression tree using 10-fold cross-validation.

CVMdl = crossval(Mdl);

CVMdl is a RegressionPartitionedModel cross-validated model. crossval stores the ten trained, compact regression trees in the Trained property of CVMdl.

Display the compact regression tree that crossval trained using all observations except those in the first fold.

CVMdl.Trained{1}

ans = 
  classreg.learning.regr.CompactRegressionTree
           PredictorNames: {'x1' 'x2' 'x3' 'x4'}
             ResponseName: 'Y'
    CategoricalPredictors: []
        ResponseTransform: 'none'

  Properties, Methods


Estimate the generalization error of Mdl by computing the 10-fold cross-validated mean squared error.

L = kfoldLoss(CVMdl)

L = 25.6450

Tips • Assess the predictive performance of model on cross-validated data using the “kfold” methods and properties of cvmodel, such as kfoldLoss.

Alternatives You can create a cross-validation tree directly from the data, instead of creating a decision tree followed by a cross-validation tree. To do so, include one of these five options in fitrtree: 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'.

See Also crossval | fitrtree


cvloss
Class: ClassificationTree

Classification error by cross validation

Syntax

E = cvloss(tree)
[E,SE] = cvloss(tree)
[E,SE,Nleaf] = cvloss(tree)
[E,SE,Nleaf,BestLevel] = cvloss(tree)
[___] = cvloss(tree,Name,Value)

Description

E = cvloss(tree) returns the cross-validated classification error (loss) for tree, a classification tree. The cvloss method uses stratified partitioning to create cross-validated sets. That is, for each fold, each partition of the data has roughly the same class proportions as in the data used to train tree.

[E,SE] = cvloss(tree) returns the standard error of E.

[E,SE,Nleaf] = cvloss(tree) returns the number of leaves of tree.

[E,SE,Nleaf,BestLevel] = cvloss(tree) returns the optimal pruning level for tree.

[___] = cvloss(tree,Name,Value) cross validates with additional options specified by one or more Name,Value pair arguments, using any of the previous syntaxes. You can specify several name-value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Input Arguments

tree — Trained classification tree
ClassificationTree model object

Trained classification tree, specified as a ClassificationTree model object produced by fitctree.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Subtrees — Pruning level
0 (default) | vector of nonnegative integers | 'all'

Pruning level, specified as the comma-separated pair consisting of 'Subtrees' and a vector of nonnegative integers in ascending order or 'all'.

If you specify a vector, then all elements must be at least 0 and at most max(tree.PruneList). 0 indicates the full, unpruned tree, and max(tree.PruneList) indicates the completely pruned tree (that is, just the root node).

If you specify 'all', then cvloss operates on all subtrees (that is, the entire pruning sequence). This specification is equivalent to using 0:max(tree.PruneList).

cvloss prunes tree to each level indicated in Subtrees, and then estimates the corresponding output arguments. The size of Subtrees determines the size of some output arguments.

To invoke Subtrees, the properties PruneList and PruneAlpha of tree must be nonempty. In other words, grow tree by setting 'Prune','on', or by pruning tree using prune.

Example: 'Subtrees','all'
Data Types: single | double | char | string

TreeSize — Tree size
'se' (default) | 'min'

Tree size, specified as the comma-separated pair consisting of 'TreeSize' and one of the following values:

• 'se' — cvloss uses the smallest tree whose cost is within one standard error of the minimum cost.
• 'min' — cvloss uses the minimal cost tree.

Example: 'TreeSize','min'

KFold — Number of cross-validation samples
10 (default) | positive integer value greater than 1

Number of cross-validation samples, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1.

Example: 'KFold',8
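The Subtrees requirement above (nonempty PruneList and PruneAlpha) can be sketched in a few lines; this minimal example reuses the ionosphere data from the examples below:

```matlab
% Sketch: Subtrees needs pruning information stored in the tree.
load ionosphere
tree = fitctree(X,Y,'Prune','on');  % populate PruneList and PruneAlpha
[E,SE,Nleaf,BestLevel] = cvloss(tree,'Subtrees','all','KFold',10);
```

With 'Subtrees','all', E, SE, and Nleaf are vectors with one element per pruning level, while BestLevel remains a scalar.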

Output Arguments E — Cross-validation classification error numeric vector | scalar value Cross-validation classification error (loss), returned as a vector or scalar depending on the setting of the Subtrees name-value pair. SE — Standard error numeric vector | scalar value Standard error of E, returned as a vector or scalar depending on the setting of the Subtrees namevalue pair. Nleaf — Number of leaf nodes numeric vector | scalar value Number of leaf nodes in tree, returned as a vector or scalar depending on the setting of the Subtrees name-value pair. Leaf nodes are terminal nodes, which give classifications, not splits. 32-974


BestLevel — Best pruning level
scalar value

Best pruning level, returned as a scalar value. By default, BestLevel is the largest pruning level that achieves a value of E within SE of the minimum error. If you set TreeSize to 'min', BestLevel is the smallest value in Subtrees.

Examples

Compute the Cross-Validation Error

Compute the cross-validation error for a default classification tree.

Load the ionosphere data set.

load ionosphere

Grow a classification tree using the entire data set.

Mdl = fitctree(X,Y);

Compute the cross-validation error.

rng(1); % For reproducibility
E = cvloss(Mdl)

E = 0.1111

E is the 10-fold misclassification error.

Find the Best Pruning Level Using Cross Validation

Apply k-fold cross validation to find the best level to prune a classification tree for all of its subtrees.

Load the ionosphere data set.

load ionosphere

Grow a classification tree using the entire data set. View the resulting tree.

Mdl = fitctree(X,Y);
view(Mdl,'Mode','graph')


Compute the 5-fold cross-validation error for each subtree except for the highest pruning level. Specify to return the best pruning level over all subtrees.

rng(1); % For reproducibility
m = max(Mdl.PruneList) - 1

m = 7

[E,~,~,bestLevel] = cvloss(Mdl,'SubTrees',0:m,'KFold',5)

E = 8×1

    0.1368
    0.1339
    0.1311
    0.1339
    0.1339
    0.1254
    0.0997
    0.1738

bestLevel = 6


Of the 7 pruning levels, the best pruning level is 6.

Prune the tree to the best level. View the resulting tree.

MdlPrune = prune(Mdl,'Level',bestLevel);
view(MdlPrune,'Mode','graph')

Alternatives

You can construct a cross-validated tree model with crossval, and call kfoldLoss instead of cvloss. If you are going to examine the cross-validated tree more than once, then the alternative can save time. However, unlike cvloss, kfoldLoss does not return SE, Nleaf, or BestLevel. kfoldLoss also does not allow you to examine any error other than the classification error.
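As a sketch of that alternative (assuming the classification tree Mdl from the examples above), the crossval/kfoldLoss route looks like this:

```matlab
rng(1)                     % for reproducibility of the folds
CVMdl = crossval(Mdl);     % 10-fold cross-validated classification tree
E = kfoldLoss(CVMdl);      % classification error, comparable to cvloss(Mdl)
```

Because CVMdl stores the models trained on each fold, repeated calls to kfoldLoss do not refit the trees, which is where the time savings come from.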

See Also
crossval | fitctree | kfoldLoss | loss


cvloss

Class: RegressionTree

Regression error by cross validation

Syntax

E = cvloss(tree)
[E,SE] = cvloss(tree)
[E,SE,Nleaf] = cvloss(tree)
[E,SE,Nleaf,BestLevel] = cvloss(tree)
[E,...] = cvloss(tree,Name,Value)

Description

E = cvloss(tree) returns the cross-validated regression error (loss) for tree, a regression tree.

[E,SE] = cvloss(tree) returns the standard error of E.

[E,SE,Nleaf] = cvloss(tree) returns the number of leaves (terminal nodes) in tree.

[E,SE,Nleaf,BestLevel] = cvloss(tree) returns the optimal pruning level for tree.

[E,...] = cvloss(tree,Name,Value) cross validates with additional options specified by one or more Name,Value pair arguments. You can specify several name-value pair arguments in any order as Name1,Value1,…,NameN,ValueN.

Input Arguments

tree — Trained regression tree
RegressionTree object

Trained regression tree, specified as a RegressionTree object constructed using fitrtree.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Subtrees — Pruning level
0 (default) | vector of nonnegative integers | 'all'

Pruning level, specified as the comma-separated pair consisting of 'Subtrees' and a vector of nonnegative integers in ascending order or 'all'. If you specify a vector, then all elements must be at least 0 and at most max(tree.PruneList). 0 indicates the full, unpruned tree and max(tree.PruneList) indicates the completely pruned tree (i.e., just the root node).


If you specify 'all', then cvloss operates on all subtrees (i.e., the entire pruning sequence). This specification is equivalent to using 0:max(tree.PruneList).

cvloss prunes tree to each level indicated in Subtrees, and then estimates the corresponding output arguments. The size of Subtrees determines the size of some output arguments.

To invoke Subtrees, the properties PruneList and PruneAlpha of tree must be nonempty. In other words, grow tree by setting 'Prune','on', or prune tree using prune.

Example: 'Subtrees','all'

Data Types: single | double | char | string

TreeSize — Tree size
'se' (default) | 'min'

Tree size, specified as the comma-separated pair consisting of 'TreeSize' and one of the following:

• 'se' — cvloss uses the smallest tree whose cost is within one standard error of the minimum cost.
• 'min' — cvloss uses the minimal cost tree.

KFold — Number of folds
10 (default) | positive integer greater than 1

Number of folds to use in a cross-validated tree, specified as the comma-separated pair consisting of 'KFold' and a positive integer greater than 1.

Example: 'KFold',8

Output Arguments

E — Mean squared error
scalar value | numeric vector

Cross-validation mean squared error (loss), returned as a vector or scalar depending on the setting of the Subtrees name-value pair.

SE — Standard error
scalar value | numeric vector

Standard error of E, returned as a vector or scalar depending on the setting of the Subtrees name-value pair.

Nleaf — Number of leaf nodes
vector of integer values

Number of leaf nodes in tree, returned as a vector or scalar depending on the setting of the Subtrees name-value pair. Leaf nodes are terminal nodes, which give responses, not splits.

BestLevel — Best pruning level
scalar value

Best pruning level as defined in the TreeSize name-value pair, returned as a scalar whose value depends on TreeSize:


• If TreeSize is 'se', then BestLevel is the largest pruning level that achieves a value of E within SE of the minimum error. • If TreeSize is 'min', then BestLevel is the smallest value in Subtrees.

Examples

Compute the Cross-Validation Error

Compute the cross-validation error for a default regression tree.

Load the carsmall data set. Consider Displacement, Horsepower, and Weight as predictors of the response MPG.

load carsmall
X = [Displacement Horsepower Weight];

Grow a regression tree using the entire data set.

Mdl = fitrtree(X,MPG);

Compute the cross-validation error.

rng(1); % For reproducibility
E = cvloss(Mdl)

E = 27.6976

E is the 10-fold weighted average MSE (weighted by the number of test observations in the folds).

Find the Best Pruning Level Using Cross Validation

Apply k-fold cross validation to find the best level to prune a regression tree for all of its subtrees.

Load the carsmall data set. Consider Displacement, Horsepower, and Weight as predictors of the response MPG.

load carsmall
X = [Displacement Horsepower Weight];

Grow a regression tree using the entire data set. View the resulting tree.

Mdl = fitrtree(X,MPG);
view(Mdl,'Mode','graph')


Compute the 5-fold cross-validation error for each subtree except for the two lowest and the highest pruning levels. Specify to return the best pruning level over all subtrees.

rng(1); % For reproducibility
m = max(Mdl.PruneList) - 1

m = 15

[~,~,~,bestLevel] = cvloss(Mdl,'SubTrees',2:m,'KFold',5)

bestLevel = 14

Of the 15 pruning levels, the best pruning level is 14.

Prune the tree to the best level. View the resulting tree.

MdlPrune = prune(Mdl,'Level',bestLevel);
view(MdlPrune,'Mode','graph')


Alternatives

You can construct a cross-validated tree model with crossval, and call kfoldLoss instead of cvloss. If you are going to examine the cross-validated tree more than once, then the alternative can save time. However, unlike cvloss, kfoldLoss does not return SE, Nleaf, or BestLevel.
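The same alternative can be sketched for a regression tree (assuming the Mdl regression tree from the examples above):

```matlab
rng(1)                     % for reproducibility of the folds
CVMdl = crossval(Mdl);     % 10-fold cross-validated regression tree
E = kfoldLoss(CVMdl);      % mean squared error, comparable to cvloss(Mdl)
```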

See Also
crossval | fitrtree | kfoldLoss | loss


cvpartition class

Data partitions for cross validation

Description

An object of the cvpartition class defines a random partition on a set of data of a specified size. Use this partition to define test and training sets for validating a statistical model using cross validation.

Construction

cvpartition — Create cross-validation partition for data

Methods

disp — Display cvpartition object
display — Display cvpartition object
repartition — Repartition data for cross-validation
test — Test indices for cross-validation
training — Training indices for cross-validation

Properties

NumObservations — Number of observations (including observations with missing group values)
NumTestSets — Number of test sets
TestSize — Size of each test set
TrainSize — Size of each training set
Type — Type of partition

Copy Semantics

Value. To learn how this affects your use of the class, see Comparing Handle and Value Classes (MATLAB) in the MATLAB Object-Oriented Programming documentation.

Examples

Use a 10-fold stratified cross validation to compute the misclassification error for classify on iris data.

load('fisheriris');
CVO = cvpartition(species,'k',10);
err = zeros(CVO.NumTestSets,1);
for i = 1:CVO.NumTestSets


    trIdx = CVO.training(i);
    teIdx = CVO.test(i);
    ytest = classify(meas(teIdx,:),meas(trIdx,:),...
        species(trIdx,:));
    err(i) = sum(~strcmp(ytest,species(teIdx)));
end
cvErr = sum(err)/sum(CVO.TestSize);

See Also
crossval

Topics
“Grouping Variables” on page 2-45


cvpartition

Class: cvpartition

Create cross-validation partition for data

Syntax

c = cvpartition(n,'KFold',k)
c = cvpartition(n,'HoldOut',p)
c = cvpartition(group,'KFold',k)
c = cvpartition(group,'KFold',k,'Stratify',stratifyOption)
c = cvpartition(group,'HoldOut',p)
c = cvpartition(group,'HoldOut',p,'Stratify',stratifyOption)
c = cvpartition(n,'LeaveOut')
c = cvpartition(n,'resubstitution')

Description

c = cvpartition(n,'KFold',k) constructs an object c of the cvpartition class on page 32-983 defining a random nonstratified partition for k-fold cross-validation on n observations. The partition divides the observations into k disjoint subsamples (or folds), chosen randomly but with roughly equal size. The default value of k is 10.

c = cvpartition(n,'HoldOut',p) creates a random nonstratified partition for holdout validation on n observations. This partition divides the observations into a training set and a test (or holdout) set. The parameter p must be a scalar. When 0 < p < 1, cvpartition randomly selects approximately p*n observations for the test set. When p is an integer, cvpartition randomly selects p observations for the test set. The default value of p is 1/10.

c = cvpartition(group,'KFold',k) creates a random partition for a stratified k-fold cross-validation. group is a numeric vector, categorical array, character array, string array, or cell array of character vectors indicating the class of each observation. Each subsample has roughly equal size and roughly the same class proportions as in group.

When you supply group as the first input argument to cvpartition, the function creates cross-validation partitions that do not include rows of observations corresponding to missing values in group.

c = cvpartition(group,'KFold',k,'Stratify',stratifyOption) returns an object c defining a random partition for k-fold cross-validation. When you supply group as the first input argument to cvpartition, then the function implements stratification by default. If you also specify 'Stratify',false, then the function creates nonstratified random partitions. You can specify 'Stratify',true only if the first input argument to cvpartition is group.

c = cvpartition(group,'HoldOut',p) randomly partitions observations into a training set and a holdout (or test) set with stratification, using the class information in group.
Both the training and test sets have roughly the same class proportions as in group.


c = cvpartition(group,'HoldOut',p,'Stratify',stratifyOption) returns an object c defining a random partition into a training set and a holdout (or test) set. When you supply group as the first input argument to cvpartition, then the function implements stratification by default. If you also specify 'Stratify',false, then the function creates nonstratified random partitions.

c = cvpartition(n,'LeaveOut') creates a random partition for leave-one-out cross-validation on n observations. Leave-one-out is a special case of 'KFold' in which the number of folds equals the number of observations.

c = cvpartition(n,'resubstitution') creates an object c that does not partition the data. Both the training set and the test set contain all of the original n observations.
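The two interpretations of p in the 'HoldOut' syntaxes can be illustrated with a small sketch (the values here are illustrative):

```matlab
c1 = cvpartition(100,'HoldOut',0.25); % fraction: about 0.25*100 = 25 test observations
c2 = cvpartition(100,'HoldOut',25);   % integer: exactly 25 test observations
[c1.TestSize c2.TestSize]
```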

Examples

Find Misclassification Rate for K-Fold Cross-Validation

Use stratified 10-fold cross-validation to compute misclassification rate.

Load Fisher's iris data set.

load fisheriris;
y = species;
X = meas;

Create a random partition for a stratified 10-fold cross-validation.

c = cvpartition(y,'KFold',10);

Create a function that computes the number of misclassified test samples.

fun = @(xTrain,yTrain,xTest,yTest)(sum(~strcmp(yTest,...
    classify(xTest,xTrain,yTrain))));

Return the estimated misclassification rate using cross-validation.

rate = sum(crossval(fun,X,y,'partition',c))...
    /sum(c.TestSize)

rate = 0.0200

Create Nonstratified Partition

Find the proportion of each class in a 5-fold nonstratified partition of the fisheriris data.

Load Fisher's iris data set.

load fisheriris;

Find the number of instances of each class in the data.

[C,~,idx] = unique(species);
C % Unique classes


C = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }

n = accumarray(idx(:),1) % Number of instances for each class in species

n = 3×1

    50
    50
    50

The three classes occur in equal proportion. Create a random, nonstratified 5-fold partition.

cv = cvpartition(species,'KFold',5,'Stratify',false)

cv =
K-fold cross validation partition
   NumObservations: 150
       NumTestSets: 5
         TrainSize: 120 120 120 120 120
          TestSize: 30 30 30 30 30

Show that the three classes do not occur in equal proportion for each fold of the data set.

for i = 1:cv.NumTestSets
    disp(['Fold ',num2str(i)])
    testClasses = species(cv.test(i));
    [C,~,idx] = unique(testClasses);
    C % Unique classes
    nCount = accumarray(idx(:),1) % Number of instances for each class in a fold
end

Fold 1
C = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }
nCount = 3×1
     8
    13
     9
Fold 2
C = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }


nCount = 3×1
    10
    11
     9
Fold 3
C = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }
nCount = 3×1
    10
     8
    12
Fold 4
C = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }
nCount = 3×1
    12
     8
    10
Fold 5
C = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }
nCount = 3×1
    10
    10
    10

Because cv is a random nonstratified partition of the fisheriris data, the class proportions in each of the five folds are not guaranteed to be equal to the class proportions in species. That is, the classes do not occur equally in each fold, as they do in species. Cross-validation produces randomness in the results, so your number of instances for each class in a fold can vary from those shown.
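For contrast, the default stratified partition keeps the folds balanced. This is a small sketch, assuming fisheriris is still loaded; findgroups is used here only to count classes per fold:

```matlab
cvs = cvpartition(species,'KFold',5);          % stratified by default
accumarray(findgroups(species(cvs.test(1))),1) % roughly 10 per class in a 30-observation fold
```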


Create Nonstratified Holdout Set for Tall Array

Compare the number of instances for each class in a nonstratified holdout set with a stratified holdout set of a tall array.

When you perform calculations on tall arrays, MATLAB® uses either a parallel pool (default if you have Parallel Computing Toolbox™) or the local MATLAB session. If you want to run the example using the local MATLAB session when you have Parallel Computing Toolbox, you can change the global execution environment by using the mapreducer function.

Create a numeric vector of two classes, where class 1 and class 2 occur in the ratio 1:10.

group = [ones(20,1);2*ones(200,1)]

group = 220×1

     1
     1
     1
     1
     1
     1
     1
     1
     1
     1
      ⋮
Create a tall array from group.

tgroup = tall(group)

Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 12).

tgroup =

  220x1 tall double column vector

     1
     1
     1
     1
     1
     1
     1
     1
     :
     :

Holdout is the only cvpartition option that is supported for tall arrays. Create a random, nonstratified holdout partition.

CV0 = cvpartition(tgroup,'Holdout',1/4,'Stratify',false)

CV0 =
Hold-out cross validation partition


   NumObservations: [1x1 tall]
       NumTestSets: 1
         TrainSize: [1x1 tall]
          TestSize: [1x1 tall]

Return the result of CV0.test to memory by using the gather function.

testIdx0 = gather(CV0.test);

Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 8.5 sec
Evaluation completed in 8.8 sec

Find the number of times each class occurs in the holdout set.

accumarray(group(testIdx0),1) % Number of instances for each class in the holdout set

ans = 2×1

     5
    52

cvpartition produces randomness in the results, so your number of instances for each class can vary from those shown.

Because CV0 is a nonstratified partition, class 1 and class 2 in the holdout set are not guaranteed to occur in the same ratio that they do in tgroup. However, because of the inherent randomness in cvpartition, you can sometimes obtain a holdout set for which the classes occur in the same ratio that they do in tgroup even though you specify 'Stratify',false. A similar result can be illustrated for the training set.

Return the result of CV0.training to memory.

trainIdx0 = gather(CV0.training);

Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 2.8 sec
Evaluation completed in 2.9 sec

Find the number of times each class occurs in the training set.

accumarray(group(trainIdx0),1) % Number of instances for each class in the training set

ans = 2×1

     15
    148

The classes in the nonstratified training set are not guaranteed to occur in the same ratio that they do in tgroup.

Create a random, stratified holdout partition.

CV1 = cvpartition(tgroup,'Holdout',1/4)

CV1 =
Hold-out cross validation partition


   NumObservations: [1x1 tall]
       NumTestSets: 1
         TrainSize: [1x1 tall]
          TestSize: [1x1 tall]

Return the result of CV1.test to memory.

testIdx1 = gather(CV1.test);

Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 2.4 sec
Evaluation completed in 2.5 sec

Find the number of times each class occurs in the holdout set.

accumarray(group(testIdx1),1) % Number of instances for each class in the holdout set

ans = 2×1

     5
    49

In the case of the stratified holdout partition, the class ratio in the holdout set and the class ratio in tgroup are the same (1:10).

Algorithms

• If you supply group as the first input argument to cvpartition, the function creates cross-validation partitions that do not include rows of observations corresponding to missing values in group.
• When you supply group as the first input argument to cvpartition, then the function implements stratification by default. You can specify 'Stratify',false to create nonstratified random partitions.
• You can specify 'Stratify',true only if the first input argument to cvpartition is group.
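The first bullet can be sketched with a toy numeric grouping variable, where NaN marks a missing group value (the fold assignments are random, but the NaN row should belong to neither the training set nor the test set):

```matlab
g = [1; 1; 2; NaN; 2; 1];
c = cvpartition(g,'KFold',2);
inSomeSet = training(c,1) | test(c,1);
find(~inSomeSet)   % the row with the missing group value
```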

Extended Capabilities

Tall Arrays

Calculate with arrays that have more rows than fit in memory. This function supports tall arrays for out-of-memory data with some limitations.

• When you use cvpartition with tall arrays, the first input argument must be a grouping variable, tGroup. If you specify a tall scalar as the first input argument, cvpartition gives an error.
• cvpartition supports only Holdout cross-validation for tall arrays; for example, c = cvpartition(tGroup,'HoldOut',p). By default, cvpartition randomly partitions observations into a training set and a test set with stratification, using the class information in tGroup. The parameter p is a scalar such that 0 < p < 1.
• To create nonstratified Holdout partitions, specify the value of the 'Stratify' name-value pair argument as false; for example, c = cvpartition(tGroup,'Holdout',p,'Stratify',false).


For more information, see “Tall Arrays for Out-of-Memory Data” (MATLAB).

See Also
crossval | repartition

Topics
“Grouping Variables” on page 2-45

Introduced in R2008a


cvshrink

Class: ClassificationDiscriminant

Cross-validate regularization of linear discriminant

Syntax

err = cvshrink(obj)
[err,gamma] = cvshrink(obj)
[err,gamma,delta] = cvshrink(obj)
[err,gamma,delta,numpred] = cvshrink(obj)
[err,...] = cvshrink(obj,Name,Value)

Description

err = cvshrink(obj) returns a vector of cross-validated classification error values for differing values of the regularization parameter Gamma.

[err,gamma] = cvshrink(obj) also returns the vector of Gamma values.

[err,gamma,delta] = cvshrink(obj) also returns the vector of Delta values.

[err,gamma,delta,numpred] = cvshrink(obj) returns the vector of number of nonzero predictors for each setting of the parameters Gamma and Delta.

[err,...] = cvshrink(obj,Name,Value) cross validates with additional options specified by one or more Name,Value pair arguments.

Input Arguments

obj

Discriminant analysis classifier, produced using fitcdiscr.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

delta

• Scalar delta — cvshrink uses this value of delta with every value of gamma for regularization.
• Row vector delta — For each i and j, cvshrink uses delta(j) with gamma(i) for regularization.
• Matrix delta — The number of rows of delta must equal the number of elements in gamma. For each i and j, cvshrink uses delta(i,j) with gamma(i) for regularization.

Default: 0


gamma

Vector of Gamma values for cross-validation.

Default: 0:0.1:1

NumDelta

Number of Delta intervals for cross-validation. For every value of Gamma, cvshrink cross-validates the discriminant using NumDelta + 1 values of Delta, uniformly spaced from zero to the maximal Delta at which all predictors are eliminated for this value of Gamma. If you set delta, cvshrink ignores NumDelta.

Default: 0

NumGamma

Number of Gamma intervals for cross-validation. cvshrink cross-validates the discriminant using NumGamma + 1 values of Gamma, uniformly spaced from MinGamma to 1. If you set gamma, cvshrink ignores NumGamma.

Default: 10

verbose

Verbosity level, an integer from 0 to 2. Higher values give more progress messages.

Default: 0

Output Arguments

err

Numeric vector or matrix of errors. err is the misclassification error rate, meaning the average fraction of misclassified data over all folds.

• If delta is a scalar (default), err(i) is the misclassification error rate for obj regularized with gamma(i).
• If delta is a vector, err(i,j) is the misclassification error rate for obj regularized with gamma(i) and delta(j).
• If delta is a matrix, err(i,j) is the misclassification error rate for obj regularized with gamma(i) and delta(i,j).

gamma

Vector of Gamma values used for regularization. See “Gamma and Delta” on page 32-996.

delta

Vector or matrix of Delta values used for regularization. See “Gamma and Delta” on page 32-996.

• If you give a scalar for the delta name-value pair, the output delta is a row vector the same size as gamma, with entries equal to the input scalar.


• If you give a row vector for the delta name-value pair, the output delta is a matrix with the same number of columns as the row vector, and with the number of rows equal to the number of elements of gamma. The output delta(i,j) is equal to the input delta(j).
• If you give a matrix for the delta name-value pair, the output delta is the same as the input matrix. The number of rows of delta must equal the number of elements in gamma.

numpred

Numeric vector or matrix containing the number of predictors in the model at various regularizations. numpred has the same size as err.

• If delta is a scalar (default), numpred(i) is the number of predictors for obj regularized with gamma(i) and delta.
• If delta is a vector, numpred(i,j) is the number of predictors for obj regularized with gamma(i) and delta(j).
• If delta is a matrix, numpred(i,j) is the number of predictors for obj regularized with gamma(i) and delta(i,j).

Examples

Regularize Data with Many Predictors

Regularize a discriminant analysis classifier, and view the tradeoff between the number of predictors in the model and the classification accuracy.

Create a linear discriminant analysis classifier for the ovariancancer data. Set the SaveMemory and FillCoeffs options to keep the resulting model reasonably small.

load ovariancancer
obj = fitcdiscr(obs,grp,...
    'SaveMemory','on','FillCoeffs','off');

Use 10 levels of Gamma and 10 levels of Delta to search for good parameters. This search is time-consuming. Set Verbose to 1 to view the progress.

rng('default') % for reproducibility
[err,gamma,delta,numpred] = cvshrink(obj,...
    'NumGamma',9,'NumDelta',9,'Verbose',1);

Done building cross-validated model.
Processing Gamma step 1 out of 10.
Processing Gamma step 2 out of 10.
Processing Gamma step 3 out of 10.
Processing Gamma step 4 out of 10.
Processing Gamma step 5 out of 10.
Processing Gamma step 6 out of 10.
Processing Gamma step 7 out of 10.
Processing Gamma step 8 out of 10.
Processing Gamma step 9 out of 10.
Processing Gamma step 10 out of 10.

Plot the classification error rate against the number of predictors.


plot(err,numpred,'k.')
xlabel('Error rate');
ylabel('Number of predictors');

More About

Gamma and Delta

Regularization is the process of finding a small set of predictors that yield an effective predictive model. For linear discriminant analysis, there are two parameters, $\gamma$ and $\delta$, that control regularization as follows. cvshrink helps you select appropriate values of the parameters.

Let $\Sigma$ represent the covariance matrix of the data $X$, and let $\hat{X}$ be the centered data (the data $X$ minus the mean by class). Define

$$D = \operatorname{diag}\bigl(\hat{X}^{T}\hat{X}\bigr).$$

The regularized covariance matrix $\tilde{\Sigma}$ is

$$\tilde{\Sigma} = (1-\gamma)\Sigma + \gamma D.$$

Whenever $\gamma \ge$ MinGamma, $\tilde{\Sigma}$ is nonsingular.

Let $\mu_k$ be the mean vector for those elements of $X$ in class $k$, and let $\mu_0$ be the global mean vector (the mean of the rows of $X$). Let $C$ be the correlation matrix of the data $X$, and let $\tilde{C}$ be the regularized correlation matrix:

$$\tilde{C} = (1-\gamma)C + \gamma I,$$

where $I$ is the identity matrix.

The linear term in the regularized discriminant analysis classifier for a data point $x$ is

$$(x-\mu_0)^{T}\tilde{\Sigma}^{-1}(\mu_k-\mu_0) = \bigl[(x-\mu_0)^{T}D^{-1/2}\bigr]\bigl[\tilde{C}^{-1}D^{-1/2}(\mu_k-\mu_0)\bigr].$$

The parameter $\delta$ enters into this equation as a threshold on the final term in square brackets. Each component of the vector $\tilde{C}^{-1}D^{-1/2}(\mu_k-\mu_0)$ is set to zero if it is smaller in magnitude than the threshold $\delta$. Therefore, for class $k$, if component $j$ is thresholded to zero, component $j$ of $x$ does not enter into the evaluation of the posterior probability.

The DeltaPredictor property is a vector related to this threshold. When $\delta \ge$ DeltaPredictor(i), all classes $k$ have

$$\bigl|\bigl[\tilde{C}^{-1}D^{-1/2}(\mu_k-\mu_0)\bigr]_{i}\bigr| \le \delta.$$

Therefore, when $\delta \ge$ DeltaPredictor(i), the regularized classifier does not use predictor $i$.
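The regularized quantities above can be sketched numerically. This is a hedged illustration only: X stands for a small numeric data matrix and Xhat for its class-centered version, neither of which is defined on this page.

```matlab
gamma = 0.5;                                   % example regularization level
Sigma = cov(X);                                % covariance of the data
D     = diag(diag(Xhat'*Xhat));                % D = diag(Xhat'*Xhat)
SigmaTilde = (1-gamma)*Sigma + gamma*D;        % regularized covariance
C      = corrcoef(X);                          % correlation of the data
CTilde = (1-gamma)*C + gamma*eye(size(C,1));   % regularized correlation
```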

Tips

• Examine the err and numpred outputs to see the tradeoff between cross-validated error and number of predictors. When you find a satisfactory point, set the corresponding gamma and delta properties in the model using dot notation. For example, if (i,j) is the location of the satisfactory point, set:

obj.Gamma = gamma(i);
obj.Delta = delta(i,j);

See Also
ClassificationDiscriminant | fitcdiscr

Topics
“Regularize Discriminant Analysis Classifier” on page 20-21
“Discriminant Analysis Classification” on page 20-2


cvshrink

Cross validate shrinking (pruning) ensemble

Syntax

vals = cvshrink(ens)
[vals,nlearn] = cvshrink(ens)
[vals,nlearn] = cvshrink(ens,Name,Value)

Description

vals = cvshrink(ens) returns an L-by-T matrix with cross-validated values of the mean squared error. L is the number of lambda values in the ens.Regularization structure. T is the number of threshold values on weak learner weights. If ens does not have a Regularization property filled in by the regularize method, pass a lambda name-value pair.

[vals,nlearn] = cvshrink(ens) returns an L-by-T matrix of the mean number of learners in the cross-validated ensemble.

[vals,nlearn] = cvshrink(ens,Name,Value) cross validates with additional options specified by one or more Name,Value pair arguments. You can specify several name-value pair arguments in any order as Name1,Value1,…,NameN,ValueN.

Input Arguments

ens

A regression ensemble, created with fitrensemble.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

cvpartition

A partition created with cvpartition to use in a cross-validated tree. You can only use one of these four options at a time: 'kfold', 'holdout', 'leaveout', or 'cvpartition'.

holdout

Holdout validation tests the specified fraction of the data, and uses the rest of the data for training. Specify a numeric scalar from 0 to 1. You can only use one of these four options at a time for creating a cross-validated tree: 'kfold', 'holdout', 'leaveout', or 'cvpartition'.

kfold

Number of folds to use in a cross-validated tree, a positive integer. If you do not supply a cross-validation method, cvshrink uses 10-fold cross validation. You can only use one of these four options at a time: 'kfold', 'holdout', 'leaveout', or 'cvpartition'.


Default: 10

lambda

Vector of nonnegative regularization parameter values for lasso. If empty, cvshrink does not perform cross validation.

Default: []

leaveout

Use leave-one-out cross validation by setting to 'on'. You can only use one of these four options at a time: 'kfold', 'holdout', 'leaveout', or 'cvpartition'.

threshold

Numeric vector with lower cutoffs on weights for weak learners. cvshrink discards learners with weights below threshold in its cross-validation calculation.

Default: 0

Output Arguments

vals

L-by-T matrix with cross-validated values of the mean squared error. L is the number of values of the regularization parameter 'lambda', and T is the number of 'threshold' values on weak learner weights.

nlearn

L-by-T matrix with cross-validated values of the mean number of learners in the cross-validated ensemble. L is the number of values of the regularization parameter 'lambda', and T is the number of 'threshold' values on weak learner weights.

Examples

Cross-Validate Regression Ensemble

Create a regression ensemble for predicting mileage from the carsmall data. Cross-validate the ensemble.

Load the carsmall data set and select displacement, horsepower, and vehicle weight as predictors.

load carsmall
X = [Displacement Horsepower Weight];

You can train an ensemble of bagged regression trees.

ens = fitrensemble(X,MPG,'Method','Bag')

fitrensemble uses a default template tree object templateTree() as a weak learner when 'Method' is 'Bag'. In this example, for reproducibility, specify 'Reproducible',true when you create a tree template object, and then use the object as a weak learner.


rng('default') % For reproducibility
t = templateTree('Reproducible',true); % For reproducibility of random predictor selections
ens = fitrensemble(X,MPG,'Method','Bag','Learners',t);

Specify values for lambda and threshold. Use these values to cross-validate the ensemble.

[vals,nlearn] = cvshrink(ens,'lambda',[.01 .1 1],'threshold',[0 .01 .1])

vals = 3×3

    18.1135   18.4634  115.5087
    18.1140   18.4630  115.4477
    18.0823   18.3565  124.1655

nlearn = 3×3

    13.8000   11.5000    3.5000
    13.7000   11.4000    3.5000
    13.8000   11.3000    3.3000

Clearly, setting a threshold of 0.1 leads to unacceptable errors, while a threshold of 0.01 gives similar errors to a threshold of 0. The mean number of learners with a threshold of 0.01 is about 11.4, whereas the mean number is about 13.8 when the threshold is 0.
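Having picked a point from the grid, you would typically apply it with shrink. This is a hedged sketch; the lambda and threshold values are the ones read off the table above:

```matlab
cmp = shrink(ens,'lambda',0.1,'threshold',0.01); % compact, shrunken ensemble
cmp.NumTrained                                   % learners kept after shrinking
```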

See Also
regularize | shrink


datasample

Randomly sample from data, with or without replacement

Syntax y = datasample(data,k) y = datasample(data,k,dim) y = datasample( ___ ,Name,Value) y = datasample(s, ___ ) [y,idx] = datasample( ___ )

Description y = datasample(data,k) returns k observations sampled uniformly at random, with replacement, from the data in data. y = datasample(data,k,dim) returns a sample taken along dimension dim of data. y = datasample( ___ ,Name,Value) returns a sample for any of the input arguments in the previous syntaxes, with additional options specified by one or more name-value pair arguments. For example, 'Replace',false specifies sampling without replacement. y = datasample(s, ___ ) uses the random number stream s to generate random numbers. The option s can precede any of the input arguments in the previous syntaxes. [y,idx] = datasample( ___ ) also returns an index vector indicating which values datasample sampled from data using any of the input arguments in the previous syntaxes.

Examples Sample Unique Values from Vector Create the random number stream for reproducibility. s = RandStream('mlfg6331_64');

Draw five unique values from the integers 1 to 10. y = datasample(s,1:10,5,'Replace',false)
y = 1×5
     9     8     3     6     2

Generate Random Characters for Specified Probabilities Create the random number stream for reproducibility.


s = RandStream('mlfg6331_64');

Generate 48 random characters from the sequence ACGT according to the specified probabilities. seq = datasample(s,'ACGT',48,'Weights',[0.15 0.35 0.35 0.15]) seq = 'GGCGGCGCAAGGCGCCGGACCTGGCTGCACGCCGTTCCCTGCTACTCG'

Select Random Subset of Matrix Columns Set the random seed for reproducibility of the results. rng(10,'twister')

Generate a matrix with 10 rows and 1000 columns. X = randn(10,1000);

Create the random number stream for reproducibility within datasample. s = RandStream('mlfg6331_64');

Randomly select five unique columns from X. Y = datasample(s,X,5,2,'Replace',false)
Y = 10×5
    0.4317    0.6977   -0.8543    0.1686   -1.7649
   -0.3821   -1.6844   -0.4170   -0.2410    0.6212
   -0.3327   -0.7422   -0.3105    0.6609   -1.1607
    0.5696    0.7148    1.3696    1.4703    1.4118
    0.9112    0.4578    0.9836   -0.0553   -0.3513
   -1.6264   -0.6876    1.1874   -2.5003   -0.4518
   -2.3244   -1.3745   -0.6434   -0.1202   -1.5533
   -0.2104   -0.4447   -0.9901   -1.1321    0.8697
    0.9559   -0.8634   -0.4457   -1.3699    0.0597
   -1.5486   -1.4615    0.5875   -1.8451    0.8093

Create a Bootstrap Replicate Data Set Resample observations from a dataset array to create a bootstrap replicate data set. See “Bootstrap Resampling” on page 3-14 for more information about bootstrapping. Load the sample data set. load hospital

Create a data set that has the same size as the hospital data set and contains random samples chosen with replacement from the hospital data set. y = datasample(hospital,size(hospital,1));
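A bootstrap replicate is typically used to estimate the sampling variability of a statistic. As a sketch (a hypothetical analysis, not part of the original example), bootstrapping the mean patient age:

```matlab
load hospital
nboot = 200;                         % number of bootstrap replicates
bootMeans = zeros(nboot,1);
for b = 1:nboot
    % Resample the rows of the dataset array with replacement
    y = datasample(hospital,size(hospital,1));
    bootMeans(b) = mean(y.Age);      % statistic for this replicate
end
ci = prctile(bootMeans,[2.5 97.5]);  % simple percentile 95% interval
```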


Sample in Parallel from Two Data Vectors Select samples from data based on indices of a sample chosen from another vector. Generate two random vectors. x1 = randn(100,1); x2 = randn(100,1);

Select a sample of 10 elements from vector x1, and return the indices of the sample in vector idx. [y1,idx] = datasample(x1,10);

Select a sample of 10 elements from vector x2 using the indices in vector idx. y2 = x2(idx);

Input Arguments data — Input data vector | matrix | multidimensional array | table | dataset array Input data from which to sample, specified as a vector, matrix, multidimensional array, table, or dataset array. By default, datasample samples from the first nonsingleton dimension of data. For example, if data is a matrix, then datasample samples from the rows. Change this behavior with the dim input argument. Data Types: single | double | logical | char | string | table k — Number of samples positive integer Number of samples, specified as a positive integer. Example: datasample(data,100) returns 100 observations sampled uniformly at random from the data in data. Data Types: single | double dim — Dimension to sample 1 (default) | positive integer Dimension to sample, specified as a positive integer. For example, if data is a matrix and dim is 2, y contains a selection of columns in data. If data is a table or dataset array and dim is 2, y contains a selection of variables in data. Use dim to ensure sampling along a specific dimension regardless of whether data is a vector, matrix, or N-dimensional array. Data Types: single | double s — Random number stream global stream (default) | RandStream Random number stream, specified as the global stream or RandStream. For example, s = RandStream('mlfg6331_64') creates a random number stream that uses the multiplicative lagged Fibonacci generator algorithm. For details, see “Creating and Controlling a Random Number Stream” (MATLAB). The rng function provides a simple way to control the global stream. For example, rng(seed) seeds the random number generator using the nonnegative integer seed. For details, see “Managing the Global Stream” (MATLAB). Name-Value Pair Arguments Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN. Example: 'Replace',false,'Weights',ones(datasize,1) samples without replacement and with probability proportional to the elements of Weights, where datasize is the size of the dimension being sampled. Replace — Indicator for sampling with replacement true (default) | false Indicator for sampling with replacement, specified as the comma-separated pair consisting of 'Replace' and either true or false. Sample with replacement if 'Replace' is true, or without replacement if 'Replace' is false. If 'Replace' is false, then k must not be larger than the size of the dimension being sampled. For example, if data = [1 3 Inf; 2 4 5] and y = datasample(data,k,'Replace',false), then k cannot be larger than 2. Data Types: logical Weights — Sampling weights ones(datasize,1) (default) | vector of nonnegative numeric values Sampling weights, specified as the comma-separated pair consisting of 'Weights' and a vector of nonnegative numeric values. The vector is of size datasize, where datasize is the size of the dimension being sampled. The vector must have at least one positive value and cannot contain NaN values. The datasample function samples with probability proportional to the elements of 'Weights'. Example: 'Weights',[0.1 0.5 0.35 0.46] Data Types: single | double
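As a sketch of the 'Weights' behavior, elements with larger weights are drawn proportionally more often:

```matlab
s = RandStream('mlfg6331_64');
% Element 4 carries weight 0.6, so it should appear in roughly
% 60% of the draws over a long run of weighted sampling.
y = datasample(s,1:5,1000,'Weights',[0.1 0.1 0.1 0.6 0.1]);
frac4 = mean(y == 4);   % close to 0.6 for a large sample
```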

Output Arguments y — Sample vector | matrix | multidimensional array | table | dataset array Sample, returned as a vector, matrix, multidimensional array, table, or dataset array. • If data is a vector, then y is a vector containing k elements selected from data. • If data is a matrix and dim = 1, then y is a matrix containing k rows selected from data. Or, if dim = 2, then y is a matrix containing k columns selected from data.


• If data is an N-dimensional array and dim = 1, then y is an N-dimensional array of samples taken along the first nonsingleton dimension of data. Or, if you specify a value for the dim name-value pair argument, datasample samples along the dimension dim. • If data is a table and dim = 1, then y is a table containing k rows selected from data. Or, if dim = 2, then y is a table containing k variables selected from data. • If data is a dataset array and dim = 1, then y is a dataset array containing k rows selected from data. Or, if dim = 2, then y is a dataset array containing k variables selected from data. If the input data contains missing observations that are represented as NaN values, datasample samples from the entire input, including the NaN values. For example, y = datasample([NaN 6 14],2) can return y = NaN 14. When the sample is taken with replacement (default), y can contain repeated observations from data. Set the Replace name-value pair argument to false to sample without replacement. idx — Indices vector Indices, returned as a vector indicating which elements datasample chooses from data to create y. For example: • If data is a vector, then y = data(idx). • If data is a matrix and dim = 1, then y = data(idx,:). • If data is a matrix and dim = 2, then y = data(:,idx).

Tips • To sample random integers with replacement from a range, use randi. • To sample random integers without replacement, use randperm or datasample. • To randomly sample from data, with or without replacement, use datasample.

Algorithms datasample uses randperm, rand, or randi to generate random values. Therefore, datasample changes the state of the MATLAB global random number generator. Control the random number generator using rng. For selecting weighted samples without replacement, datasample uses the algorithm of Wong and Easton [1].

Alternative Functionality You can use randi or randperm to generate indices for random sampling with or without replacement, respectively. However, datasample can be more convenient to use because it samples directly from your data. datasample also allows weighted sampling.
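For comparison, a sketch of the equivalent index-based approach using randperm:

```matlab
data = 101:200;
k = 5;
% Without replacement via randperm: draw k distinct indices, then index.
y1 = data(randperm(numel(data),k));
% The same draw expressed directly on the data with datasample.
y2 = datasample(data,k,'Replace',false);
```

datasample avoids the intermediate index step and additionally supports weighted sampling, which randperm does not.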

References [1] Wong, C. K. and M. C. Easton. An Efficient Method for Weighted Sampling Without Replacement. SIAM Journal on Computing 9(1), pp. 111–113, 1980.


Extended Capabilities Tall Arrays Calculate with arrays that have more rows than fit in memory. This function supports tall arrays for out-of-memory data with some limitations. • datasample is useful as a precursor to plotting and fitting a random subset of a large data set. Sampling a large data set preserves trends in the data without requiring the use of all the data points. If the sample is small enough to fit in memory, then you can apply plotting and fitting functions that do not directly support tall arrays. • datasample supports sampling only along the first dimension of the data. • For tall arrays, datasample does not support sampling with replacement. You must specify 'Replace',false, for example, datasample(data,k,'Replace',false). • The value of 'Weights' must be a numeric tall array of the same height as data. • For the syntax [Y,idx] = datasample(___), the output idx is a tall logical vector of the same height as data. The vector indicates whether each data point is included in the sample. • If you specify a random number stream, then the underlying generator must support multiple streams and substreams. If you do not specify a random number stream, then datasample uses the stream controlled by tallrng. For more information, see “Tall Arrays for Out-of-Memory Data” (MATLAB).
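A minimal sketch of the tall-array usage (note the required 'Replace',false; the tall array here is wrapped from an in-memory matrix purely for illustration):

```matlab
% In practice a tall array would typically come from a datastore.
tx = tall(randn(1e5,2));
% Tall inputs support sampling only along dimension 1, without replacement.
ty = datasample(tx,1000,'Replace',false);
y = gather(ty);   % bring the (now small) sample into memory
```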

See Also RandStream | rand | randi | randperm | rng | tallrng Introduced in R2011b


dataset class (Not Recommended) Arrays for statistical data Note The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.

Description Dataset arrays are used to collect heterogeneous data and metadata including variable and observation names into a single container variable. Dataset arrays are suitable for storing column-oriented or tabular data that are often stored as columns in a text file or in a spreadsheet, and can accommodate variables of different types, sizes, units, etc. Dataset arrays can contain different kinds of variables, including numeric, logical, character, string, categorical, and cell. However, a dataset array is a different class than the variables that it contains. For example, even a dataset array that contains only variables that are double arrays cannot be operated on as if it were itself a double array. However, using dot subscripting, you can operate on a variable in a dataset array as if it were a workspace variable. You can subscript dataset arrays using parentheses much like ordinary numeric arrays, but in addition to numeric and logical indices, you can use variable and observation names as indices.

Construction Use the dataset constructor to create a dataset array from variables in the MATLAB workspace. You can also create a dataset array by reading data from a text or spreadsheet file. You can access each variable in a dataset array much like fields in a structure, using dot subscripting. See the following section for a list of operations available for dataset arrays.

dataset (Not Recommended) Construct dataset array


Methods
cat (Not Recommended) Concatenate dataset arrays
cellstr (Not Recommended) Create cell array of character vectors from dataset array
dataset2cell (Not Recommended) Convert dataset array to cell array
dataset2struct (Not Recommended) Convert dataset array to structure
datasetfun (Not Recommended) Apply function to dataset array variables
disp (Not Recommended) Display dataset array
display (Not Recommended) Display dataset array
double (Not Recommended) Convert dataset variables to double array
end (Not Recommended) Last index in indexing expression for dataset array
export (Not Recommended) Write dataset array to file
get (Not Recommended) Access dataset array properties
horzcat (Not Recommended) Horizontal concatenation for dataset arrays
intersect (Not Recommended) Set intersection for dataset array observations
isempty (Not Recommended) True for empty dataset array
ismember (Not Recommended) Dataset array elements that are members of set
ismissing (Not Recommended) Find dataset array elements with missing values
join (Not Recommended) Merge observations
length (Not Recommended) Length of dataset array
ndims (Not Recommended) Number of dimensions of dataset array
numel (Not Recommended) Number of elements in dataset array
replacedata (Not Recommended) Replace dataset variables
replaceWithMissing (Not Recommended) Insert missing data indicators into a dataset array
set (Not Recommended) Set and display properties
setdiff (Not Recommended) Set difference for dataset array observations
setxor (Not Recommended) Set exclusive or for dataset array observations
single (Not Recommended) Convert dataset variables to single array
size (Not Recommended) Size of dataset array
sortrows (Not Recommended) Sort rows of dataset array
stack (Not Recommended) Stack data from multiple variables into single variable
subsasgn (Not Recommended) Subscripted assignment to dataset array
subsref (Not Recommended) Subscripted reference for dataset array
summary (Not Recommended) Print summary of dataset array
union (Not Recommended) Set union for dataset array observations
unique (Not Recommended) Unique observations in dataset array
unstack (Not Recommended) Unstack data from single variable into multiple variables
vertcat (Not Recommended) Vertical concatenation for dataset arrays


Properties A dataset array D has properties that store metadata (information about your data). Access or assign to a property using P = D.Properties.PropName or D.Properties.PropName = P, where PropName is one of the following:
Description (Not Recommended) Character vector describing data set
DimNames (Not Recommended) Two-element cell array of character vectors giving names of dimensions of data set
ObsNames (Not Recommended) Cell array of nonempty, distinct character vectors giving names of observations in data set
Units (Not Recommended) Units of variables in data set
UserData (Not Recommended) Variable containing additional information associated with data set
VarDescription (Not Recommended) Cell array of character vectors giving descriptions of variables in data set
VarNames (Not Recommended) Cell array giving names of variables in data set

Copy Semantics Value. To learn how this affects your use of the class, see Comparing Handle and Value Classes (MATLAB) in the MATLAB Object-Oriented Programming documentation.
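Value semantics mean that assigning a dataset array produces an independent copy rather than a shared handle; a minimal sketch:

```matlab
d1 = dataset((1:3)','VarNames',{'x'});
d2 = d1;          % value class: d2 is an independent copy, not a handle
d2.x(1) = 100;    % modifying the copy ...
d1.x(1)           % ... leaves the original unchanged (still 1)
```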

Examples Load a dataset array from a .mat file and create some simple subsets: load hospital h1 = hospital(1:10,:) h2 = hospital(:,{'LastName' 'Age' 'Sex' 'Smoker'}) % Access and modify metadata hospital.Properties.Description hospital.Properties.VarNames{4} = 'Wgt' % Create a new dataset variable from an existing one hospital.AtRisk = hospital.Smoker | (hospital.Age > 40) % Use individual variables to explore the data boxplot(hospital.Age,hospital.Sex) h3 = hospital(hospital.Age < 40,:)

export

export(A(A.Score > 500,:),'file','HighScores.txt') B = dataset('file','HighScores.txt','delimiter','\t') B = Test Gender Score 'Verbal' 'Female' 530 'Quantitative' 'Male' 520

See Also dataset


exppdf Exponential probability density function

Syntax y = exppdf(x) y = exppdf(x,mu)

Description y = exppdf(x) returns the probability density function (pdf) of the standard exponential distribution, evaluated at the values in x. y = exppdf(x,mu) returns the pdf of the exponential distribution with mean mu, evaluated at the values in x.

Examples Compute Exponential pdf Compute the density of the observed value 5 in the standard exponential distribution. y1 = exppdf(5) y1 = 0.0067

Compute the density of the observed value 5 in the exponential distributions specified by means 1 through 5. y2 = exppdf(5,1:5)
y2 = 1×5
    0.0067    0.0410    0.0630    0.0716    0.0736

Compute the density of the observed values 1 through 5 in the exponential distributions specified by means 1 through 5, respectively. y3 = exppdf(1:5,1:5)
y3 = 1×5
    0.3679    0.1839    0.1226    0.0920    0.0736

Input Arguments x — Values at which to evaluate pdf nonnegative scalar value | array of nonnegative scalar values


Values at which to evaluate the pdf, specified as a nonnegative scalar value or an array of nonnegative scalar values. • To evaluate the pdf at multiple values, specify x using an array. • To evaluate the pdfs of multiple distributions, specify mu using an array. If either or both of the input arguments x and mu are arrays, then the array sizes must be the same. In this case, exppdf expands each scalar input into a constant array of the same size as the array inputs. Each element in y is the pdf value of the distribution specified by the corresponding element in mu, evaluated at the corresponding element in x. Example: [3 4 7 9] Data Types: single | double mu — mean 1 (default) | positive scalar value | array of positive scalar values Mean of the exponential distribution, specified as a positive scalar value or an array of positive scalar values. • To evaluate the pdf at multiple values, specify x using an array. • To evaluate the pdfs of multiple distributions, specify mu using an array. If either or both of the input arguments x and mu are arrays, then the array sizes must be the same. In this case, exppdf expands each scalar input into a constant array of the same size as the array inputs. Each element in y is the pdf value of the distribution specified by the corresponding element in mu, evaluated at the corresponding element in x. Example: [1 2 3 5] Data Types: single | double

Output Arguments y — pdf values scalar value | array of scalar values pdf values evaluated at the values in x, returned as a scalar value or an array of scalar values. y is the same size as x and mu after any necessary scalar expansion. Each element in y is the pdf value of the distribution specified by the corresponding element in mu, evaluated at the corresponding element in x.

More About Exponential pdf The exponential distribution is a one-parameter family of curves. The parameter μ is the mean. The pdf of the exponential distribution is

y = f(x|μ) = (1/μ) e^(−x/μ).


A common alternative parameterization of the exponential distribution is to use λ defined as the mean number of events in an interval as opposed to μ, which is the mean wait time for an event to occur. λ and μ are reciprocals. For more information, see “Exponential Distribution” on page B-33.
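Because λ and μ are reciprocals, an exponential pdf stated in terms of a rate λ can be evaluated by passing mu = 1/λ; a sketch:

```matlab
lambda = 0.5;       % rate parameterization (events per unit interval)
mu = 1/lambda;      % equivalent mean wait time
x = 0:0.5:4;
y = exppdf(x,mu);   % same values as lambda*exp(-lambda*x)
```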

Alternative Functionality • exppdf is a function specific to the exponential distribution. Statistics and Machine Learning Toolbox also offers the generic function pdf, which supports various probability distributions. To use pdf, create an ExponentialDistribution probability distribution object and pass the object as an input argument or specify the probability distribution name and its parameters. Note that the distribution-specific function exppdf is faster than the generic function pdf. • Use the Probability Distribution Function app to create an interactive plot of the cumulative distribution function (cdf) or probability density function (pdf) for a probability distribution.
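A sketch of the equivalent generic-function calls:

```matlab
% Distribution-specific call:
y1 = exppdf(5,2);
% Generic equivalents: a distribution object, or the generic pdf
% function with the distribution name and its parameter.
pd = makedist('Exponential','mu',2);
y2 = pdf(pd,5);
y3 = pdf('Exponential',5,2);   % y1, y2, and y3 agree
```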

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also ExponentialDistribution | expcdf | expfit | expinv | explike | exprnd | expstat | pdf Topics “Exponential Distribution” on page B-33 Introduced before R2006a


exprnd Exponential random numbers

Syntax r = exprnd(mu) r = exprnd(mu,sz1,...,szN) r = exprnd(mu,sz)

Description r = exprnd(mu) generates a random number from the exponential distribution with mean mu. r = exprnd(mu,sz1,...,szN) generates an array of random numbers from the exponential distribution, where sz1,...,szN indicates the size of each dimension. r = exprnd(mu,sz) generates an array of random numbers from the exponential distribution, where vector sz specifies size(r).

Examples Generate Exponential Random Number Generate a single random number from the exponential distribution with mean 5. r = exprnd(5) r = 1.0245

Generate Array of Exponential Random Numbers Generate a 1-by-6 array of exponential random numbers with unit mean. mu1 = ones(1,6); % 1-by-6 array of ones r1 = exprnd(mu1)
r1 = 1×6
    0.2049    0.0989    2.0637    0.0906    0.4583    2.3275

By default, exprnd generates an array that is the same size as mu. If you specify mu as a scalar, then exprnd expands it into a constant array with dimensions specified by sz1,...,szn. Generate a 2-by-6 array of exponential random numbers with mean 3.


mu2 = 3; sz1 = 2; sz2 = 6; r2 = exprnd(mu2,sz1,sz2)
r2 = 2×6
    3.8350    0.1303    5.5428    0.1313    0.6684    2.5899
    1.8106    0.1072    0.0895    2.1685    5.8582    0.2641

If you specify both mu and sz1,...,szn as arrays, then the dimensions specified by sz1,...,szn must match the dimensions of mu. Generate a 1-by-6 array of exponential random numbers with means 5 through 10. mu3 = 5:10; sz = [1 6]; r3 = exprnd(mu3,sz)
r3 = 1×6
    1.1647    0.2481    2.9539   26.6582    1.4719    0.6829

Input Arguments mu — mean 1 (default) | positive scalar value | array of positive scalar values Mean of the exponential distribution, specified as a positive scalar value or an array of positive scalar values. To generate random numbers from multiple distributions, specify mu using an array. Each element in r is the random number generated from the distribution specified by the corresponding element in mu. Example: [1 2 3 5] Data Types: single | double sz1,...,szN — Size of each dimension (as separate arguments) integers Size of each dimension, specified as separate arguments of integers. If mu is an array, then the specified dimensions sz1,...,szN must match the dimensions of mu. The default values of sz1,...,szN are the dimensions of mu. • If you specify a single value sz1, then r is a square matrix of size sz1-by-sz1. • If the size of any dimension is 0 or negative, then r is an empty array. • Beyond the second dimension, exprnd ignores trailing dimensions with a size of 1. For example, exprnd(4,3,1,1,1) produces a 3-by-1 vector of random numbers from the distribution with mean 4. Example: 2,4


Data Types: single | double sz — Size of each dimension (as a row vector) row vector of integers Size of each dimension, specified as a row vector of integers. If mu is an array, then the specified dimensions sz must match the dimensions of mu. The default values of sz are the dimensions of mu. • If you specify a single value [sz1], then r is a square matrix of size sz1-by-sz1. • If the size of any dimension is 0 or negative, then r is an empty array. • Beyond the second dimension, exprnd ignores trailing dimensions with a size of 1. For example, exprnd(4,[3 1 1 1]) produces a 3-by-1 vector of random numbers from the distribution with mean 4. Example: [2 4] Data Types: single | double

Output Arguments r — Exponential random numbers nonnegative scalar value | array of nonnegative scalar values Exponential random numbers, returned as a nonnegative scalar value or an array of nonnegative scalar values with the dimensions specified by sz1,...,szN or sz. Each element in r is the random number generated from the distribution specified by the corresponding element in mu.

Alternative Functionality • exprnd is a function specific to the exponential distribution. Statistics and Machine Learning Toolbox also offers the generic function random, which supports various probability distributions. To use random, create an ExponentialDistribution probability distribution object and pass the object as an input argument or specify the probability distribution name and its parameters. Note that the distribution-specific function exprnd is faster than the generic function random. • To generate random numbers interactively, use randtool, a user interface for random number generation.
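A sketch of the generic random equivalents:

```matlab
rng('default')
% Distribution-specific generator:
r1 = exprnd(5,2,3);
% Generic equivalents via the distribution name or a distribution object:
r2 = random('Exponential',5,2,3);
pd = makedist('Exponential','mu',5);
r3 = random(pd,2,3);
```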

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: The generated code can return a different sequence of numbers from the sequence returned by MATLAB if either of the following is true: • The output is nonscalar. • An input parameter is invalid for the distribution.


For more information on code generation, see “Introduction to Code Generation” on page 31-2 and “General Code Generation Workflow” on page 31-5. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also ExponentialDistribution | expcdf | expfit | expinv | explike | exppdf | expstat | random Topics “Exponential Distribution” on page B-33 Introduced before R2006a


expstat Exponential mean and variance

Syntax [m,v] = expstat(mu)

Description [m,v] = expstat(mu) returns the mean m and variance v of the exponential distribution with mean parameter mu. mu can be a vector, matrix, or multidimensional array. The mean of the exponential distribution is µ, and the variance is µ^2.

Examples [m,v] = expstat([1 10 100 1000]) m = 1 10 100 1000 v = 1 100 10000 1000000
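The outputs follow directly from the stated formulas (mean µ, variance µ^2), which can be checked programmatically:

```matlab
mu = [1 10 100 1000];
[m,v] = expstat(mu);
% For the exponential distribution, the variance is the square of the mean.
assert(isequal(m,mu) && isequal(v,mu.^2))
```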

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also expcdf | expfit | expinv | explike | exppdf | exprnd Topics “Exponential Distribution” on page B-33 Introduced before R2006a


factoran Factor analysis

Syntax lambda = factoran(X,m) [lambda,psi] = factoran(X,m) [lambda,psi,T] = factoran(X,m) [lambda,psi,T,stats] = factoran(X,m) [lambda,psi,T,stats,F] = factoran(X,m) [...] = factoran(...,param1,val1,param2,val2,...)

Description lambda = factoran(X,m) returns the maximum likelihood estimate, lambda, of the factor loadings matrix, in a common factor analysis model with m common factors. X is an n-by-d matrix where each row is an observation of d variables. The (i,j)th element of the d-by-m matrix lambda is the coefficient, or loading, of the jth factor for the ith variable. By default, factoran calls the function rotatefactors to rotate the estimated factor loadings using the 'varimax' option. [lambda,psi] = factoran(X,m) also returns maximum likelihood estimates of the specific variances as a column vector psi of length d. [lambda,psi,T] = factoran(X,m) also returns the m-by-m factor loadings rotation matrix T. [lambda,psi,T,stats] = factoran(X,m) also returns a structure stats containing information relating to the null hypothesis, H0, that the number of common factors is m. stats includes the following fields:
loglike — Maximized log-likelihood value
dfe — Error degrees of freedom = ((d-m)^2 - (d+m))/2
chisq — Approximate chi-squared statistic for the null hypothesis
p — Right-tail significance level for the null hypothesis

factoran does not compute the chisq and p fields unless dfe is positive and all the specific variance estimates in psi are positive (see “Heywood Case” on page 32-1276 below). If X is a covariance matrix, then you must also specify the 'nobs' parameter if you want factoran to compute the chisq and p fields. [lambda,psi,T,stats,F] = factoran(X,m) also returns, in F, predictions of the common factors, known as factor scores. F is an n-by-m matrix where each row is a prediction of m common factors. If X is a covariance matrix, factoran cannot compute F. factoran rotates F using the same criterion as for lambda. [...] = factoran(...,param1,val1,param2,val2,...) enables you to specify optional parameter name/value pairs to control the model fit and the outputs. The following are the valid parameter/value pairs.


Parameter

Value

'xtype'

Type of input in the matrix X. 'xtype' can be one of:

'scores'

'start'

'rotate'

'data'

Raw data (default)

'covariance'

Positive definite covariance or correlation matrix

Method for predicting factor scores. 'scores' is ignored if X is not raw data. 'wls' 'Bartlett'

Synonyms for a weighted least-squares estimate that treats F as fixed (default)

'regression' 'Thomson'

Synonyms for a minimum mean squared error prediction that is equivalent to a ridge regression

Starting point for the specific variances psi in the maximum likelihood optimization. Can be specified as: 'random'

Chooses d uniformly distributed values on the interval [0,1].

'Rsquared'

Chooses the starting vector as a scale factor times diag(inv(corrcoef(X))) (default). For examples, see Jöreskog [2].

Positive integer

Performs the given number of maximum likelihood fits, each initialized as with 'random'. factoran returns the fit with the highest likelihood.

Matrix

Performs one maximum likelihood fit for each column of the specified matrix. The ith optimization is initialized with the values from the ith column. The matrix must have d rows.

Method used to rotate factor loadings and scores. 'rotate' can have the same values as the 'Method' parameter of rotatefactors. See the reference page for rotatefactors for a full description of the available methods. 'none'

Performs no rotation.

'equamax'

Special case of the orthomax rotation. Use the 'normalize', 'reltol', and 'maxit' parameters to control the details of the rotation.

'orthomax'

Orthogonal rotation that maximizes a criterion based on the variance of the loadings. Use the 'coeff', 'normalize', 'reltol', and 'maxit' parameters to control the details of the rotation.

'parsimax'

Special case of the orthomax rotation (default). Use the 'normalize', 'reltol', and 'maxit' parameters to control the details of the rotation.

32-1269

32

Functions

Parameter

Value 'pattern'

Performs either an oblique rotation (the default) or an orthogonal rotation to best match a specified pattern matrix. Use the 'type' parameter to choose the type of rotation. Use the 'target' parameter to specify the pattern matrix.

'procrustes'

Performs either an oblique (the default) or an orthogonal rotation to best match a specified target matrix in the least squares sense. Use the 'type' parameter to choose the type of rotation. Use 'target' to specify the target matrix.

'promax'

Performs an oblique procrustes rotation to a target matrix determined by factoran as a function of an orthomax solution. Use the 'power' parameter to specify the exponent for creating the target matrix. Because 'promax' uses 'orthomax' internally, you can also specify the parameters that apply to 'orthomax'.

'quartimax'

Special case of the orthomax rotation. Use the 'normalize', 'reltol', and 'maxit' parameters to control the details of the rotation.

'varimax'

Special case of the orthomax rotation (default). Use the 'normalize', 'reltol', and 'maxit' parameters to control the details of the rotation.

Function

Function handle to rotation function of the form [B,T] = myrotation(A,...)

where A is a d-by-m matrix of unrotated factor loadings, B is a d-by-m matrix of rotated loadings, and T is the corresponding m-by-m rotation matrix. Use the factoran parameter 'userargs' to pass additional arguments to this rotation function. See “User-Defined Rotation Function” on page 32-1275.


'coeff'

Coefficient, often denoted as γ, defining the specific 'orthomax' criterion. Must be from 0 to 1. The value 0 corresponds to quartimax, and 1 corresponds to varimax. Default is 1.

'normalize'

Flag indicating whether the loading matrix should be row-normalized (1) or left unnormalized (0) for 'orthomax' or 'varimax' rotation. Default is 1.

'reltol'

Relative convergence tolerance for 'orthomax' or 'varimax' rotation. Default is sqrt(eps).

'maxit'

Iteration limit for 'orthomax' or 'varimax' rotation. Default is 250.
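The orthomax family controlled by these parameters (quartimax at γ = 0, varimax at γ = 1) can be sketched in a few lines outside MATLAB. The NumPy function below is an illustrative sketch of the standard SVD-based orthomax iteration, not factoran's actual implementation; the function name `orthomax` and its defaults are assumptions for this example.

```python
import numpy as np

def orthomax(A, gamma=1.0, reltol=1e-12, maxit=250):
    """Sketch of an orthomax rotation (gamma=1 -> varimax, gamma=0 -> quartimax).

    Returns rotated loadings B = A @ T and the orthogonal rotation matrix T.
    Illustrative only; this is not factoran's implementation.
    """
    d, m = A.shape
    T = np.eye(m)
    obj = 0.0
    for _ in range(maxit):
        B = A @ T
        # Gradient of the orthomax criterion, projected back onto the
        # orthogonal group via an SVD (Jennrich-style iteration)
        G = A.T @ (B**3 - (gamma / d) * B @ np.diag((B**2).sum(axis=0)))
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt
        new_obj = s.sum()
        if new_obj - obj < reltol * max(new_obj, 1.0):
            break
        obj = new_obj
    return A @ T, T

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2))   # toy unrotated loadings
B, T = orthomax(A)                # varimax rotation
```

Whatever the criterion, the rotation leaves the communalities unchanged: the row sums of squared loadings in B equal those in A, and T'T is the identity, matching the description of T in the Tips section below.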


'target'

Target factor loading matrix for 'procrustes' rotation. Required for 'procrustes' rotation. No default value.

'type'

Type of 'procrustes' rotation. Can be 'oblique' (default) or 'orthogonal'.

'power'

Exponent for creating the target matrix in the 'promax' rotation. Must be ≥ 1. Default is 4.

'userargs'

Denotes the beginning of additional input values for a user-defined rotation function. factoran appends all subsequent values, in order and without processing, to the rotation function argument list, following the unrotated factor loadings matrix A. See “User-Defined Rotation Function” on page 32-1275.

'nobs'

If X is a covariance or correlation matrix, indicates the number of observations that were used in its estimation. This allows calculation of significance for the null hypothesis even when the original data are not available. There is no default. 'nobs' is ignored if X is raw data.

'delta'

Lower bound for the specific variances psi during the maximum likelihood optimization. Default is 0.005.

'optimopts'

Structure that specifies control parameters for the iterative algorithm the function uses to compute maximum likelihood estimates. Create this structure with the function statset. Enter statset('factoran') to see the names and default values of the parameters that factoran accepts in the options structure. See the reference page for statset for more information about these options.

Examples

Estimate and Plot Factor Loadings

Load the sample data.

load carbig

Define the variable matrix.

X = [Acceleration Displacement Horsepower MPG Weight];
X = X(all(~isnan(X),2),:);

Estimate the factor loadings using a minimum mean squared error prediction for a factor analysis with two common factors.

[Lambda,Psi,T,stats,F] = factoran(X,2,'scores','regression');
inv(T'*T);                  % Estimated correlation matrix of F, == eye(2)
Lambda*Lambda' + diag(Psi); % Estimated correlation matrix
Lambda*inv(T);              % Unrotate the loadings
F*T';                       % Unrotate the factor scores

Create biplot of two factors.

biplot(Lambda,'LineWidth',2,'MarkerSize',20)


Estimate the factor loadings using the covariance (or correlation) matrix.

[Lambda,Psi,T] = factoran(cov(X),2,'xtype','cov')

Lambda = 5×2

   -0.2432   -0.8500
    0.8773    0.3871
    0.7618    0.5930
   -0.7978   -0.2786
    0.9692    0.2129

Psi = 5×1

    0.2184
    0.0804
    0.0680
    0.2859
    0.0152

T = 2×2

    0.9476    0.3195
    0.3195   -0.9476

% [Lambda,Psi,T] = factoran(corrcoef(X),2,'xtype','cov')


Although the estimates are the same, the use of a covariance matrix rather than raw data doesn't let you request scores or significance level.

Use promax rotation.

[Lambda,Psi,T,stats,F] = factoran(X,2,'rotate','promax',...
                                  'powerpm',4);
inv(T'*T)   % Estimated correlation of F, no longer eye(2)

ans = 2×2

    1.0000   -0.6391
   -0.6391    1.0000

Lambda*inv(T'*T)*Lambda'+diag(Psi)   % Estimated correlation of X

ans = 5×5

    1.0000   -0.5424   -0.6893    0.4309   -0.4167
   -0.5424    1.0000    0.8979   -0.8078    0.9328
   -0.6893    0.8979    1.0000   -0.7730    0.8647
    0.4309   -0.8078   -0.7730    1.0000   -0.8326
   -0.4167    0.9328    0.8647   -0.8326    1.0000

Plot the unrotated variables with oblique axes superimposed.

invT = inv(T);
Lambda0 = Lambda*invT;
figure()
line([-invT(1,1) invT(1,1) NaN -invT(2,1) invT(2,1)], ...
     [-invT(1,2) invT(1,2) NaN -invT(2,2) invT(2,2)], ...
     'Color','r','linewidth',2)
grid on
hold on
biplot(Lambda0,'LineWidth',2,'MarkerSize',20)
xlabel('Loadings for unrotated Factor 1')
ylabel('Loadings for unrotated Factor 2')


Plot the rotated variables against the oblique axes.

figure()
biplot(Lambda,'LineWidth',2,'MarkerSize',20)


User-Defined Rotation Function

Syntax for passing additional arguments to a user-defined rotation function:

[Lambda,Psi,T] = ...
    factoran(X,2,'rotate',@myrotation,'userargs',1,'two');

More About

Factor Analysis Model

factoran computes the maximum likelihood estimate (MLE) of the factor loadings matrix Λ in the factor analysis model

x = μ + Λf + e

where x is a vector of observed variables, μ is a constant vector of means, Λ is a constant d-by-m matrix of factor loadings, f is a vector of independent, standardized common factors, and e is a vector of independent specific factors. x, μ, and e are of length d. f is of length m.

Alternatively, the factor analysis model can be specified as

cov(x) = ΛΛᵀ + Ψ


where Ψ = cov(e) is a d-by-d diagonal matrix of specific variances.
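The covariance identity above is easy to check by simulation. The sketch below uses NumPy with made-up loadings and specific variances (the numbers are hypothetical, not from any data set): data generated as x = Λf + e should have a sample covariance close to ΛΛᵀ + Ψ.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, n = 4, 2, 200_000

# Hypothetical loadings and specific variances for illustration
Lambda = np.array([[0.9, 0.0],
                   [0.8, 0.3],
                   [0.1, 0.9],
                   [0.4, 0.5]])
Psi = np.array([0.20, 0.30, 0.15, 0.25])        # diagonal of cov(e)

f = rng.standard_normal((n, m))                 # standardized common factors
e = rng.standard_normal((n, d)) * np.sqrt(Psi)  # independent specific factors
x = f @ Lambda.T + e                            # mu = 0 for simplicity

sample_cov = np.cov(x, rowvar=False)
model_cov = Lambda @ Lambda.T + np.diag(Psi)
print(np.max(np.abs(sample_cov - model_cov)))   # small sampling error
```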

Tips

Observed Data Variables

The variables in the observed data matrix X must be linearly independent, i.e., cov(X) must have full rank, for maximum likelihood estimation to succeed. factoran reduces both raw data and a covariance matrix to a correlation matrix before performing the fit.

factoran standardizes the observed data X to zero mean and unit variance before estimating the loadings lambda. This does not affect the model fit, because MLEs in this model are invariant to scale. However, lambda and psi are returned in terms of the standardized variables, i.e., lambda*lambda'+diag(psi) is an estimate of the correlation matrix of the original data X (although not after an oblique rotation). See “Estimate and Plot Factor Loadings” on page 32-1271 and “User-Defined Rotation Function” on page 32-1275.

Heywood Case

If elements of psi are equal to the value of the 'delta' parameter (i.e., they are essentially zero), the fit is known as a Heywood case, and interpretation of the resulting estimates is problematic. In particular, there can be multiple local maxima of the likelihood, each with different estimates of the loadings and the specific variances. Heywood cases can indicate overfitting (i.e., m is too large), but can also be the result of underfitting.

Rotation of Factor Loadings and Scores

Unless you explicitly specify no rotation using the 'rotate' parameter, factoran rotates the estimated factor loadings, lambda, and the factor scores, F. The output matrix T is used to rotate the loadings, i.e., lambda = lambda0*T, where lambda0 is the initial (unrotated) MLE of the loadings. T is an orthogonal matrix for orthogonal rotations, and the identity matrix for no rotation. The inverse of T is known as the primary axis rotation matrix, while T itself is related to the reference axis rotation matrix. For orthogonal rotations, the two are identical.
factoran computes factor scores that have been rotated by inv(T'), i.e., F = F0 * inv(T'), where F0 contains the unrotated predictions. The estimated covariance of F is inv(T'*T), which, for orthogonal or no rotation, is the identity matrix. Rotation of factor loadings and scores is an attempt to create a more easily interpretable structure in the loadings matrix after maximum likelihood estimation.

References

[1] Harman, H. H. Modern Factor Analysis. 3rd Ed. Chicago: University of Chicago Press, 1976.

[2] Jöreskog, K. G. “Some Contributions to Maximum Likelihood Factor Analysis.” Psychometrika. Vol. 32, Issue 4, 1967, pp. 443–482.

[3] Lawley, D. N., and A. E. Maxwell. Factor Analysis as a Statistical Method. 2nd Ed. New York: American Elsevier Publishing Co., 1971.


Extended Capabilities

Tall Arrays

Calculate with arrays that have more rows than fit in memory.

pcacov and factoran do not work directly on tall arrays. Instead, use C = gather(cov(X)) to compute the covariance matrix of a tall array. Then, you can use pcacov or factoran on the in-memory covariance matrix. Alternatively, you can use pca directly on a tall array.

For more information, see “Tall Arrays for Out-of-Memory Data” (MATLAB).

See Also

biplot | pca | pcacov | procrustes | rotatefactors | statset

Topics

“Perform Factor Analysis on Exam Grades”

Introduced before R2006a


fcdf

F cumulative distribution function

Syntax

p = fcdf(x,v1,v2)
p = fcdf(x,v1,v2,'upper')

Description

p = fcdf(x,v1,v2) computes the F cdf at each of the values in x using the corresponding numerator degrees of freedom v1 and denominator degrees of freedom v2. x, v1, and v2 can be vectors, matrices, or multidimensional arrays that are all the same size. A scalar input is expanded to a constant matrix with the same dimensions as the other inputs. v1 and v2 parameters must contain real positive values, and the values in x must lie on the interval [0 Inf].

p = fcdf(x,v1,v2,'upper') returns the complement of the F cdf at each value in x, using an algorithm that more accurately computes the extreme upper tail probabilities.

The F cdf is

p = F(x | ν₁,ν₂) = ∫₀ˣ [Γ((ν₁+ν₂)/2) / (Γ(ν₁/2) Γ(ν₂/2))] (ν₁/ν₂)^(ν₁/2) t^((ν₁−2)/2) [1 + (ν₁/ν₂)t]^(−(ν₁+ν₂)/2) dt

The result, p, is the probability that a single observation from an F distribution with parameters ν1 and ν2 will fall in the interval [0 x].
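The cdf formula can be checked numerically by integrating the F density and comparing with a library implementation. This sketch uses SciPy with arbitrarily chosen values x = 2, ν₁ = 3, ν₂ = 10 (the values are illustrative only).

```python
from scipy import integrate, special, stats

def f_pdf(t, v1, v2):
    # Integrand of the F cdf formula above
    c = special.gamma((v1 + v2) / 2) / (special.gamma(v1 / 2) * special.gamma(v2 / 2))
    return (c * (v1 / v2) ** (v1 / 2) * t ** ((v1 - 2) / 2)
            / (1 + (v1 / v2) * t) ** ((v1 + v2) / 2))

x, v1, v2 = 2.0, 3, 10
p_quad, _ = integrate.quad(f_pdf, 0, x, args=(v1, v2))
print(p_quad, stats.f.cdf(x, v1, v2))   # the two values agree
```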

Examples

Compute F Distribution CDF

The following illustrates a useful mathematical identity for the F distribution.

nu1 = 1:5;
nu2 = 6:10;
x = 2:6;
F1 = fcdf(x,nu1,nu2)

F1 = 1×5

    0.7930    0.8854    0.9481    0.9788    0.9919

F2 = 1 - fcdf(1./x,nu2,nu1)

F2 = 1×5

    0.7930    0.8854    0.9481    0.9788    0.9919
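The same identity, F(x | ν₁,ν₂) = 1 − F(1/x | ν₂,ν₁), can be reproduced with SciPy's F distribution; the inputs mirror the MATLAB example.

```python
import numpy as np
from scipy.stats import f

nu1 = np.arange(1, 6)     # 1:5
nu2 = np.arange(6, 11)    # 6:10
x = np.arange(2, 7)       # 2:6

F1 = f.cdf(x, nu1, nu2)
F2 = 1 - f.cdf(1 / x, nu2, nu1)
print(np.round(F1, 4))    # matches the values shown above
assert np.allclose(F1, F2)
```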

Extended Capabilities

C/C++ Code Generation

Generate C and C++ code using MATLAB® Coder™.

GPU Arrays

Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also

cdf | finv | fpdf | frnd | fstat

Topics

“F Distribution” on page B-45

Introduced before R2006a


feval

Package:

Predict responses of generalized linear regression model using one input for each predictor

Syntax

ypred = feval(mdl,Xnew1,Xnew2,...,Xnewn)

Description

ypred = feval(mdl,Xnew1,Xnew2,...,Xnewn) returns the predicted response of mdl to the new input predictors Xnew1,Xnew2,...,Xnewn.

Examples

Predict Response Values

Create a generalized linear regression model, and plot its responses to a range of input data.

Generate sample data using Poisson random numbers with two underlying predictors X(:,1) and X(:,2).

rng('default') % For reproducibility
rndvars = randn(100,2);
X = [2 + rndvars(:,1),rndvars(:,2)];
mu = exp(1 + X*[1;2]);
y = poissrnd(mu);

Create a generalized linear regression model of Poisson data.

mdl = fitglm(X,y,'y ~ x1 + x2','Distribution','poisson');

Generate a range of values for X(:,1) and X(:,2), and plot the predictions at the values.

[Xtest1,Xtest2] = meshgrid(min(X(:,1)):.5:max(X(:,1)),min(X(:,2)):.5:max(X(:,2)));
Z = feval(mdl,Xtest1,Xtest2);
surf(Xtest1,Xtest2,Z)


Input Arguments

mdl — Generalized linear regression model
GeneralizedLinearModel object | CompactGeneralizedLinearModel object

Generalized linear regression model, specified as a GeneralizedLinearModel object created using fitglm or stepwiseglm, or a CompactGeneralizedLinearModel object created using compact.

Xnew1,Xnew2,...,Xnewn — New predictor input values
vector | matrix | table | dataset array

New predictor values, specified as a vector, matrix, table, or dataset array.

• If you pass multiple inputs Xnew1,Xnew2,...,Xnewn and each includes observations for one predictor variable, then each input must be a vector. Each vector must have the same size. If you specify a predictor variable as a scalar, then feval expands the scalar argument into a constant vector of the same size as the other arguments.

• If you pass a single input Xnew1, then Xnew1 must be a table, dataset array, or matrix.

  • If Xnew1 is a table or dataset array, it must contain predictors that have the same predictor names as in the PredictorNames property of mdl.

  • If Xnew1 is a matrix, it must have the same number of variables (columns) in the same order as the predictor input used to create mdl. Note that Xnew1 must also contain any predictor variables that are not used as predictors in the fitted model. Also, all variables used in creating mdl must be numeric. To treat numerical predictors as categorical, identify the predictors using the 'CategoricalVars' name-value pair argument when you create mdl.

Data Types: single | double | table

Output Arguments

ypred — Predicted response values
numeric vector

Predicted response values at Xnew1,Xnew2,...,Xnewn, returned as a numeric vector.

For a binomial model, feval uses 1 as the BinomialSize parameter, so the values in ypred are predicted probabilities. To return the numbers of successes in the trials, use the predict function and specify the number of trials by using the 'BinomialSize' name-value pair argument.

For a model with an offset, feval uses 0 as the offset value. To specify the offset value used when you fit a model, use the predict function and the 'Offset' name-value pair argument.

Tips

• A regression object is, mathematically, a function that estimates the relationship between the response and predictors. The feval function enables an object to behave like a function in MATLAB. You can pass feval to another function that accepts a function input, such as fminsearch and integral.

• feval can be simpler to use with a model created from a table or dataset array. When you have new predictor data, you can pass it to feval without creating a table or matrix.

Alternative Functionality

• predict gives the same predictions as feval if you use the default values for the 'Offset' and 'BinomialSize' name-value pair arguments of predict. The prediction values can be different if you specify other values for these arguments. The predict function also returns confidence intervals on its predictions. Note that the predict function accepts a single input argument containing all predictor variables, rather than multiple input arguments with one input for each predictor variable.

• random predicts responses with added noise.

See Also

CompactGeneralizedLinearModel | GeneralizedLinearModel | predict | random

Topics

“feval” on page 12-24
“Generalized Linear Models” on page 12-9

Introduced in R2012a


feval

Package:

Predict responses of linear regression model using one input for each predictor

Syntax

ypred = feval(mdl,Xnew1,Xnew2,...,Xnewn)

Description

ypred = feval(mdl,Xnew1,Xnew2,...,Xnewn) returns the predicted response of mdl to the new input predictors Xnew1,Xnew2,...,Xnewn.

Examples

Plot Different Categorical Levels

Fit a mileage model to the carsmall data set, including the Year categorical predictor. Superimpose fitted curves on a scatter plot of the data.

Load the data set and fit the model.

load carsmall
tbl = table(MPG,Weight);
tbl.Year = categorical(Model_Year);
mdl = fitlm(tbl,'MPG ~ Year + Weight^2');

Create a scatter plot of MPG versus Weight, grouped by Year.

gscatter(tbl.Weight,tbl.MPG,tbl.Year);


Plot curves of the model predictions for the various years and weights by using feval.

w = linspace(min(tbl.Weight),max(tbl.Weight))';
line(w,feval(mdl,w,'70'),'Color','r')
line(w,feval(mdl,w,'76'),'Color','g')
line(w,feval(mdl,w,'82'),'Color','b')


Input Arguments

mdl — Linear regression model object
LinearModel object | CompactLinearModel object

Linear regression model object, specified as a LinearModel object created by using fitlm or stepwiselm, or a CompactLinearModel object created by using compact.

Xnew1,Xnew2,...,Xnewn — New predictor input values
vector | matrix | table | dataset array

New predictor values, specified as a vector, matrix, table, or dataset array.

• If you pass multiple inputs Xnew1,Xnew2,...,Xnewn and each includes observations for one predictor variable, then each input must be a vector. Each vector must have the same size. If you specify a predictor variable as a scalar, then feval expands the scalar argument into a constant vector of the same size as the other arguments.

• If you pass a single input Xnew1, then Xnew1 must be a table, dataset array, or matrix.

  • If Xnew1 is a table or dataset array, it must contain predictors that have the same predictor names as in the PredictorNames property of mdl.

  • If Xnew1 is a matrix, it must have the same number of variables (columns) in the same order as the predictor input used to create mdl. Note that Xnew1 must also contain any predictor variables that are not used as predictors in the fitted model. Also, all variables used in creating mdl must be numeric. To treat numerical predictors as categorical, identify the predictors using the 'CategoricalVars' name-value pair argument when you create mdl.

Data Types: single | double | table

Output Arguments

ypred — Predicted response values
numeric vector

Predicted response values at Xnew1,Xnew2,...,Xnewn, returned as a numeric vector.

Tips

• A regression object is, mathematically, a function that estimates the relationship between the response and predictors. The feval function enables an object to behave like a function in MATLAB. You can pass feval to another function that accepts a function input, such as fminsearch and integral.

• feval can be simpler to use with a model created from a table or dataset array. When you have new predictor data, you can pass it to feval without creating a table or matrix.

Alternative Functionality

• predict gives the same predictions as feval by using a single input argument containing all predictor variables, rather than multiple input arguments with one input for each predictor variable. predict also gives confidence intervals on its predictions.

• random predicts responses with added noise.

See Also

CompactLinearModel | LinearModel | predict | random

Topics

“Predict or Simulate Responses to New Data” on page 11-30
“Linear Regression Workflow” on page 11-34
“Interpret Linear Regression Results” on page 11-49
“Linear Regression” on page 11-8

Introduced in R2012a


feval

Class: NonLinearModel

Evaluate nonlinear regression model prediction

Syntax

ypred = feval(mdl,Xnew1,Xnew2,...,Xnewn)

Description

ypred = feval(mdl,Xnew1,Xnew2,...,Xnewn) returns the predicted response of mdl to the input [Xnew1,Xnew2,...,Xnewn].

Input Arguments

mdl

Nonlinear regression model, constructed by fitnlm.

Xnew1,Xnew2,...,Xnewn

Predictor components. Xnewi can be one of:

• Scalar
• Vector
• Array

Each nonscalar component must have the same size (number of elements in each dimension). If you pass just one Xnew array, Xnew can be a table, dataset array, or an array of doubles, where each column of the array represents one predictor.

Output Arguments

ypred

Predicted mean values at Xnew. ypred is the same size as each component of Xnew.

Examples

Predict a Nonlinear Model from a Table

Create a nonlinear model for auto mileage based on the carbig data. Predict the mileage of an average automobile.

Load the data and create a nonlinear model.


load carbig
tbl = table(Horsepower,Weight,MPG);
modelfun = @(b,x)b(1) + b(2)*x(:,1).^b(3) + ...
    b(4)*x(:,2).^b(5);
beta0 = [-50 500 -1 500 -1];
mdl = fitnlm(tbl,modelfun,beta0);

Find the predicted mileage of an average auto. The data contain some observations with NaN, so compute the mean using nanmean.

Xnew = nanmean([Horsepower Weight]);
MPGnew = feval(mdl,Xnew)

MPGnew = 21.8073
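The same two-predictor power-law form can be fitted outside MATLAB. Because the carbig data set is not available here, the sketch below generates synthetic data from known coefficients (the values are hypothetical) and recovers them with scipy.optimize.curve_fit, started from a nearby initial guess in the spirit of beta0.

```python
import numpy as np
from scipy.optimize import curve_fit

def modelfun(X, b1, b2, b3, b4, b5):
    # Same functional form as the example: b1 + b2*x1.^b3 + b4*x2.^b5
    x1, x2 = X
    return b1 + b2 * x1**b3 + b4 * x2**b5

rng = np.random.default_rng(0)
true_b = np.array([20.0, 500.0, -1.0, 5000.0, -1.0])  # hypothetical coefficients
x1 = rng.uniform(50, 230, 400)     # synthetic "Horsepower"
x2 = rng.uniform(1500, 5000, 400)  # synthetic "Weight"
y = modelfun((x1, x2), *true_b) + rng.normal(0, 0.2, 400)

beta0 = [15, 400, -0.9, 4000, -0.9]   # starting guess, analogous to beta0 above
popt, _ = curve_fit(modelfun, (x1, x2), y, p0=beta0, maxfev=20000)
```

As in the MATLAB example, the fitted model can then be evaluated at the mean predictor values to get a single predicted response.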

Alternatives

predict gives the same predictions, but uses a single input array with one observation in each row, rather than one component in each input argument. predict also gives confidence intervals on its predictions.

random predicts with added noise.

See Also

NonLinearModel | predict | random

Topics

“Predict or Simulate Responses Using a Nonlinear Model” on page 13-9
“Nonlinear Regression” on page 13-2


ff2n

Two-level full factorial design

Syntax

dFF2 = ff2n(n)

Description

dFF2 = ff2n(n) gives factor settings dFF2 for a two-level full factorial design with n factors. dFF2 is m-by-n, where m is the number of treatments in the full-factorial design. Each row of dFF2 corresponds to a single treatment. Each column contains the settings for a single factor, with values of 0 and 1 for the two levels.

Examples

dFF2 = ff2n(3)

dFF2 =
     0     0     0
     0     0     1
     0     1     0
     0     1     1
     1     0     0
     1     0     1
     1     1     0
     1     1     1
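The row ordering above is plain binary counting, with the first factor varying slowest. A minimal cross-language sketch (the helper name ff2n here is just borrowed for illustration):

```python
from itertools import product

def ff2n(n):
    """Two-level full factorial: 2**n treatments with 0/1 factor settings,
    in the same row order as MATLAB's ff2n (first column varies slowest)."""
    return [list(row) for row in product([0, 1], repeat=n)]

for row in ff2n(3):
    print(row)   # [0, 0, 0] through [1, 1, 1]
```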

See Also

fullfact

Introduced before R2006a


fillprox

Class: TreeBagger

Proximity matrix for training data

Syntax

B = fillprox(B)
B = fillprox(B,'param1',val1,'param2',val2,...)

Description

B = fillprox(B) computes a proximity matrix for the training data and stores it in the Properties field of B.

B = fillprox(B,'param1',val1,'param2',val2,...) specifies optional parameter name/value pairs:

'Trees'

Either 'all' or a vector of indices of the trees in the ensemble to be used in computing the proximity matrix. Default is 'all'.

'NumPrint'

Number of training cycles (grown trees) after which TreeBagger displays a diagnostic message showing training progress. Default is no diagnostic messages.

See Also outlierMeasure | proximity


findobj

Class: qrandstream

Find objects matching specified conditions

Syntax

hm = findobj(h,'conditions')

Description

The findobj method of the handle class follows the same syntax as the MATLAB findobj command, except that the first argument must be an array of handles to objects.

hm = findobj(h,'conditions') searches the handle object array h and returns an array of handle objects matching the specified conditions. Only the public members of the objects of h are considered when evaluating the conditions.

See Also findobj | qrandstream


findprop

Class: qrandstream

Find property of MATLAB handle object

Syntax

p = findprop(h,'propname')

Description

p = findprop(h,'propname') finds and returns the meta.property object associated with property name propname of scalar handle object h. propname must be a character vector. It can be the name of a property defined by the class of h or a dynamic property added to scalar object h.

If no property named propname exists for object h, an empty meta.property array is returned.

See Also dynamicprops | findobj | meta.property | qrandstream


finv

F inverse cumulative distribution function

Syntax

X = finv(P,V1,V2)

Description

X = finv(P,V1,V2) computes the inverse of the F cdf with numerator degrees of freedom V1 and denominator degrees of freedom V2 for the corresponding probabilities in P. P, V1, and V2 can be vectors, matrices, or multidimensional arrays that all have the same size. A scalar input is expanded to a constant array with the same dimensions as the other inputs. V1 and V2 parameters must contain real positive values, and the values in P must lie on the interval [0 1].

The F inverse function is defined in terms of the F cdf as

x = F⁻¹(p | ν₁,ν₂) = {x : F(x | ν₁,ν₂) = p}

where

p = F(x | ν₁,ν₂) = ∫₀ˣ [Γ((ν₁+ν₂)/2) / (Γ(ν₁/2) Γ(ν₂/2))] (ν₁/ν₂)^(ν₁/2) t^((ν₁−2)/2) [1 + (ν₁/ν₂)t]^(−(ν₁+ν₂)/2) dt

Examples

Find a value that should exceed 95% of the samples from an F distribution with 5 degrees of freedom in the numerator and 10 degrees of freedom in the denominator.

x = finv(0.95,5,10)

x = 3.3258

You would observe values greater than 3.3258 only 5% of the time by chance.
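The same computation in SciPy (ppf is the inverse cdf), shown as a quick cross-check of the example:

```python
from scipy.stats import f

x = f.ppf(0.95, 5, 10)      # inverse F cdf, analogous to finv(0.95,5,10)
print(round(x, 4))          # ≈ 3.3258, matching the example
assert abs(f.cdf(x, 5, 10) - 0.95) < 1e-10   # round trip through the cdf
```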

Extended Capabilities

C/C++ Code Generation

Generate C and C++ code using MATLAB® Coder™.

GPU Arrays

Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).


See Also

fcdf | fpdf | frnd | fstat | icdf

Topics

“F Distribution” on page B-45

Introduced before R2006a


fishertest

Fisher’s exact test

Syntax

h = fishertest(x)
[h,p,stats] = fishertest(x)
[ ___ ] = fishertest(x,Name,Value)

Description

h = fishertest(x) returns a test decision for Fisher’s exact test of the null hypothesis that there are no nonrandom associations between the two categorical variables in x, against the alternative that there is a nonrandom association. The result h is 1 if the test rejects the null hypothesis at the 5% significance level, or 0 otherwise.

[h,p,stats] = fishertest(x) also returns the significance level p of the test and a structure stats containing additional test results, including the odds ratio and its asymptotic confidence interval.

[ ___ ] = fishertest(x,Name,Value) returns a test decision using additional options specified by one or more name-value pair arguments. For example, you can change the significance level of the test or conduct a one-sided test.

Examples

Conduct Fisher's Exact Test

In a small survey, a researcher asked 17 individuals if they received a flu shot this year, and whether they caught the flu this winter. The results indicate that, of the nine people who did not receive a flu shot, three got the flu and six did not. Of the eight people who received a flu shot, one got the flu and seven did not.

Create a 2-by-2 contingency table containing the survey data. Row 1 contains data for the individuals who did not receive a flu shot, and row 2 contains data for the individuals who received a flu shot. Column 1 contains the number of individuals who got the flu, and column 2 contains the number of individuals who did not.

x = table([3;1],[6;7],'VariableNames',{'Flu','NoFlu'},'RowNames',{'NoShot','Shot'})

x=2×2 table
              Flu    NoFlu
              ___    _____
    NoShot     3       6
    Shot       1       7

Use Fisher's exact test to determine if there is a nonrandom association between receiving a flu shot and getting the flu.


h = fishertest(x)

h = logical
   0

The returned test decision h = 0 indicates that fishertest does not reject the null hypothesis of no nonrandom association between the categorical variables at the default 5% significance level. Therefore, based on the test results, individuals who do not get a flu shot do not have different odds of getting the flu than those who got the flu shot.

Conduct a One-Sided Fisher's Exact Test

In a small survey, a researcher asked 17 individuals if they received a flu shot this year, and whether they caught the flu. The results indicate that, of the nine people who did not receive a flu shot, three got the flu and six did not. Of the eight people who received a flu shot, one got the flu and seven did not.

x = [3,6;1,7];

Use a right-tailed Fisher's exact test to determine if the odds of getting the flu are higher for individuals who did not receive a flu shot than for individuals who did. Conduct the test at the 1% significance level.

[h,p,stats] = fishertest(x,'Tail','right','Alpha',0.01)

h = logical
   0

p = 0.3353

stats = struct with fields:
             OddsRatio: 3.5000
    ConfidenceInterval: [0.1289 95.0408]

The returned test decision h = 0 indicates that fishertest does not reject the null hypothesis of no nonrandom association between the categorical variables at the 1% significance level. Since this is a right-tailed hypothesis test, the conclusion is that individuals who do not get a flu shot do not have greater odds of getting the flu than those who got the flu shot.
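For comparison, the same right-tailed test can be reproduced with SciPy's fisher_exact; the table and the results mirror the example above.

```python
from scipy.stats import fisher_exact

x = [[3, 6],
     [1, 7]]    # same flu-shot contingency table as above

odds, p = fisher_exact(x, alternative='greater')   # right-tailed test
print(odds, round(p, 4))   # odds ratio 3.5, p ≈ 0.3353
```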

Generate a Contingency Table Using crosstab

Load the hospital data.

load hospital

The hospital dataset array contains data on 100 hospital patients, including last name, gender, age, weight, smoking status, and systolic and diastolic blood pressure measurements.

To determine if smoking status is independent of gender, use crosstab to create a 2-by-2 contingency table of smokers and nonsmokers, grouped by gender.

[tbl,chi2,p,labels] = crosstab(hospital.Sex,hospital.Smoker)

tbl = 2×2

    40    13
    26    21

chi2 = 4.5083

p = 0.0337

labels = 2x2 cell
    {'Female'}    {'0'}
    {'Male'  }    {'1'}

The rows of the resulting contingency table tbl correspond to the patient's gender, with row 1 containing data for females and row 2 containing data for males. The columns correspond to the patient's smoking status, with column 1 containing data for nonsmokers and column 2 containing data for smokers. The returned result chi2 = 4.5083 is the value of the chi-squared test statistic for a chi-squared test of independence. The returned value p = 0.0337 is an approximate p-value based on the chi-squared distribution.

Use the contingency table generated by crosstab to perform Fisher's exact test on the data.

[h,p,stats] = fishertest(tbl)

h = logical
   1

p = 0.0375

stats = struct with fields:
             OddsRatio: 2.4852
    ConfidenceInterval: [1.0624 5.8135]

The result h = 1 indicates that fishertest rejects the null hypothesis of nonassociation between smoking status and gender at the 5% significance level. In other words, there is an association between gender and smoking status. The odds ratio indicates that the male patients have about 2.5 times greater odds of being smokers than the female patients.

The returned p-value of the test, p = 0.0375, is close to, but not exactly the same as, the result obtained by crosstab. This is because fishertest computes an exact p-value using the sample data, while crosstab uses a chi-squared approximation to compute the p-value.
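The approximate-versus-exact comparison in this example can also be reproduced with SciPy. Note correction=False: crosstab's chi-squared statistic is computed without Yates' continuity correction, which is SciPy's default for 2-by-2 tables.

```python
from scipy.stats import chi2_contingency, fisher_exact

tbl = [[40, 13],
       [26, 21]]   # gender-by-smoker table from the example

# Chi-squared approximation (no continuity correction, matching crosstab)
chi2, p_approx, dof, expected = chi2_contingency(tbl, correction=False)

# Exact test
odds, p_exact = fisher_exact(tbl)

print(round(chi2, 4), round(p_approx, 4))   # ≈ 4.5083, 0.0337
print(round(odds, 4), round(p_exact, 4))    # ≈ 2.4852, 0.0375
```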

Input Arguments

x — Contingency table
2-by-2 matrix of nonnegative integer values | 2-by-2 table of nonnegative integer values

Contingency table, specified as a 2-by-2 matrix or table containing nonnegative integer values. A contingency table contains the frequency distribution of the variables in the sample data. You can use crosstab to generate a contingency table from sample data.

Example: [4,0;0,4]

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Alpha',0.01,'Tail','right' specifies a right-tailed hypothesis test at the 1% significance level.

Alpha — Significance level
0.05 (default) | scalar value in the range (0,1)

Significance level of the hypothesis test, specified as the comma-separated pair consisting of 'Alpha' and a scalar value in the range (0,1).

Example: 'Alpha',0.01

Data Types: single | double

Tail — Type of alternative hypothesis
'both' (default) | 'right' | 'left'

Type of alternative hypothesis, specified as the comma-separated pair consisting of 'Tail' and one of the following.

'both'

Two-tailed test. The alternative hypothesis is that there is a nonrandom association between the two variables in x, and the odds ratio is not equal to 1.

'right'

Right-tailed test. The alternative hypothesis is that the odds ratio is greater than 1.

'left'

Left-tailed test. The alternative hypothesis is that the odds ratio is less than 1.

Example: 'Tail','right'

Output Arguments

h — Hypothesis test result
1 | 0

Hypothesis test result, returned as a logical value.

• If h is 1, then fishertest rejects the null hypothesis at the Alpha significance level.

• If h is 0, then fishertest fails to reject the null hypothesis at the Alpha significance level.

p — p-value
scalar value in the range [0,1]


p-value of the test, returned as a scalar value in the range [0,1]. p is the probability of observing a test statistic as extreme as, or more extreme than, the observed value under the null hypothesis. Small values of p cast doubt on the validity of the null hypothesis.

stats — Test data
structure
Test data, returned as a structure with the following fields:
• OddsRatio — A measure of association between the two variables.
• ConfidenceInterval — Asymptotic confidence interval for the odds ratio. If any of the cell frequencies in x are 0, then fishertest does not compute a confidence interval and instead displays [-Inf Inf].

More About

Fisher's Exact Test
Fisher's exact test is a nonparametric statistical test used to test the null hypothesis that no nonrandom association exists between two categorical variables, against the alternative that there is a nonrandom association between the variables.
Fisher's exact test provides an alternative to the chi-squared test for small samples, or samples with very uneven marginal distributions. Unlike the chi-squared test, Fisher's exact test does not depend on large-sample distribution assumptions, and instead calculates an exact p-value based on the sample data. Although Fisher's exact test is valid for samples of any size, it is not recommended for large samples because it is computationally intensive. If all of the frequency counts in the contingency table are greater than or equal to 1e7, then fishertest errors. For contingency tables that contain large count values or are well-balanced, use crosstab or chi2gof instead.
fishertest accepts a 2-by-2 contingency table as input, and computes the p-value of the test as follows:
1 Calculate the sums for each row, each column, and the total number of observations in the contingency table.
2 Using a multivariate generalization of the hypergeometric probability function, calculate the conditional probability of observing the exact result in the contingency table if the null hypothesis were true, given its row and column sums. The conditional probability is

    P_cutoff = (R1! R2! C1! C2!) / (N! ∏_{i,j} n_ij!),

where R1 and R2 are the row sums, C1 and C2 are the column sums, N is the total number of observations in the contingency table, and n_ij is the value in the ith row and jth column of the table.
3 Find all possible matrices of nonnegative integers consistent with the row and column sums. For each such matrix, calculate the associated conditional probability using the equation for P_cutoff.
4 Use these values to calculate the p-value of the test, based on the alternative hypothesis of interest.
• For a two-sided test, sum all of the conditional probabilities less than or equal to P_cutoff for the observed contingency table. This represents the probability of observing a result as extreme as, or more extreme than, the actual outcome if the null hypothesis were true. Small p-values cast doubt on the validity of the null hypothesis, in favor of the alternative hypothesis of association between the variables.
• For a left-sided test, sum the conditional probabilities of all the matrices with a (1,1) cell frequency less than or equal to n11 in the observed contingency table.
• For a right-sided test, sum the conditional probabilities of all the matrices with a (1,1) cell frequency greater than or equal to n11 in the observed contingency table.
The odds ratio is

    OR = (n11 n22) / (n21 n12).

The null hypothesis of conditional independence is equivalent to the hypothesis that the odds ratio equals 1. The left-sided alternative is equivalent to an odds ratio less than 1, and the right-sided alternative is equivalent to an odds ratio greater than 1.
The asymptotic 100(1 – α)% confidence interval for the odds ratio is

    CI = [exp(L − Φ⁻¹(1 − α/2)·SE), exp(L + Φ⁻¹(1 − α/2)·SE)],

where L is the log odds ratio, Φ⁻¹(·) is the inverse of the standard normal cumulative distribution function, and SE is the standard error of the log odds ratio. If the 100(1 – α)% confidence interval does not contain the value 1, then the association is significant at the α significance level. If any of the four cell frequencies are 0, then fishertest does not compute the confidence interval and instead displays [-Inf Inf].
fishertest only accepts 2-by-2 contingency tables as input. To test the independence of categorical variables with more than two levels, use the chi-squared test provided by crosstab.
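The two-sided procedure above can be sketched directly using the toolbox's hypergeometric density hygepdf. This is an illustrative reimplementation, not fishertest's internal code; the table [4 0; 0 4] is the same example used for the x input argument.

```matlab
x = [4 0; 0 4];                      % observed 2-by-2 table
R = sum(x,2); C = sum(x,1); N = sum(x(:));   % step 1: margins

% With the margins fixed, a 2-by-2 table is determined by its (1,1)
% cell. Enumerate every feasible (1,1) count (step 3) and compute each
% table's conditional, hypergeometric probability (step 2).
k = max(0, R(1)+C(1)-N) : min(R(1), C(1));
P = hygepdf(k, N, C(1), R(1));

% Two-sided p-value (step 4): sum the probabilities that do not exceed
% the probability of the observed table (small tolerance for rounding).
pObs = hygepdf(x(1,1), N, C(1), R(1));
p = sum(P(P <= pObs + sqrt(eps)));   % p = 2/70, about 0.0286
```

For this table only the two diagonal arrangements are as extreme as the observed one, each with probability 1/70, so the two-sided p-value is 2/70.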

See Also
chi2gof | crosstab

Introduced in R2014b



ClassificationDiscriminant.fit
Class: ClassificationDiscriminant

(Not Recommended) Fit discriminant analysis classifier

Note ClassificationDiscriminant.fit is not recommended. Use fitcdiscr instead.

Syntax
obj = ClassificationDiscriminant.fit(x,y)
obj = ClassificationDiscriminant.fit(x,y,Name,Value)

Description
obj = ClassificationDiscriminant.fit(x,y) returns a discriminant analysis classifier based on the input variables (also known as predictors, features, or attributes) x and output (response) y.
obj = ClassificationDiscriminant.fit(x,y,Name,Value) fits a classifier with additional options specified by one or more Name,Value pair arguments. If you use one of the following five options, obj is of class ClassificationPartitionedModel: 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'. Otherwise, obj is of class ClassificationDiscriminant.
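As a sketch of how the cross-validation options change the class of the returned object (using the fisheriris sample data that ships with the toolbox):

```matlab
load fisheriris

% No cross-validation options: returns a ClassificationDiscriminant.
obj = ClassificationDiscriminant.fit(meas,species);

% With 'KFold', returns a ClassificationPartitionedModel instead.
cvObj = ClassificationDiscriminant.fit(meas,species,'KFold',5);

% A partitioned model supports cross-validation methods such as
% kfoldLoss, which estimates the misclassification rate.
L = kfoldLoss(cvObj);
```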

Input Arguments

x — Predictor values
matrix of numeric values
Predictor values, specified as a matrix of numeric values. Each column of x represents one variable, and each row represents one observation. ClassificationDiscriminant.fit considers NaN values in x as missing values. ClassificationDiscriminant.fit does not use observations with missing values for x in the fit.
Data Types: single | double

y — Classification values
numeric vector | categorical vector | logical vector | character array | cell array of character vectors
Classification values, specified as a numeric vector, categorical vector (nominal or ordinal), logical vector, character array, or cell array of character vectors. Each row of y represents the classification of the corresponding row of x. ClassificationDiscriminant.fit considers NaN values in y to be missing values. ClassificationDiscriminant.fit does not use observations with missing values for y in the fit.
Data Types: single | double | logical | char | cell



Name-Value Pair Arguments
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

ClassNames — Class names
array
Class names, specified as the comma-separated pair consisting of 'ClassNames' and an array. Use the data type that exists in y. The default is the class names that exist in y. Use ClassNames to order the classes or to select a subset of classes for training.
Data Types: single | double | logical | char

Cost — Cost of misclassification
square matrix | structure
Cost of misclassification, specified as the comma-separated pair consisting of 'Cost' and a square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i. Alternatively, Cost can be a structure S having two fields: S.ClassNames containing the group names as a variable of the same type as y, and S.ClassificationCosts containing the cost matrix. The default is Cost(i,j)=1 if i~=j, and Cost(i,j)=0 if i=j.
Data Types: single | double | struct

CrossVal — Cross-validation flag
'off' (default) | 'on'
Cross-validation flag, specified as the comma-separated pair consisting of 'Crossval' and 'on' or 'off'. If you specify 'on', then the software implements 10-fold cross-validation. To override this cross-validation setting, use one of these name-value pair arguments: CVPartition, Holdout, KFold, or Leaveout. To create a cross-validated model, you can use only one cross-validation name-value pair argument at a time. Alternatively, cross-validate later by passing Mdl to crossval.
Example: 'CrossVal','on'

CVPartition — Cross-validated model partition
cvpartition object
Cross-validated model partition, specified as the comma-separated pair consisting of 'CVPartition' and an object created using cvpartition. To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Delta — Linear coefficient threshold
0 (default) | nonnegative scalar value


Linear coefficient threshold, specified as the comma-separated pair consisting of 'Delta' and a nonnegative scalar value. If a coefficient of Mdl has magnitude smaller than Delta, Mdl sets this coefficient to 0, and you can eliminate the corresponding predictor from the model. Set Delta to a higher value to eliminate more predictors. Delta must be 0 for quadratic discriminant models.
Data Types: single | double

DiscrimType — Discriminant type
'linear' (default) | 'quadratic' | 'diaglinear' | 'diagquadratic' | 'pseudolinear' | 'pseudoquadratic'
Discriminant type, specified as the comma-separated pair consisting of 'DiscrimType' and one of the following.
• 'linear' — Regularized linear discriminant analysis (LDA). All classes have the same covariance matrix

    Σ_γ = (1 − γ)Σ + γ diag(Σ),

where Σ is the empirical, pooled covariance matrix and γ is the amount of regularization.
• 'diaglinear' — LDA. All classes have the same, diagonal covariance matrix.
• 'pseudolinear' — LDA. All classes have the same covariance matrix. The software inverts the covariance matrix using the pseudo inverse.
• 'quadratic' — Quadratic discriminant analysis (QDA). The covariance matrices can vary among classes.
• 'diagquadratic' — QDA. The covariance matrices are diagonal and can vary among classes.
• 'pseudoquadratic' — QDA. The covariance matrices can vary among classes. The software inverts the covariance matrix using the pseudo inverse.

Note To use regularization, you must specify 'linear'. To specify the amount of regularization, use the Gamma name-value pair argument.
Example: 'DiscrimType','quadratic'

FillCoeffs — Coeffs property flag
'on' | 'off'


Coeffs property flag, specified as the comma-separated pair consisting of 'FillCoeffs' and 'on' or 'off'. Setting the flag to 'on' populates the Coeffs property in the classifier object. This can be computationally intensive, especially when cross-validating. The default is 'on', unless you specify a cross-validation name-value pair, in which case the flag is set to 'off' by default.
Example: 'FillCoeffs','off'

Gamma — Amount of regularization
scalar value in the interval [0,1]
Amount of regularization to apply when estimating the covariance matrix of the predictors, specified as the comma-separated pair consisting of 'Gamma' and a scalar value in the interval [0,1]. Gamma provides finer control over the covariance matrix structure than DiscrimType.
• If you specify 0, then the software does not use regularization to adjust the covariance matrix. That is, the software estimates and uses the unrestricted, empirical covariance matrix.
• For linear discriminant analysis, if the empirical covariance matrix is singular, then the software automatically applies the minimal regularization required to invert the covariance matrix. You can display the chosen regularization amount by entering Mdl.Gamma at the command line.
• For quadratic discriminant analysis, if at least one class has an empirical covariance matrix that is singular, then the software throws an error.
• If you specify a value in the interval (0,1), then you must implement linear discriminant analysis; otherwise, the software throws an error. Consequently, the software sets DiscrimType to 'linear'.
• If you specify 1, then the software uses maximum regularization for covariance matrix estimation. That is, the software restricts the covariance matrix to be diagonal. Alternatively, you can set DiscrimType to 'diagLinear' or 'diagQuadratic' for diagonal covariance matrices.
Example: 'Gamma',1
Data Types: single | double

Holdout — Fraction of data for holdout validation
scalar value in the range (0,1)
Fraction of the data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range (0,1). If you specify 'Holdout',p, then the software completes these steps:
1 Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.
2 Store the compact, trained model in the Trained property of the cross-validated model.
To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.
Example: 'Holdout',0.1
Data Types: double | single

KFold — Number of folds
10 (default) | positive integer value greater than 1


Number of folds to use in a cross-validated classifier, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1. If you specify, for example, 'KFold',k, then the software:
1 Randomly partitions the data into k sets.
2 For each set, reserves the set as validation data, and trains the model using the other k – 1 sets.
3 Stores the k compact, trained models in the cells of a k-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can use one of these four options only: CVPartition, Holdout, KFold, or Leaveout.
Example: 'KFold',8
Data Types: single | double

Leaveout — Leave-one-out cross-validation flag
'off' (default) | 'on'
Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of 'Leaveout' and 'on' or 'off'. If you specify 'Leaveout','on', then, for each of the n observations, where n is size(Mdl.X,1), the software:
1 Reserves the observation as validation data, and trains the model using the other n – 1 observations.
2 Stores the n compact, trained models in the cells of an n-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can use one of these four options only: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.
Example: 'Leaveout','on'

PredictorNames — Predictor variable names
{'X1','X2',...} (default) | cell array of character vectors
Predictor variable names, specified as the comma-separated pair consisting of 'PredictorNames' and a cell array of character vectors containing the names for the predictor variables, in the order in which they appear in X.
Data Types: cell

Prior — Prior probabilities
'empirical' (default) | 'uniform' | vector of scalar values | structure
Prior probabilities for each class, specified as the comma-separated pair consisting of 'Prior' and one of the following.
• 'empirical' determines class probabilities from class frequencies in y. If you pass observation weights, they are used to compute the class probabilities.
• 'uniform' sets all class probabilities equal.
• A vector containing one scalar value for each class.
• A structure S with two fields:


• S.ClassNames containing the class names as a variable of the same type as y
• S.ClassProbs containing a vector of corresponding probabilities
Example: 'Prior','uniform'
Data Types: single | double | char | struct

ResponseName — Response variable name
'Y' (default) | character vector
Response variable name, specified as the comma-separated pair consisting of 'ResponseName' and a character vector containing the name of the response variable y.
Example: 'ResponseName','Response'
Data Types: char

SaveMemory — Flag to save covariance matrix
'off' (default) | 'on'
Flag to save covariance matrix, specified as the comma-separated pair consisting of 'SaveMemory' and either 'on' or 'off'. If you specify 'on', then fitcdiscr does not store the full covariance matrix, but instead stores enough information to compute the matrix. The predict method computes the full covariance matrix for prediction, and does not store the matrix. If you specify 'off', then fitcdiscr computes and stores the full covariance matrix in Mdl. Specify SaveMemory as 'on' when the input matrix contains thousands of predictors.
Example: 'SaveMemory','on'

ScoreTransform — Score transformation
'none' (default) | 'doublelogit' | 'invlogit' | 'ismax' | 'logit' | function handle | ...
Score transformation, specified as the comma-separated pair consisting of 'ScoreTransform' and a character vector or function handle. The available character vectors are as follows.

• 'doublelogit' — 1/(1 + e^(–2x))
• 'invlogit' — log(x / (1 – x))
• 'ismax' — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
• 'logit' — 1/(1 + e^(–x))
• 'none' or 'identity' — x (no transformation)
• 'sign' — –1 for x < 0, 0 for x = 0, 1 for x > 0
• 'symmetric' — 2x – 1
• 'symmetricismax' — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
• 'symmetriclogit' — 2/(1 + e^(–x)) – 1
For a MATLAB function or a function you define, use its function handle for the score transform. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).
Example: 'ScoreTransform','logit'
Data Types: char | function_handle

Weights — Observation weights
ones(size(X,1),1) (default) | vector of scalar values
Observation weights, specified as the comma-separated pair consisting of 'Weights' and a vector of scalar values. The length of Weights is the number of rows in X. fitcdiscr normalizes the weights to sum to 1.
Data Types: single | double

Output Arguments

obj — Discriminant analysis classifier
classifier object
Discriminant analysis classifier, returned as a classifier object. Note that using the 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition' options results in a model of class ClassificationPartitionedModel. You cannot use a partitioned model for prediction, so this kind of model does not have a predict method. Otherwise, obj is of class ClassificationDiscriminant, and you can use the predict method to predict the response of new data.

Examples

Construct a Discriminant Analysis Classifier

Load the sample data.

load fisheriris

Construct a discriminant analysis classifier using the sample data.

obj = ClassificationDiscriminant.fit(meas,species)

obj =
  ClassificationDiscriminant
           ResponseName: 'Y'
  CategoricalPredictors: []
             ClassNames: {'setosa'  'versicolor'  'virginica'}
         ScoreTransform: 'none'
        NumObservations: 150
            DiscrimType: 'linear'
                     Mu: [3x4 double]
                 Coeffs: [3x3 struct]

  Properties, Methods

More About

Discriminant Classification
The model for discriminant analysis is:
• Each class (Y) generates data (X) using a multivariate normal distribution. That is, the model assumes X has a Gaussian mixture distribution (gmdistribution).
• For linear discriminant analysis, the model has the same covariance matrix for each class; only the means vary.
• For quadratic discriminant analysis, both means and covariances of each class vary.
predict classifies so as to minimize the expected classification cost:

    ŷ = argmin_{y = 1,...,K} Σ_{k=1}^{K} P̂(k | x) C(y | k),

where
• ŷ is the predicted classification.
• K is the number of classes.
• P̂(k | x) is the posterior probability on page 20-6 of class k for observation x.
• C(y | k) is the cost on page 20-7 of classifying an observation as y when its true class is k.
For details, see “Prediction Using Discriminant Analysis Models” on page 20-6.
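As an illustration of this minimization rule, with hypothetical posterior probabilities and the default 0-1 cost (not output from an actual fitted model):

```matlab
% Hypothetical posterior probabilities for K = 3 classes.
Phat = [0.1 0.6 0.3];

% Default misclassification cost: Cost(k,y) = 1 if k ~= y, 0 otherwise.
Cost = ones(3) - eye(3);

% Expected cost of predicting each class y is the sum over true
% classes k of Phat(k)*Cost(k,y), which is the row vector Phat*Cost.
expCost = Phat * Cost;

% The prediction minimizes the expected cost; with 0-1 cost this is
% simply the most probable class.
[~, yhat] = min(expCost);    % yhat is 2 here
```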

Alternatives
The classify function also performs discriminant analysis. classify is usually more awkward to use:
• classify requires you to fit the classifier every time you make a new prediction.
• classify does not perform cross-validation.
• classify requires you to fit the classifier when changing prior probabilities.

See Also
ClassificationDiscriminant | fitcdiscr

Topics
“Discriminant Analysis Classification” on page 20-2


ClassificationKNN.fit

(Not Recommended) Fit k-nearest neighbor classifier

Note ClassificationKNN.fit is not recommended. Use fitcknn instead.

Syntax
mdl = ClassificationKNN.fit(X,y)
mdl = ClassificationKNN.fit(X,y,Name,Value)

Description
mdl = ClassificationKNN.fit(X,y) returns a classification model based on the input variables (also known as predictors, features, or attributes) X and output (response) y.
mdl = ClassificationKNN.fit(X,y,Name,Value) fits a model with additional options specified by one or more Name,Value pair arguments. If you use one of these options, mdl is of class ClassificationPartitionedModel: 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'. Otherwise, mdl is of class ClassificationKNN.
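A minimal sketch with the fisheriris sample data (the query point here is illustrative):

```matlab
load fisheriris

% Fit a 5-nearest-neighbor classifier to the iris measurements.
mdl = ClassificationKNN.fit(meas,species,'NumNeighbors',5);

% Classify a new observation; here, the column means of the data
% serve as a hypothetical query point.
label = predict(mdl, mean(meas));
```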

Input Arguments

X — Predictor data
numeric matrix
Predictor data, specified as a numeric matrix. Each column of X represents one variable, and each row represents one observation. The length of Y and the number of observations in X must be equal. To specify the names of the predictors in the order of their appearance in X, use the PredictorNames name-value pair argument.
Data Types: single | double

Y — Classification values
numeric vector | categorical vector | logical vector | character array | cell array of character vectors
Classification values, specified as a numeric vector, categorical vector, logical vector, character array, or cell array of character vectors, with the same number of rows as X. Each row of Y represents the classification of the corresponding row of X.
Data Types: single | double | cell | logical | char



Name-Value Pair Arguments
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

BreakTies — Tie-breaking algorithm
'smallest' (default) | 'nearest' | 'random'
Tie-breaking algorithm used by the predict method if multiple classes have the same smallest cost, specified as the comma-separated pair consisting of 'BreakTies' and one of the following:
• 'smallest' — Use the smallest index among tied groups.
• 'nearest' — Use the class with the nearest neighbor among tied groups.
• 'random' — Use a random tiebreaker among tied groups.
By default, ties occur when multiple classes have the same number of nearest points among the K nearest neighbors.
Example: 'BreakTies','nearest'

BucketSize — Maximum data points in node
50 (default) | positive integer value
Maximum number of data points in the leaf node of the kd-tree, specified as the comma-separated pair consisting of 'BucketSize' and a positive integer value. This argument is meaningful only when NSMethod is 'kdtree'.
Example: 'BucketSize',40
Data Types: single | double

CategoricalPredictors — Categorical predictor flag
[] | 'all'
Categorical predictor flag, specified as the comma-separated pair consisting of 'CategoricalPredictors' and one of the following:
• 'all' — All predictors are categorical.
• [] — No predictors are categorical.
The predictor data for ClassificationKNN.fit must be either all continuous or all categorical.
• If the predictor data is in a table (Tbl), ClassificationKNN.fit assumes that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. If Tbl includes both continuous and categorical values, then you must specify the value of 'CategoricalPredictors' so that ClassificationKNN.fit can determine how to treat all predictors, as either continuous or categorical variables.
• If the predictor data is a matrix (X), ClassificationKNN.fit assumes that all predictors are continuous. To identify all predictors in X as categorical, specify 'CategoricalPredictors' as 'all'.
When you set CategoricalPredictors to 'all', the default Distance is 'hamming'.
Example: 'CategoricalPredictors','all'


ClassNames — Class names
numeric vector | categorical vector | logical vector | character array | cell array of character vectors
Class names, specified as the comma-separated pair consisting of 'ClassNames' and an array representing the class names. Use the same data type as the values that exist in y. Use ClassNames to order the classes or to select a subset of classes for training. The default is the class names in y.
Data Types: single | double | char | logical | cell

Cost — Cost of misclassification
square matrix | structure
Cost of misclassification of a point, specified as the comma-separated pair consisting of 'Cost' and one of the following:
• Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (that is, the rows correspond to the true class and the columns correspond to the predicted class). To specify the class order for the corresponding rows and columns of Cost, additionally specify the ClassNames name-value pair argument.
• Structure S having two fields: S.ClassNames containing the group names as a variable of the same type as Y, and S.ClassificationCosts containing the cost matrix.
The default is Cost(i,j)=1 if i~=j, and Cost(i,j)=0 if i=j.
Data Types: single | double | struct

Cov — Covariance matrix
nancov(X) (default) | positive definite matrix of scalar values
Covariance matrix, specified as the comma-separated pair consisting of 'Cov' and a positive definite matrix of scalar values representing the covariance matrix when computing the Mahalanobis distance. This argument is only valid when 'Distance' is 'mahalanobis'. You cannot simultaneously specify 'Standardize' and either of 'Scale' or 'Cov'.
Data Types: single | double

CrossVal — Cross-validation flag
'off' (default) | 'on'
Cross-validation flag, specified as the comma-separated pair consisting of 'CrossVal' and either 'on' or 'off'. If 'on', fitcknn creates a cross-validated model with 10 folds. Use the 'KFold', 'Holdout', 'Leaveout', or 'CVPartition' parameters to override this cross-validation setting. You can only use one cross-validation name-value pair argument at a time to create a cross-validated model. Alternatively, cross-validate mdl later using the crossval method.
Example: 'Crossval','on'

CVPartition — Cross-validated model partition
cvpartition object
Cross-validated model partition, specified as the comma-separated pair consisting of 'CVPartition' and an object created using cvpartition. You can only use one of these four options at a time to create a cross-validated model: 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'.

Distance — Distance metric
'cityblock' | 'chebychev' | 'correlation' | 'cosine' | 'euclidean' | 'hamming' | function handle | ...
Distance metric, specified as the comma-separated pair consisting of 'Distance' and a valid distance metric name or function handle. The allowable distance metric names depend on your choice of a neighbor-searcher method (see NSMethod):
• exhaustive — Any distance metric of ExhaustiveSearcher
• kdtree — 'cityblock', 'chebychev', 'euclidean', or 'minkowski'
The valid distance metrics of ExhaustiveSearcher are as follows.

• 'cityblock' — City block distance.
• 'chebychev' — Chebychev distance (maximum coordinate difference).
• 'correlation' — One minus the sample linear correlation between observations (treated as sequences of values).
• 'cosine' — One minus the cosine of the included angle between observations (treated as vectors).
• 'euclidean' — Euclidean distance.
• 'hamming' — Hamming distance, percentage of coordinates that differ.
• 'jaccard' — One minus the Jaccard coefficient, the percentage of nonzero coordinates that differ.
• 'mahalanobis' — Mahalanobis distance, computed using a positive definite covariance matrix C. The default value of C is the sample covariance matrix of X, as computed by nancov(X). To specify a different value for C, use the 'Cov' name-value pair argument.
• 'minkowski' — Minkowski distance. The default exponent is 2. To specify a different exponent, use the 'Exponent' name-value pair argument.
• 'seuclidean' — Standardized Euclidean distance. Each coordinate difference between X and a query point is scaled, meaning divided by a scale value S. The default value of S is the standard deviation computed from X, S = nanstd(X). To specify another value for S, use the Scale name-value pair argument.
• 'spearman' — One minus the sample Spearman's rank correlation between observations (treated as sequences of values).
• @distfun — Distance function handle. distfun has the form

function D2 = distfun(ZI,ZJ)
% calculation of distance
...

where
• ZI is a 1-by-N vector containing one row of X or Y.
• ZJ is an M2-by-N matrix containing multiple rows of X or Y.
• D2 is an M2-by-1 vector of distances, and D2(k) is the distance between observations ZI and ZJ(k,:).
If you specify CategoricalPredictors as 'all', then the default distance metric is 'hamming'. Otherwise, the default distance metric is 'euclidean'. For definitions, see “Distance Metrics” on page 18-11.
Example: 'Distance','minkowski'
Data Types: char | function_handle

DistanceWeight — Distance weighting function
'equal' (default) | 'inverse' | 'squaredinverse' | function handle
Distance weighting function, specified as the comma-separated pair consisting of 'DistanceWeight' and either a function handle or one of the following values.

Description

'equal'

No weighting

'inverse'

Weight is 1/distance

'squaredinverse'

Weight is 1/distance2

@fcn

fcn is a function that accepts a matrix of nonnegative distances, and returns a matrix the same size containing nonnegative distance weights. For example, 'squaredinverse' is equivalent to @(d)d.^(-2).

Example: 'DistanceWeight','inverse' Data Types: char | function_handle Exponent — Minkowski distance exponent 2 (default) | positive scalar value Minkowski distance exponent, specified as the comma-separated pair consisting of 'Exponent' and a positive scalar value. This argument is only valid when 'Distance' is 'minkowski'. Example: 'Exponent',3 Data Types: single | double Holdout — Fraction of data for holdout validation 0 (default) | scalar value in the range [0,1] 32-1313


Fraction of data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range [0,1]. Holdout validation tests the specified fraction of the data, and uses the remaining data for training. If you use Holdout, you cannot use any of the 'CVPartition', 'KFold', or 'Leaveout' name-value pair arguments.
Example: 'Holdout',0.1
Data Types: single | double

IncludeTies — Tie inclusion flag
false (default) | true
Tie inclusion flag, specified as the comma-separated pair consisting of 'IncludeTies' and a logical value indicating whether predict includes all the neighbors whose distance values are equal to the Kth smallest distance. If IncludeTies is true, predict includes all these neighbors. Otherwise, predict uses exactly K neighbors.
Example: 'IncludeTies',true
Data Types: logical

KFold — Number of folds
10 (default) | positive integer value
Number of folds to use in a cross-validated model, specified as the comma-separated pair consisting of 'KFold' and a positive integer value. If you use 'KFold', you cannot use any of the 'CVPartition', 'Holdout', or 'Leaveout' name-value pair arguments.
Example: 'KFold',8
Data Types: single | double

Leaveout — Leave-one-out cross-validation flag
'off' (default) | 'on'
Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of 'Leaveout' and either 'on' or 'off'. Specify 'on' to use leave-one-out cross-validation. If you use 'Leaveout', you cannot use any of the 'CVPartition', 'Holdout', or 'KFold' name-value pair arguments.
Example: 'Leaveout','on'

NSMethod — Nearest neighbor search method
'kdtree' | 'exhaustive'
Nearest neighbor search method, specified as the comma-separated pair consisting of 'NSMethod' and 'kdtree' or 'exhaustive'.
• 'kdtree' — Creates and uses a kd-tree to find nearest neighbors. 'kdtree' is valid when the distance metric is one of the following:
• 'euclidean'

ClassificationKNN.fit

• 'cityblock' • 'minkowski' • 'chebychev' • 'exhaustive' — Uses the exhaustive search algorithm. When predicting the class of a new point xnew, the software computes the distance values from all points in X to xnew to find nearest neighbors. The default is 'kdtree' when X has 10 or fewer columns, X is not sparse, and the distance metric is a 'kdtree' type; otherwise, 'exhaustive'. Example: 'NSMethod','exhaustive' NumNeighbors — Number of nearest neighbors to find 1 (default) | positive integer value Number of nearest neighbors in X to find for classifying each point when predicting, specified as the comma-separated pair consisting of 'NumNeighbors' and a positive integer value. Example: 'NumNeighbors',3 Data Types: single | double PredictorNames — Predictor variable names {'x1','x2',...} (default) | cell array of character vectors Predictor variable names, specified as the comma-separated pair consisting of 'PredictorNames' and a cell array of character vectors containing the names for the predictor variables, in the order in which they appear in X. If you specify the predictors as a table (tbl), then PredictorNames must be a subset of the variable names in tbl. In this case, the software uses only the variables in PredictorNames to train the model. If you use formula to specify the model, then you cannot use the PredictorNames namevalue pair. Example: 'PredictorNames',{'PedalWidth','PedalLength'} Data Types: cell Prior — Prior probabilities 'empirical' (default) | 'uniform' | vector of scalar values | structure Prior probabilities for each class, specified as the comma-separated pair consisting of 'Prior' and a value in this table. Value

Description

'empirical'

The class prior probabilities are the class relative frequencies in Y.

'uniform'

All class prior probabilities are equal to 1/K, where K is the number of classes.

32-1315

32

Functions

Value

Description

numeric vector

Each element is a class prior probability. Order the elements according to Mdl.ClassNames or specify the order using the ClassNames namevalue pair argument. The software normalizes the elements such that they sum to 1.

structure

A structure S with two fields: • S.ClassNames contains the class names as a variable of the same type as Y. • S.ClassProbs contains a vector of corresponding prior probabilities. The software normalizes the elements such that they sum to 1.

If you set values for both Weights and Prior, the weights are renormalized to add up to the value of the prior probability in the respective class.
Example: 'Prior','uniform'
Data Types: char | single | double | struct

ResponseName — Response variable name
'Y' (default) | character vector
Response variable name, specified as the comma-separated pair consisting of 'ResponseName' and a character vector containing the name of the response variable Y. This name-value pair is not valid when using the ResponseVarName or formula input arguments.
Example: 'ResponseName','Response'
Data Types: char

Scale — Distance scale
nanstd(X) (default) | vector of nonnegative scalar values
Distance scale, specified as the comma-separated pair consisting of 'Scale' and a vector containing nonnegative scalar values with length equal to the number of columns in X. Each coordinate difference between X and a query point is scaled by the corresponding element of Scale. This argument is only valid when 'Distance' is 'seuclidean'. You cannot simultaneously specify 'Standardize' and either of 'Scale' or 'Cov'.
Data Types: single | double

Weights — Observation weights
ones(size(X,1),1) (default) | vector of scalar values
Observation weights, specified as the comma-separated pair consisting of 'Weights' and a vector of scalar values. The software weighs the observations in each row of X or tbl with the corresponding value in Weights. The size of Weights must equal the number of rows of X or tbl. If you specify the input data as a table tbl, then Weights can be the name of a variable in tbl that contains a numeric vector. In this case, you must specify Weights as a character vector containing the variable name. For example, if the weights vector w is stored as tbl.w, then specify it as 'w'. Otherwise, the software treats all columns of tbl, including w, as predictors when training the model. The software normalizes the weights in each class to add up to the value of the prior probability of the class.
Data Types: single | double
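The Weights/Prior interaction described above (weights in each class renormalized to sum to that class's prior probability) can be sketched as follows. This is an illustrative Python sketch, not toolbox code; `normalize_weights` is a hypothetical helper.

```python
import numpy as np

def normalize_weights(weights, labels, priors):
    """Renormalize observation weights so that, within each class,
    they sum to that class's prior probability (the behavior the
    Weights and Prior description above specifies)."""
    weights = np.asarray(weights, dtype=float)
    out = np.empty_like(weights)
    for cls, prior in priors.items():
        mask = np.asarray([lab == cls for lab in labels])
        total = weights[mask].sum()
        out[mask] = weights[mask] * (prior / total)
    return out

w = normalize_weights([1, 1, 2], ["a", "a", "b"], {"a": 0.5, "b": 0.5})
print(w)  # each class's weights now sum to its prior: [0.25 0.25 0.5 ]
```

Because the priors themselves sum to 1, the renormalized weights across all observations also sum to 1.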

Output Arguments
mdl — k-nearest neighbor classifier model
ClassificationKNN object
k-nearest neighbor classifier model, returned as a ClassificationKNN object.

Examples

Train k-Nearest Neighbor Classifier
Train a k-nearest neighbor classifier for Fisher's iris data, where k, the number of nearest neighbors in the predictors, is 5.
Load Fisher's iris data.
load fisheriris
X = meas;
Y = species;
X is a numeric matrix that contains four petal measurements for 150 irises. Y is a cell array of character vectors that contains the corresponding iris species.
Train a 5-nearest neighbor classifier. Standardize the noncategorical predictor data.
Mdl = fitcknn(X,Y,'NumNeighbors',5,'Standardize',1)
Mdl = 
  ClassificationKNN
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'setosa'  'versicolor'  'virginica'}
           ScoreTransform: 'none'
          NumObservations: 150
                 Distance: 'euclidean'
             NumNeighbors: 5

  Properties, Methods

Mdl is a trained ClassificationKNN classifier, and some of its properties appear in the Command Window. To access the properties of Mdl, use dot notation.
Mdl.ClassNames
ans = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }
Mdl.Prior
ans = 1×3
    0.3333    0.3333    0.3333

Mdl.Prior contains the class prior probabilities, which you can specify using the 'Prior' name-value pair argument in fitcknn. The order of the class prior probabilities corresponds to the order of the classes in Mdl.ClassNames. By default, the prior probabilities are the respective relative frequencies of the classes in the data. You can also reset the prior probabilities after training. For example, set the prior probabilities to 0.5, 0.2, and 0.3, respectively.
Mdl.Prior = [0.5 0.2 0.3];

You can pass Mdl to predict to label new measurements or crossval to cross-validate the classifier.

Train a k-Nearest Neighbor Classifier Using the Minkowski Metric
Load Fisher's iris data set.
load fisheriris
X = meas;
Y = species;

X is a numeric matrix that contains four petal measurements for 150 irises. Y is a cell array of character vectors that contains the corresponding iris species.
Train a 3-nearest neighbors classifier using the Minkowski metric. To use the Minkowski metric, you must use an exhaustive searcher. It is good practice to standardize noncategorical predictor data.
Mdl = fitcknn(X,Y,'NumNeighbors',3,...
    'NSMethod','exhaustive','Distance','minkowski',...
    'Standardize',1);

Mdl is a ClassificationKNN classifier. You can examine the properties of Mdl by double-clicking Mdl in the Workspace window. This opens the Variable Editor.

More About

Prediction
ClassificationKNN predicts the classification of a point xnew using a procedure equivalent to this:
1  Find the NumNeighbors points in the training set X that are nearest to xnew.
2  Find the NumNeighbors response values Y to those nearest points.
3  Assign the classification label ynew that has the largest posterior probability among the values in Y.
For details, see “Posterior Probability” on page 32-4144 in the predict documentation.
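The three-step procedure above can be sketched in a few lines. This is an illustrative Python sketch, not the toolbox implementation: it assumes Euclidean distance and uses a simple majority vote in place of the full posterior computation (ties broken by first-seen class).

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, num_neighbors=1):
    """Classify x_new by the majority label among its nearest neighbors."""
    # 1. Find the num_neighbors training points nearest to x_new.
    dists = np.linalg.norm(np.asarray(X_train) - np.asarray(x_new), axis=1)
    nearest = np.argsort(dists)[:num_neighbors]
    # 2. Collect the response values of those nearest points.
    neighbor_labels = [y_train[i] for i in nearest]
    # 3. Assign the label with the largest empirical posterior (vote count).
    return Counter(neighbor_labels).most_common(1)[0][0]

X = [[0, 0], [0, 1], [5, 5], [6, 5]]
y = ["low", "low", "high", "high"]
print(knn_predict(X, y, [0.2, 0.4], num_neighbors=3))  # low
```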

See Also
ClassificationKNN | fitcknn | predict

Topics
“Construct KNN Classifier” on page 18-27
“Modify KNN Classifier” on page 18-28
“Classification Using Nearest Neighbors” on page 18-11

ClassificationTree.fit
Class: ClassificationTree
(Not Recommended) Fit classification tree
Note ClassificationTree.fit is not recommended. Use fitctree instead.

Syntax
tree = ClassificationTree.fit(x,y)
tree = ClassificationTree.fit(x,y,Name,Value)

Description
tree = ClassificationTree.fit(x,y) returns a classification tree based on the input variables (also known as predictors, features, or attributes) x and output (response) y. tree is a binary tree, where each branching node is split based on the values of a column of x.
tree = ClassificationTree.fit(x,y,Name,Value) fits a tree with additional options specified by one or more Name,Value pair arguments. You can specify several name-value pair arguments in any order as Name1,Value1,…,NameN,ValueN.
Note that using the 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition' options results in a tree of class ClassificationPartitionedModel. You cannot use a partitioned tree for prediction, so this kind of tree does not have a predict method. Otherwise, tree is of class ClassificationTree, and you can use the predict method to make predictions.

Input Arguments
x — Predictor values
matrix of floating point values
Predictor values, specified as a matrix of floating point values. ClassificationTree.fit considers NaN values in x as missing values. ClassificationTree.fit does not use observations with all missing values for x in the fit. ClassificationTree.fit uses observations with some missing values for x to find splits on variables for which these observations have valid values.
Data Types: single | double

y — Classification values
numeric vector | categorical vector | logical vector | character array | cell array of character vectors
Classification values, specified as a numeric vector, categorical vector, logical vector, character array, or cell array of character vectors.


Each row of y represents the classification of the corresponding row of x. For numeric y, consider using fitrtree instead. ClassificationTree.fit considers NaN values in y to be missing values. ClassificationTree.fit does not use observations with missing values for y in the fit.
Data Types: single | double | char | logical | cell

Name-Value Pair Arguments
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

AlgorithmForCategorical — Algorithm for best split on categorical predictor
'Exact' | 'PullLeft' | 'PCA' | 'OVAbyClass'
Algorithm to find the best split on a categorical predictor with L levels for data with K ≥ 3 classes, specified as the comma-separated pair consisting of 'AlgorithmForCategorical' and one of the following.
• 'Exact' — Consider all 2^(L–1) – 1 combinations
• 'PullLeft' — Pull left by purity
• 'PCA' — Principal component-based partition
• 'OVAbyClass' — One versus all by class
ClassificationTree.fit selects the optimal subset of algorithms for each split using the known number of classes and levels of a categorical predictor. For K = 2 classes, ClassificationTree.fit always performs the exact search.
Example: 'AlgorithmForCategorical','PCA'

CategoricalPredictors — Categorical predictors list
numeric or logical vector | cell array of character vectors | character matrix | 'all'
Categorical predictors list, specified as the comma-separated pair consisting of 'CategoricalPredictors' and one of the following.
• A numeric vector with indices from 1 to p, where p is the number of columns of x.
• A logical vector of length p, where a true entry means that the corresponding column of x is a categorical variable.
• A cell array of character vectors, where each element in the array is the name of a predictor variable. The names must match entries in PredictorNames values.
• A character matrix, where each row of the matrix is a name of a predictor variable. The names must match entries in PredictorNames values. Pad the names with extra blanks so each row of the character matrix has the same length.
• 'all', meaning all predictors are categorical.
Example: 'CategoricalPredictors','all'
Data Types: single | double | char

ClassNames — Class names
numeric vector | categorical vector | logical vector | character array | cell array of character vectors


Class names, specified as the comma-separated pair consisting of 'ClassNames' and an array representing the class names. Use the same data type as the values that exist in y. Use ClassNames to order the classes or to select a subset of classes for training. The default is the class names that exist in y.
Data Types: single | double | char | logical | cell

Cost — Cost of misclassification
square matrix | structure
Cost of misclassifying a point, specified as the comma-separated pair consisting of 'Cost' and one of the following.
• Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i.
• Structure S having two fields: S.ClassNames containing the group names as a variable of the same type as y, and S.ClassificationCosts containing the cost matrix.
The default is Cost(i,j)=1 if i~=j, and Cost(i,j)=0 if i=j.
Data Types: single | double | struct

CrossVal — Flag to grow cross-validated tree
'off' (default) | 'on'
Flag to grow a cross-validated decision tree, specified as the comma-separated pair consisting of 'CrossVal' and either 'on' or 'off'. If 'on', ClassificationTree.fit grows a cross-validated decision tree with 10 folds. You can override this cross-validation setting using one of the 'KFold', 'Holdout', 'Leaveout', or 'CVPartition' name-value pair arguments. Note that you can only use one of these four options ('KFold', 'Holdout', 'Leaveout', or 'CVPartition') at a time when creating a cross-validated tree. Alternatively, cross-validate tree later using the crossval method.
Example: 'CrossVal','on'

CVPartition — Partition for cross-validated tree
cvpartition object
Partition to use in a cross-validated tree, specified as the comma-separated pair consisting of 'CVPartition' and an object of the cvpartition class created using cvpartition. Note that if you use 'CVPartition', you cannot use any of the 'KFold', 'Holdout', or 'Leaveout' name-value pair arguments.

Holdout — Fraction of data for holdout validation
0 (default) | scalar value in the range [0,1]
Fraction of data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range [0,1]. Holdout validation tests the specified fraction of the data, and uses the rest of the data for training. Note that if you use 'Holdout', you cannot use any of the 'CVPartition', 'KFold', or 'Leaveout' name-value pair arguments.


Example: 'Holdout',0.1
Data Types: single | double

KFold — Number of folds
10 (default) | positive integer value
Number of folds to use in a cross-validated tree, specified as the comma-separated pair consisting of 'KFold' and a positive integer value. Note that if you use 'KFold', you cannot use any of the 'CVPartition', 'Holdout', or 'Leaveout' name-value pair arguments.
Example: 'KFold',8
Data Types: single | double

Leaveout — Leave-one-out cross validation flag
'off' (default) | 'on'
Leave-one-out cross validation flag, specified as the comma-separated pair consisting of 'Leaveout' and either 'on' or 'off'. Use leave-one-out cross validation by setting to 'on'. Note that if you use 'Leaveout', you cannot use any of the 'CVPartition', 'Holdout', or 'KFold' name-value pair arguments.
Example: 'Leaveout','on'

MaxCat — Maximum category levels
10 (default) | nonnegative scalar value
Maximum category levels, specified as the comma-separated pair consisting of 'MaxCat' and a nonnegative scalar value. ClassificationTree.fit splits a categorical predictor using the exact search algorithm if the predictor has at most MaxCat levels in the split node. Otherwise, ClassificationTree.fit finds the best categorical split using one of the inexact algorithms. Note that passing a small value can lead to loss of accuracy and passing a large value can lead to long computation time and memory overload.
Example: 'MaxCat',8

MergeLeaves — Leaf merge flag
'on' (default) | 'off'
Leaf merge flag, specified as the comma-separated pair consisting of 'MergeLeaves' and either 'on' or 'off'. When 'on', ClassificationTree.fit merges leaves that originate from the same parent node, and that give a sum of risk values greater or equal to the risk associated with the parent node. When 'off', ClassificationTree.fit does not merge leaves.
Example: 'MergeLeaves','off'

MinLeaf — Minimum number of leaf node observations
1 (default) | positive integer value
Minimum number of leaf node observations, specified as the comma-separated pair consisting of 'MinLeaf' and a positive integer value. Each leaf has at least MinLeaf observations per tree leaf. If you supply both MinParent and MinLeaf, ClassificationTree.fit uses the setting that gives larger leaves: MinParent=max(MinParent,2*MinLeaf).
Example: 'MinLeaf',3
Data Types: single | double

MinParent — Minimum number of branch node observations
10 (default) | positive integer value
Minimum number of branch node observations, specified as the comma-separated pair consisting of 'MinParent' and a positive integer value. Each branch node in the tree has at least MinParent observations. If you supply both MinParent and MinLeaf, ClassificationTree.fit uses the setting that gives larger leaves: MinParent=max(MinParent,2*MinLeaf).
Example: 'MinParent',8
Data Types: single | double

NVarToSample — Number of predictors for split
'all' | positive integer value
Number of predictors to select at random for each split, specified as the comma-separated pair consisting of 'NVarToSample' and a positive integer value. You can also specify 'all' to use all available predictors.
Example: 'NVarToSample',3
Data Types: single | double

PredictorNames — Predictor variable names
{'x1','x2',...} (default) | cell array of character vectors
Predictor variable names, specified as the comma-separated pair consisting of 'PredictorNames' and a cell array of character vectors containing the names for the predictor variables, in the order in which they appear in x.

Prior — Prior probabilities
'empirical' (default) | 'uniform' | vector of scalar values | structure
Prior probabilities for each class, specified as the comma-separated pair consisting of 'Prior' and one of the following.
• 'empirical' determines class probabilities from class frequencies in y. If you pass observation weights, they are used to compute the class probabilities.
• 'uniform' sets all class probabilities equal.
• A vector (one scalar value for each class).
• A structure S with two fields:
  • S.ClassNames containing the class names as a variable of the same type as y
  • S.ClassProbs containing a vector of corresponding probabilities
If you set values for both Weights and Prior, the weights are renormalized to add up to the value of the prior probability in the respective class.
Example: 'Prior','uniform'

Prune — Pruning flag
'on' (default) | 'off'


Pruning flag, specified as the comma-separated pair consisting of 'Prune' and either 'on' or 'off'. When 'on', ClassificationTree.fit grows the classification tree, and computes the optimal sequence of pruned subtrees. When 'off', ClassificationTree.fit grows the classification tree without pruning.
Example: 'Prune','off'

PruneCriterion — Pruning criterion
'error' (default) | 'impurity'
Pruning criterion, specified as the comma-separated pair consisting of 'PruneCriterion' and either 'error' or 'impurity'.
Example: 'PruneCriterion','impurity'

ResponseName — Response variable name
'Y' (default) | character vector
Response variable name, specified as the comma-separated pair consisting of 'ResponseName' and a character vector representing the name of the response variable y.
Example: 'ResponseName','Response'

ScoreTransform — Score transform function
'none' | 'symmetric' | 'invlogit' | 'ismax' | function handle | ...
Score transform function, specified as the comma-separated pair consisting of 'ScoreTransform' and a function handle for transforming scores. Your function should accept a matrix (the original scores) and return a matrix of the same size (the transformed scores). Alternatively, you can specify one of the following values representing a built-in transformation function.
• 'doublelogit' — 1/(1 + e^(–2x))
• 'invlogit' — log(x / (1 – x))
• 'ismax' — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
• 'logit' — 1/(1 + e^(–x))
• 'none' or 'identity' — x (no transformation)
• 'sign' — –1 for x < 0; 0 for x = 0; 1 for x > 0
• 'symmetric' — 2x – 1
• 'symmetricismax' — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
• 'symmetriclogit' — 2/(1 + e^(–x)) – 1

Example: 'ScoreTransform','logit'

SplitCriterion — Split criterion
'gdi' (default) | 'twoing' | 'deviance'
Split criterion, specified as the comma-separated pair consisting of 'SplitCriterion' and 'gdi' (Gini's diversity index), 'twoing' for the twoing rule, or 'deviance' for maximum deviance reduction (also known as cross entropy).
Example: 'SplitCriterion','deviance'

Surrogate — Surrogate decision splits flag
'off' | 'on' | 'all' | positive integer value
Surrogate decision splits flag, specified as the comma-separated pair consisting of 'Surrogate' and 'on', 'off', 'all', or a positive integer value.
• When set to 'on', ClassificationTree.fit finds at most 10 surrogate splits at each branch node.
• When set to a positive integer value, ClassificationTree.fit finds at most the specified number of surrogate splits at each branch node.
• When set to 'all', ClassificationTree.fit finds all surrogate splits at each branch node. The 'all' setting can use much time and memory.
Use surrogate splits to improve the accuracy of predictions for data with missing values. The setting also enables you to compute measures of predictive association between predictors.
Example: 'Surrogate','on'

Weights — Observation weights
ones(size(x,1),1) (default) | vector of scalar values
Vector of observation weights, specified as the comma-separated pair consisting of 'Weights' and a vector of scalar values. The length of Weights is equal to the number of rows in x. ClassificationTree.fit normalizes the weights in each class to add up to the value of the prior probability of the class.
Data Types: single | double
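A few of the built-in ScoreTransform values listed above are simple formulas, sketched here in Python for illustration (these are hand-written equivalents, not toolbox code).

```python
import math

def logit(x):            # 'logit': 1/(1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))

def doublelogit(x):      # 'doublelogit': 1/(1 + e^(-2x))
    return 1.0 / (1.0 + math.exp(-2.0 * x))

def symmetric(x):        # 'symmetric': 2x - 1
    return 2.0 * x - 1.0

def symmetricismax(scores):
    # 'symmetricismax': +1 for the class with the largest score, -1 otherwise
    top = max(range(len(scores)), key=lambda i: scores[i])
    return [1.0 if i == top else -1.0 for i in range(len(scores))]

print(symmetricismax([0.2, 0.7, 0.1]))  # [-1.0, 1.0, -1.0]
```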

Output Arguments
tree — Classification tree
classification tree object
Classification tree object, returned as a classification tree object. Note that using the 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition' options results in a tree of class ClassificationPartitionedModel. You cannot use a partitioned tree for prediction, so this kind of tree does not have a predict method. Instead, use kfoldPredict to predict responses for observations not used for training. Otherwise, tree is of class ClassificationTree, and you can use the predict method to make predictions.

Examples

Construct a Classification Tree
Construct a classification tree for the data in ionosphere.mat.
load ionosphere
tc = ClassificationTree.fit(X,Y)
tc = 
  ClassificationTree
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b'  'g'}
           ScoreTransform: 'none'
          NumObservations: 351

  Properties, Methods

More About

Impurity and Node Error
ClassificationTree splits nodes based on either impurity or node error. Impurity means one of several things, depending on your choice of the SplitCriterion name-value pair argument:
• Gini's Diversity Index (gdi) — The Gini index of a node is
  1 − ∑i p²(i),
  where the sum is over the classes i at the node, and p(i) is the observed fraction of classes with class i that reach the node. A node with just one class (a pure node) has Gini index 0; otherwise the Gini index is positive. So the Gini index is a measure of node impurity.
• Deviance ('deviance') — With p(i) defined the same as for the Gini index, the deviance of a node is
  −∑i p(i) log₂ p(i).
  A pure node has deviance 0; otherwise, the deviance is positive.
• Twoing rule ('twoing') — Twoing is not a purity measure of a node, but is a different measure for deciding how to split a node. Let L(i) denote the fraction of members of class i in the left child node after a split, and R(i) denote the fraction of members of class i in the right child node after a split. Choose the split criterion to maximize
  P(L)P(R)[∑i |L(i) − R(i)|]²,
  where P(L) and P(R) are the fractions of observations that split to the left and right respectively. If the expression is large, the split made each child node purer. Similarly, if the expression is small, the split made each child node similar to each other, and therefore similar to the parent node. The split did not increase node purity.
• Node error — The node error is the fraction of misclassified classes at a node. If j is the class with the largest number of training samples at a node, the node error is
  1 − p(j).
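The impurity and node-error formulas above can be computed directly from the class labels at a node. A minimal Python sketch (for illustration only, not toolbox code):

```python
import math
from collections import Counter

def class_fractions(labels):
    """Observed fraction p(i) of each class among the labels at a node."""
    n = len(labels)
    return [c / n for c in Counter(labels).values()]

def gini(labels):
    """Gini's diversity index: 1 - sum_i p(i)^2."""
    return 1.0 - sum(p * p for p in class_fractions(labels))

def deviance(labels):
    """Deviance: -sum_i p(i) * log2 p(i)."""
    return -sum(p * math.log2(p) for p in class_fractions(labels))

def node_error(labels):
    """Node error: 1 - p(j), where j is the most frequent class."""
    return 1.0 - max(class_fractions(labels))

pure = ["a"] * 4
mixed = ["a", "a", "b", "b"]
print(gini(pure), gini(mixed))  # a pure node has Gini index 0; mixed gives 0.5
print(node_error(mixed))        # 0.5
```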

References
[1] Coppersmith, D., S. J. Hong, and J. R. M. Hosking. “Partitioning Nominal Attributes in Decision Trees.” Data Mining and Knowledge Discovery, Vol. 3, 1999, pp. 197–217.
[2] Breiman, L., J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Boca Raton, FL: CRC Press, 1984.

See Also
ClassificationTree | fitctree | kfoldPredict | predict


GeneralizedLinearModel.fit
(Not Recommended) Create generalized linear regression model
Note GeneralizedLinearModel.fit is not recommended. Use fitglm instead.

Syntax
mdl = GeneralizedLinearModel.fit(tbl)
mdl = GeneralizedLinearModel.fit(X,y)
mdl = GeneralizedLinearModel.fit(...,modelspec)
mdl = GeneralizedLinearModel.fit(...,Name,Value)
mdl = GeneralizedLinearModel.fit(...,modelspec,Name,Value)

Description
mdl = GeneralizedLinearModel.fit(tbl) creates a generalized linear model of a table or dataset array tbl.
mdl = GeneralizedLinearModel.fit(X,y) creates a generalized linear model of the responses y to a data matrix X.
mdl = GeneralizedLinearModel.fit(...,modelspec) creates a generalized linear model as specified by modelspec.
mdl = GeneralizedLinearModel.fit(...,Name,Value) or mdl = GeneralizedLinearModel.fit(...,modelspec,Name,Value) creates a generalized linear model with additional options specified by one or more Name,Value pair arguments.

Input Arguments
tbl — Input data
table | dataset array
Input data including predictor and response variables, specified as a table or dataset array. The predictor variables and response variable can be numeric, logical, categorical, character, or string. The response variable can have a data type other than numeric only if 'Distribution' is 'binomial'.
• By default, GeneralizedLinearModel.fit takes the last variable as the response variable and the others as the predictor variables.
• To set a different column as the response variable, use the ResponseVar name-value pair argument.
• To use a subset of the columns as predictors, use the PredictorVars name-value pair argument.
• To define a model specification, set the modelspec argument using a formula or terms matrix. The formula or terms matrix specifies which columns to use as the predictor or response variables.


The variable names in a table do not have to be valid MATLAB identifiers. However, if the names are not valid, you cannot use a formula when you fit or adjust a model; for example:
• You cannot specify modelspec using a formula.
• You cannot use a formula to specify the terms to add or remove when you use the addTerms function or the removeTerms function, respectively.
• You cannot use a formula to specify the lower and upper bounds of the model when you use the step or stepwiseglm function with the name-value pair arguments 'Lower' and 'Upper', respectively.
You can verify the variable names in tbl by using the isvarname function. The following code returns logical 1 (true) for each variable that has a valid variable name.
cellfun(@isvarname,tbl.Properties.VariableNames)

If the variable names in tbl are not valid, then convert them by using the matlab.lang.makeValidName function. tbl.Properties.VariableNames = matlab.lang.makeValidName(tbl.Properties.VariableNames);

X — Predictor variables
matrix
Predictor variables, specified as an n-by-p matrix, where n is the number of observations and p is the number of predictor variables. Each column of X represents one variable, and each row represents one observation. By default, there is a constant term in the model, unless you explicitly remove it, so do not include a column of 1s in X.
Data Types: single | double

y — Response variable
vector | matrix
Response variable, specified as a vector or matrix.
• If 'Distribution' is not 'binomial', then y must be an n-by-1 vector, where n is the number of observations. Each entry in y is the response for the corresponding row of X. The data type must be single or double.
• If 'Distribution' is 'binomial', then y can be an n-by-1 vector or n-by-2 matrix with counts in column 1 and BinomialSize in column 2.
Data Types: single | double | logical | categorical

modelspec — Model specification
'linear' (default) | character vector or string scalar naming the model | t-by-(p + 1) terms matrix | character vector or string scalar formula in the form 'Y ~ terms'
Model specification, specified as one of the following:
• Character vector or string scalar specifying the type of model.
  • 'constant' — Model contains only a constant (intercept) term.
  • 'linear' — Model contains an intercept and linear term for each predictor.
  • 'interactions' — Model contains an intercept, linear term for each predictor, and all products of pairs of distinct predictors (no squared terms).
  • 'purequadratic' — Model contains an intercept term and linear and squared terms for each predictor.
  • 'quadratic' — Model contains an intercept term, linear and squared terms for each predictor, and all products of pairs of distinct predictors.
  • 'polyijk' — Model is a polynomial with all terms up to degree i in the first predictor, degree j in the second predictor, and so on. Specify the maximum degree for each predictor by using numerals 0 through 9. The model contains interaction terms, but the degree of each interaction term does not exceed the maximum value of the specified degrees. For example, 'poly13' has an intercept and x1, x2, x2^2, x2^3, x1*x2, and x1*x2^2 terms, where x1 and x2 are the first and second predictors, respectively.
• t-by-(p+1) matrix, namely a terms matrix on page 32-1338, specifying terms to include in the model, where t is the number of terms and p is the number of predictor variables, and plus one is for the response variable.
• Character vector or string scalar representing a formula on page 32-1338 in the form 'Y ~ terms',

where the terms are in “Wilkinson Notation” on page 32-1338.
Example: 'quadratic'

Name-Value Pair Arguments
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

BinomialSize — Number of trials for binomial distribution
1 (default) | numeric scalar | numeric vector | character vector | string scalar
Number of trials for binomial distribution, that is, the sample size, specified as the comma-separated pair consisting of 'BinomialSize' and the variable name in tbl, a numeric scalar, or a numeric vector of the same length as the response. This is the parameter n for the fitted binomial distribution. BinomialSize applies only when the Distribution parameter is 'binomial'. If BinomialSize is a scalar value, all observations have the same number of trials. As an alternative to BinomialSize, you can specify the response as a two-column matrix with counts in column 1 and BinomialSize in column 2.
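The two-column binomial response layout described above (success counts in column 1, BinomialSize trial counts in column 2) can be sketched as follows. This is a Python illustration of the data layout only, not MATLAB code.

```python
import numpy as np

# Success counts and trial counts (BinomialSize) per observation.
successes = np.array([3, 5, 0])
trials = np.array([10, 10, 8])

# Two-column response: counts in column 1, BinomialSize in column 2.
y2col = np.column_stack([successes, trials])

# Observed success proportions, which a binomial fit models.
proportions = y2col[:, 0] / y2col[:, 1]
print(y2col.shape)     # (3, 2)
print(proportions)     # [0.3 0.5 0. ]
```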

GeneralizedLinearModel.fit

Data Types: single | double | char | string

CategoricalVars — Categorical variable list
string array | cell array of character vectors | logical or numeric index vector

Categorical variable list, specified as the comma-separated pair consisting of 'CategoricalVars' and either a string array or cell array of character vectors containing categorical variable names in the table or dataset array tbl, or a logical or numeric index vector indicating which columns are categorical.

• If data is in a table or dataset array tbl, then, by default, GeneralizedLinearModel.fit treats all categorical values, logical values, character arrays, string arrays, and cell arrays of character vectors as categorical variables.
• If data is in matrix X, then the default value of 'CategoricalVars' is an empty matrix []. That is, no variable is categorical unless you specify it as categorical.

For example, you can specify the variables 2 and 3 out of 6 as categorical using either of the following examples.

Example: 'CategoricalVars',[2,3]
Example: 'CategoricalVars',logical([0 1 1 0 0 0])
Data Types: single | double | logical | string | cell

DispersionFlag — Indicator to compute dispersion parameter
false for 'binomial' and 'poisson' distributions (default) | true

Indicator to compute a dispersion parameter for 'binomial' and 'poisson' distributions, specified as the comma-separated pair consisting of 'DispersionFlag' and one of the following.

true — Estimate a dispersion parameter when computing standard errors. The estimated dispersion parameter value is the sum of squared Pearson residuals divided by the degrees of freedom for error (DFE).
false — (Default) Use the theoretical value when computing standard errors.

The fitting function always estimates the dispersion for other distributions.

Example: 'DispersionFlag',true

Distribution — Distribution of the response variable
'normal' (default) | 'binomial' | 'poisson' | 'gamma' | 'inverse gaussian'

Distribution of the response variable, specified as the comma-separated pair consisting of 'Distribution' and one of the following.

'normal' — Normal distribution
'binomial' — Binomial distribution
'poisson' — Poisson distribution
'gamma' — Gamma distribution
'inverse gaussian' — Inverse Gaussian distribution

Example: 'Distribution','gamma'
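For 'DispersionFlag',true, the dispersion estimate described above is the sum of squared Pearson residuals divided by the DFE, which is easy to state outside MATLAB. A pure-Python sketch for the Poisson case (variance function V(μ) = μ); the function name is illustrative:

```python
def pearson_dispersion(y, mu, n_params):
    """Estimate dispersion as sum of squared Pearson residuals / DFE.

    Assumes the Poisson variance function V(mu) = mu; other
    distributions substitute their own variance function.
    """
    dfe = len(y) - n_params  # error degrees of freedom
    pearson_sq = [(yi - mi) ** 2 / mi for yi, mi in zip(y, mu)]
    return sum(pearson_sq) / dfe

# A perfect fit gives 0; values well above 1 suggest overdispersion.
print(pearson_dispersion([2, 4, 6, 8], [2.0, 4.0, 6.0, 8.0], 2))  # → 0.0
```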

32

Functions

Exclude — Observations to exclude
logical or numeric index vector

Observations to exclude from the fit, specified as the comma-separated pair consisting of 'Exclude' and a logical or numeric index vector indicating which observations to exclude from the fit.

For example, you can exclude observations 2 and 3 out of 6 using either of the following examples.

Example: 'Exclude',[2,3]
Example: 'Exclude',logical([0 1 1 0 0 0])
Data Types: single | double | logical

Intercept — Indicator for constant term
true (default) | false

Indicator for the constant term (intercept) in the fit, specified as the comma-separated pair consisting of 'Intercept' and either true to include or false to remove the constant term from the model. Use 'Intercept' only when specifying the model using a character vector or string scalar, not a formula or matrix.

Example: 'Intercept',false

Link — Link function
canonical link function (default) | scalar value | structure

Link function to use in place of the canonical link function, specified as the comma-separated pair consisting of 'Link' and one of the following.

Link Function Name    Link Function              Mean (Inverse) Function
'identity'            f(μ) = μ                   μ = Xb
'log'                 f(μ) = log(μ)              μ = exp(Xb)
'logit'               f(μ) = log(μ/(1 – μ))      μ = exp(Xb) / (1 + exp(Xb))
'probit'              f(μ) = Φ^(–1)(μ)           μ = Φ(Xb)
'comploglog'          f(μ) = log(–log(1 – μ))    μ = 1 – exp(–exp(Xb))
'reciprocal'          f(μ) = 1/μ                 μ = 1/(Xb)
p (a number)          f(μ) = μ^p                 μ = Xb^(1/p)


S (a structure)       f(μ) = S.Link(μ)           μ = S.Inverse(Xb)

S is a structure with three fields. Each field holds a function handle that accepts a vector of inputs and returns a vector of the same size:

• S.Link — The link function
• S.Inverse — The inverse link function
• S.Derivative — The derivative of the link function

The link function defines the relationship f(μ) = X*b between the mean response μ and the linear combination of predictors X*b.

For more information on the canonical link functions, see "Canonical Link Function" on page 32-1339.

Example: 'Link','probit'
Data Types: char | string | single | double | struct

Offset — Offset variable
[] (default) | numeric vector | character vector | string scalar

Offset variable in the fit, specified as the comma-separated pair consisting of 'Offset' and the variable name in tbl or a numeric vector with the same length as the response. GeneralizedLinearModel.fit uses Offset as an additional predictor with a coefficient value fixed at 1. In other words, the formula for fitting is f(μ) = Offset + X*b, where f is the link function, μ is the mean response, and X*b is the linear combination of predictors.

For example, consider a Poisson regression model. Suppose the number of counts is known for theoretical reasons to be proportional to a predictor A. By using the log link function and by specifying log(A) as an offset, you can force the model to satisfy this theoretical constraint.

Data Types: single | double | char | string

PredictorVars — Predictor variables
string array | cell array of character vectors | logical or numeric index vector

Predictor variables to use in the fit, specified as the comma-separated pair consisting of 'PredictorVars' and either a string array or cell array of character vectors of the variable names in the table or dataset array tbl, or a logical or numeric index vector indicating which columns are predictor variables. The string values or character vectors should be among the names in tbl, or the names you specify using the 'VarNames' name-value pair argument. The default is all variables in X, or all variables in tbl except for ResponseVar.
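Each link/inverse pair in the table above satisfies finv(f(μ)) = μ on the link's domain. A small Python check (illustrative only, not part of the toolbox):

```python
import math

# Each entry pairs a link f with its mean (inverse) function finv,
# mirroring the table rows: f(mu) = Xb  <=>  mu = finv(Xb).
links = {
    "log":        (lambda mu: math.log(mu),
                   lambda e: math.exp(e)),
    "logit":      (lambda mu: math.log(mu / (1 - mu)),
                   lambda e: math.exp(e) / (1 + math.exp(e))),
    "reciprocal": (lambda mu: 1 / mu,
                   lambda e: 1 / e),
    "comploglog": (lambda mu: math.log(-math.log(1 - mu)),
                   lambda e: 1 - math.exp(-math.exp(e))),
}

# Round-trip check at an interior point of each link's domain.
for name, (f, finv) in links.items():
    mu = 0.3
    assert abs(finv(f(mu)) - mu) < 1e-12, name
```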

32

Functions

For example, you can specify the second and third variables as the predictor variables using either of the following examples.

Example: 'PredictorVars',[2,3]
Example: 'PredictorVars',logical([0 1 1 0 0 0])
Data Types: single | double | logical | string | cell

ResponseVar — Response variable
last column in tbl (default) | character vector or string scalar containing variable name | logical or numeric index vector

Response variable to use in the fit, specified as the comma-separated pair consisting of 'ResponseVar' and either a character vector or string scalar containing the variable name in the table or dataset array tbl, or a logical or numeric index vector indicating which column is the response variable. You typically need to use 'ResponseVar' when fitting a table or dataset array tbl.

For example, you can specify the fourth variable, say yield, as the response out of six variables, in one of the following ways.

Example: 'ResponseVar','yield'
Example: 'ResponseVar',[4]
Example: 'ResponseVar',logical([0 0 0 1 0 0])
Data Types: single | double | logical | char | string

VarNames — Names of variables
{'x1','x2',...,'xn','y'} (default) | string array | cell array of character vectors

Names of variables, specified as the comma-separated pair consisting of 'VarNames' and a string array or cell array of character vectors including the names for the columns of X first, and the name for the response variable y last.

'VarNames' is not applicable to variables in a table or dataset array, because those variables already have names.

The variable names do not have to be valid MATLAB identifiers. However, if the names are not valid, you cannot use a formula when you fit or adjust a model; for example:

• You cannot use a formula to specify the terms to add or remove when you use the addTerms function or the removeTerms function, respectively.
• You cannot use a formula to specify the lower and upper bounds of the model when you use the step or stepwiseglm function with the name-value pair arguments 'Lower' and 'Upper', respectively.

Before specifying 'VarNames',varNames, you can verify the variable names in varNames by using the isvarname function. The following code returns logical 1 (true) for each variable that has a valid variable name.

cellfun(@isvarname,varNames)

If the variable names in varNames are not valid, then convert them by using the matlab.lang.makeValidName function.

varNames = matlab.lang.makeValidName(varNames);

Example: 'VarNames',{'Horsepower','Acceleration','Model_Year','MPG'}
Data Types: string | cell

Weights — Observation weights
ones(n,1) (default) | n-by-1 vector of nonnegative scalar values

Observation weights, specified as the comma-separated pair consisting of 'Weights' and an n-by-1 vector of nonnegative scalar values, where n is the number of observations.

Data Types: single | double

Output Arguments

mdl — Generalized linear regression model
GeneralizedLinearModel object

Generalized linear regression model, returned as a GeneralizedLinearModel object. You can also create such an object by using fitglm or stepwiseglm.

Examples

Create Generalized Linear Regression Model

Fit a logistic regression model of the probability of smoking as a function of age, weight, and sex, using a two-way interaction model.

Load the hospital data set.

load hospital

Convert the dataset array to a table.

tbl = dataset2table(hospital);

Specify the model using a formula that includes two-way interactions and lower-order terms.

modelspec = 'Smoker ~ Age*Weight*Sex - Age:Weight:Sex';

Create the generalized linear model.

mdl = fitglm(tbl,modelspec,'Distribution','binomial')

mdl =
Generalized linear regression model:
    logit(Smoker) ~ 1 + Sex*Age + Sex*Weight + Age*Weight
    Distribution = Binomial

Estimated Coefficients:
                         Estimate          SE          tStat       pValue
    (Intercept)            -6.0492       19.749       -0.3063     0.75938
    Sex_Male               -2.2859       12.424      -0.18399     0.85402
    Age                    0.11691      0.50977       0.22934     0.81861
    Weight                0.031109      0.15208       0.20455     0.83792
    Sex_Male:Age          0.020734      0.20681       0.10025     0.92014
    Sex_Male:Weight        0.01216     0.053168       0.22871      0.8191
    Age:Weight         -0.00071959    0.0038964      -0.18468     0.85348

100 observations, 93 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 5.07, p-value = 0.535

The large p-value indicates that the model might not differ statistically from a constant.

More About

Terms Matrix

A terms matrix T is a t-by-(p + 1) matrix specifying terms in a model, where t is the number of terms, p is the number of predictor variables, and +1 accounts for the response variable. The value of T(i,j) is the exponent of variable j in term i.

For example, suppose that an input includes three predictor variables A, B, and C and the response variable Y in the order A, B, C, and Y. Each row of T represents one term:

• [0 0 0 0] — Constant term or intercept
• [0 1 0 0] — B; equivalently, A^0 * B^1 * C^0
• [1 0 1 0] — A*C
• [2 0 0 0] — A^2
• [0 1 2 0] — B*(C^2)

The 0 at the end of each term represents the response variable. In general, a column vector of zeros in a terms matrix represents the position of the response variable. If you have the predictor and response variables in a matrix and column vector, then you must include 0 for the response variable in the last column of each row.

Formula

A formula for model specification is a character vector or string scalar of the form 'Y ~ terms'.

• Y is the response name.
• terms represents the predictor terms in a model using Wilkinson notation.

For example:

• 'Y ~ A + B + C' specifies a three-variable linear model with intercept.
• 'Y ~ A + B + C – 1' specifies a three-variable linear model without intercept. Note that formulas include a constant (intercept) term by default. To exclude a constant term from the model, you must include –1 in the formula.

Wilkinson Notation

Wilkinson notation describes the terms present in a model. The notation relates to the terms present in a model, not to the multipliers (coefficients) of those terms.
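The terms-matrix convention above can be sketched numerically: the value of each row at a data point is the product of the predictors raised to that row's exponents, with the trailing response column ignored. Illustrative Python, with eval_terms a hypothetical name:

```python
def eval_terms(T, x):
    """Evaluate each row of a terms matrix at predictor values x.

    T: list of rows [e1, ..., ep, 0]; the trailing 0 is the response column.
    x: predictor values [A, B, C, ...].
    Row value = prod(x[j] ** T[i][j]).
    """
    vals = []
    for row in T:
        v = 1.0
        for xj, e in zip(x, row[:-1]):  # drop the response column
            v *= xj ** e
        vals.append(v)
    return vals

# The rows listed in the text: intercept, B, A*C, A^2, B*(C^2),
# evaluated at (A, B, C) = (2, 3, 4).
T = [[0, 0, 0, 0], [0, 1, 0, 0], [1, 0, 1, 0], [2, 0, 0, 0], [0, 1, 2, 0]]
print(eval_terms(T, [2, 3, 4]))  # → [1.0, 3.0, 8.0, 4.0, 48.0]
```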


Wilkinson notation uses these symbols:

• + means include the next variable.
• – means do not include the next variable.
• : defines an interaction, which is a product of terms.
• * defines an interaction and all lower-order terms.
• ^ raises the predictor to a power, exactly as in * repeated, so ^ includes lower-order terms as well.
• () groups terms.

This table shows typical examples of Wilkinson notation.

Wilkinson Notation                     Term in Standard Notation
1                                      Constant (intercept) term
A^k, where k is a positive integer     A, A^2, ..., A^k
A + B                                  A, B
A*B                                    A, B, A*B
A:B                                    A*B only
–B                                     Do not include B
A*B + C                                A, B, C, A*B
A + B + C + A:B                        A, B, C, A*B
A*B*C – A:B:C                          A, B, C, A*B, A*C, B*C
A*(B + C)                              A, B, C, A*B, A*C
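The A*B*C – A:B:C row above follows from simple set algebra: * generates all nonempty products of its variables (lower orders included), and – removes a term. A Python sketch using frozensets of variable names (powers such as A^2 are outside what this representation can express):

```python
from itertools import combinations

def star(*variables):
    """A*B*...: all nonempty products of the variables, lower orders included."""
    terms = set()
    for k in range(1, len(variables) + 1):
        terms.update(frozenset(c) for c in combinations(variables, k))
    return terms

# A*B*C - A:B:C keeps everything except the three-way interaction.
full = star("A", "B", "C")
reduced = full - {frozenset(["A", "B", "C"])}
expected = {frozenset(["A"]), frozenset(["B"]), frozenset(["C"]),
            frozenset(["A", "B"]), frozenset(["A", "C"]), frozenset(["B", "C"])}
assert reduced == expected
```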

Statistics and Machine Learning Toolbox notation always includes a constant term unless you explicitly remove the term using –1. For more details, see "Wilkinson Notation" on page 11-90.

Canonical Link Function

The default link function for a generalized linear model is the canonical link function.

Distribution          Canonical Link Function Name    Link Function             Mean (Inverse) Function
'normal'              'identity'                      f(μ) = μ                  μ = Xb
'binomial'            'logit'                         f(μ) = log(μ/(1 – μ))     μ = exp(Xb) / (1 + exp(Xb))
'poisson'             'log'                           f(μ) = log(μ)             μ = exp(Xb)
'gamma'               -1                              f(μ) = 1/μ                μ = 1/(Xb)
'inverse gaussian'    -2                              f(μ) = 1/μ^2              μ = (Xb)^(–1/2)

Tips

• The generalized linear model mdl is a standard linear model unless you specify otherwise with the Distribution name-value pair.


• For other methods such as devianceTest, or properties of the GeneralizedLinearModel object, see GeneralizedLinearModel.

Alternatives

You can also construct a generalized linear model using fitglm. Use stepwiseglm to select a model specification automatically. Use step, addTerms, or removeTerms to adjust a fitted model.

References

[1] Collett, D. Modeling Binary Data. New York: Chapman & Hall, 2002.

[2] Dobson, A. J. An Introduction to Generalized Linear Models. New York: Chapman & Hall, 1990.

[3] McCullagh, P., and J. A. Nelder. Generalized Linear Models. New York: Chapman & Hall, 1990.

See Also
GeneralizedLinearModel | fitglm | stepwiseglm

Topics
"Generalized Linear Model Workflow" on page 12-28
"Generalized Linear Models" on page 12-9


gmdistribution.fit
(Not Recommended) Gaussian mixture parameter estimates

Note: gmdistribution.fit is not recommended. Use fitgmdist instead.

Syntax

obj = gmdistribution.fit(X,k)
obj = gmdistribution.fit(...,param1,val1,param2,val2,...)

Description

obj = gmdistribution.fit(X,k) uses an Expectation-Maximization (EM) algorithm to construct an object obj of the gmdistribution class containing maximum likelihood estimates of the parameters in a Gaussian mixture model with k components for data in the n-by-m matrix X, where n is the number of observations and m is the dimension of the data.

gmdistribution treats NaN values as missing data. Rows of X with NaN values are excluded from the fit.

obj = gmdistribution.fit(...,param1,val1,param2,val2,...) provides control over the iterative EM algorithm. Parameters and values are listed below.

'Start' — Method used to choose initial component parameters. One of the following:
• 'randSample' — Select k observations from X at random as initial component means. The mixing proportions are uniform. The initial covariance matrices for all components are diagonal, where element j on the diagonal is the variance of X(:,j). This is the default.
• 'plus' — The software selects k observations from X using the k-means++ algorithm. The initial mixing proportions are uniform. The initial covariance matrices for all components are diagonal, where element j on the diagonal is the variance of X(:,j).
• S — A structure array with fields mu, Sigma, and PComponents. See gmdistribution for descriptions of values.
• s — A vector of length n containing an initial guess of the component index for each point.

'Replicates' — A positive integer giving the number of times to repeat the EM algorithm, each time with a new set of parameters. The solution with the largest likelihood is returned. A value larger than 1 requires the 'randSample' start method. The default is 1.

'CovType' — 'diagonal' if the covariance matrices are restricted to be diagonal; 'full' otherwise. The default is 'full'.
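The EM iteration that gmdistribution.fit runs alternates computing responsibilities (E-step) and updating mixture parameters (M-step). As a minimal, hedged sketch rather than the toolbox's implementation, here is a one-dimensional, two-component version in pure Python, with uniform initial mixing proportions and unit initial variances in the spirit of the 'randSample' defaults:

```python
import math

def em_gmm_1d(x, mu, iters=30):
    """Minimal EM for a two-component 1-D Gaussian mixture.

    x: data values; mu: initial component means (mutated in place).
    Mixing proportions start uniform and variances start at 1.
    """
    pi = [0.5, 0.5]
    var = [1.0, 1.0]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for xi in x:
            d = [pi[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(xi - mu[k]) ** 2 / (2 * var[k]))
                 for k in range(2)]
            s = d[0] + d[1]
            resp.append([d[0] / s, d[1] / s])
        # M-step: update proportions, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(x)
            mu[k] = sum(r[k] * xi for r, xi in zip(resp, x)) / nk
            var[k] = sum(r[k] * (xi - mu[k]) ** 2 for r, xi in zip(resp, x)) / nk
    return mu, var, pi

data = [-5.2, -5.1, -5.0, -4.9, -4.8, 4.8, 4.9, 5.0, 5.1, 5.2]
mu, var, pi = em_gmm_1d(data, mu=[-1.0, 1.0])
```

With the well-separated data above, the means converge near -5 and 5 and the mixing proportions near 0.5 each.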



'SharedCov' — Logical true if all the covariance matrices are restricted to be the same (pooled estimate); logical false otherwise.

'Regularize' — A nonnegative regularization number added to the diagonal of covariance matrices to make them positive-definite. The default is 0.

'Options' — Options structure for the iterative EM algorithm, as created by statset. gmdistribution.fit uses the parameters 'Display' with a default value of 'off', 'MaxIter' with a default value of 100, and 'TolFun' with a default value of 1e-6.

In some cases, gmdistribution may converge to a solution where one or more of the components has an ill-conditioned or singular covariance matrix. The following issues may result in an ill-conditioned covariance matrix:

• The number of dimensions of your data is relatively high and there are not enough observations.
• Some of the features (variables) of your data are highly correlated.
• Some or all of the features are discrete.
• You tried to fit the data to too many components.

In general, you can avoid getting ill-conditioned covariance matrices by using one of the following precautions:

• Preprocess your data to remove correlated features.
• Set 'SharedCov' to true to use an equal covariance matrix for every component.
• Set 'CovType' to 'diagonal'.
• Use 'Regularize' to add a very small positive number to the diagonal of every covariance matrix.
• Try another set of initial values.

In other cases, gmdistribution may pass through an intermediate step where one or more of the components has an ill-conditioned covariance matrix. Trying another set of initial values may avoid this issue without altering your data or model.
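The 'Regularize' precaution works because adding a positive number to the diagonal of a positive semidefinite matrix shifts every eigenvalue up, making the matrix strictly positive-definite. A 2-by-2 Python sketch (helper names are illustrative):

```python
def det2(m):
    """Determinant of a 2-by-2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def regularize(m, lam):
    """Add lam to the diagonal, as the 'Regularize' option does."""
    return [[m[0][0] + lam, m[0][1]],
            [m[1][0], m[1][1] + lam]]

sigma = [[1.0, 1.0], [1.0, 1.0]]   # singular: two perfectly correlated features
assert det2(sigma) == 0.0
assert det2(regularize(sigma, 1e-6)) > 0.0
```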

Examples

Generate data from a mixture of two bivariate Gaussian distributions using the mvnrnd function:

MU1 = [1 2];
SIGMA1 = [2 0; 0 .5];
MU2 = [-3 -5];
SIGMA2 = [1 0; 0 1];
X = [mvnrnd(MU1,SIGMA1,1000);mvnrnd(MU2,SIGMA2,1000)];
scatter(X(:,1),X(:,2),10,'.')
hold on


Next, fit a two-component Gaussian mixture model:

options = statset('Display','final');
obj = gmdistribution.fit(X,2,'Options',options);

10 iterations, log-likelihood = -7046.78

h = ezcontour(@(x,y)pdf(obj,[x y]),[-8 6],[-8 6]);


Among the properties of the fit are the parameter estimates:

ComponentMeans = obj.mu

ComponentMeans =
    0.9391    2.0322
   -2.9823   -4.9737

ComponentCovariances = obj.Sigma

ComponentCovariances(:,:,1) =
    1.7786   -0.0528
   -0.0528    0.5312

ComponentCovariances(:,:,2) =
    1.0491   -0.0150
   -0.0150    0.9816

MixtureProportions = obj.PComponents

MixtureProportions =
    0.5000    0.5000

The Akaike information is minimized by the two-component model:

AIC = zeros(1,4);
obj = cell(1,4);
for k = 1:4
    obj{k} = gmdistribution.fit(X,k);
    AIC(k) = obj{k}.AIC;
end
[minAIC,numComponents] = min(AIC);
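The AIC comparison above trades goodness of fit against parameter count: AIC = 2k − 2 log L, and the smallest value wins. A Python sketch with illustrative log-likelihood values (only the two-component value, −7046.78, comes from the example output; the others are invented for the illustration, and the 3k − 1 parameter count assumes a one-dimensional mixture):

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2*logL (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical log-likelihoods for fits with k = 1..4 components; a 1-D
# k-component mixture has 3k - 1 free parameters (means, variances, weights).
loglik = {1: -8000.0, 2: -7046.78, 3: -7046.5, 4: -7046.4}
scores = {k: aic(ll, 3 * k - 1) for k, ll in loglik.items()}
best = min(scores, key=scores.get)
assert best == 2  # extra components barely improve logL, so the penalty dominates
```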


numComponents

numComponents =
     2

model = obj{2}

model =
Gaussian mixture distribution with 2 components in 2 dimensions

Component 1:
Mixing proportion: 0.500000
Mean:    0.9391    2.0322

Component 2:
Mixing proportion: 0.500000
Mean:   -2.9823   -4.9737

Both the Akaike and Bayes information are negative log-likelihoods for the data with penalty terms for the number of estimated parameters. They are often used to determine an appropriate number of components for a model when the number of components is unspecified.

References

[1] McLachlan, G., and D. Peel. Finite Mixture Models. Hoboken, NJ: John Wiley & Sons, Inc., 2000.

See Also
cluster | fitgmdist | gmdistribution


LinearModel.fit
(Not Recommended) Create linear regression model

Note: LinearModel.fit is not recommended. Use fitlm instead.

Syntax

mdl = LinearModel.fit(tbl)
mdl = LinearModel.fit(X,y)
mdl = LinearModel.fit( ___ ,modelspec)
mdl = LinearModel.fit( ___ ,Name,Value)
mdl = LinearModel.fit( ___ ,modelspec,Name,Value)

Description

mdl = LinearModel.fit(tbl) creates a linear model of a table or dataset array tbl.

mdl = LinearModel.fit(X,y) creates a linear model of the responses y to a data matrix X.

mdl = LinearModel.fit( ___ ,modelspec) creates a linear model of the type specified by modelspec, using any of the previous syntaxes.

mdl = LinearModel.fit( ___ ,Name,Value) or mdl = LinearModel.fit( ___ ,modelspec,Name,Value) creates a linear model with additional options specified by one or more Name,Value pair arguments. For example, you can specify which predictor variables to include in the fit or include observation weights.

Input Arguments

tbl — Input data
table | dataset array

Input data including predictor and response variables, specified as a table or dataset array. The predictor variables can be numeric, logical, categorical, character, or string. The response variable must be numeric or logical.

• By default, LinearModel.fit takes the last variable as the response variable and the others as the predictor variables.
• To set a different column as the response variable, use the ResponseVar name-value pair argument.
• To use a subset of the columns as predictors, use the PredictorVars name-value pair argument.
• To define a model specification, set the modelspec argument using a formula or terms matrix. The formula or terms matrix specifies which columns to use as the predictor or response variables.

The variable names in a table do not have to be valid MATLAB identifiers. However, if the names are not valid, you cannot use a formula when you fit or adjust a model; for example:


• You cannot specify modelspec using a formula.
• You cannot use a formula to specify the terms to add or remove when you use the addTerms function or the removeTerms function, respectively.
• You cannot use a formula to specify the lower and upper bounds of the model when you use the step or stepwiselm function with the name-value pair arguments 'Lower' and 'Upper', respectively.

You can verify the variable names in tbl by using the isvarname function. The following code returns logical 1 (true) for each variable that has a valid variable name.

cellfun(@isvarname,tbl.Properties.VariableNames)

If the variable names in tbl are not valid, then convert them by using the matlab.lang.makeValidName function.

tbl.Properties.VariableNames = matlab.lang.makeValidName(tbl.Properties.VariableNames);

X — Predictor variables
matrix

Predictor variables, specified as an n-by-p matrix, where n is the number of observations and p is the number of predictor variables. Each column of X represents one variable, and each row represents one observation.

By default, there is a constant term in the model, unless you explicitly remove it, so do not include a column of 1s in X.

Data Types: single | double

y — Response variable
vector

Response variable, specified as an n-by-1 vector, where n is the number of observations. Each entry in y is the response for the corresponding row of X.

Data Types: single | double | logical

modelspec — Model specification
'linear' (default) | character vector or string scalar naming the model | t-by-(p + 1) terms matrix | character vector or string scalar formula in the form 'Y ~ terms'

Model specification, specified as one of the following.

• A character vector or string scalar naming the model.

'constant' — Model contains only a constant (intercept) term.
'linear' — Model contains an intercept and linear term for each predictor.
'interactions' — Model contains an intercept, linear term for each predictor, and all products of pairs of distinct predictors (no squared terms).
'purequadratic' — Model contains an intercept term and linear and squared terms for each predictor.
'quadratic' — Model contains an intercept term, linear and squared terms for each predictor, and all products of pairs of distinct predictors.
'polyijk' — Model is a polynomial with all terms up to degree i in the first predictor, degree j in the second predictor, and so on. Specify the maximum degree for each predictor by using numerals 0 through 9. The model contains interaction terms, but the degree of each interaction term does not exceed the maximum value of the specified degrees. For example, 'poly13' has an intercept and x1, x2, x2^2, x2^3, x1*x2, and x1*x2^2 terms, where x1 and x2 are the first and second predictors, respectively.

• A t-by-(p + 1) matrix, namely a terms matrix on page 32-1358, specifying terms to include in the model, where t is the number of terms and p is the number of predictor variables, and + 1 is for the response variable.
• A character vector or string scalar representing a formula on page 32-1359 in the form 'Y ~ terms', where the terms are specified using Wilkinson notation on page 32-1359.

Example: 'quadratic'
Example: 'y ~ X1 + X2^2 + X1:X2'

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

CategoricalVars — Categorical variable list
string array | cell array of character vectors | logical or numeric index vector

Categorical variable list, specified as the comma-separated pair consisting of 'CategoricalVars' and either a string array or cell array of character vectors containing categorical variable names in the table or dataset array tbl, or a logical or numeric index vector indicating which columns are categorical.

• If data is in a table or dataset array tbl, then, by default, LinearModel.fit treats all categorical values, logical values, character arrays, string arrays, and cell arrays of character vectors as categorical variables.
• If data is in matrix X, then the default value of 'CategoricalVars' is an empty matrix []. That is, no variable is categorical unless you specify it as categorical.

For example, you can specify the variables 2 and 3 out of 6 as categorical using either of the following examples.

Example: 'CategoricalVars',[2,3]


Example: 'CategoricalVars',logical([0 1 1 0 0 0])
Data Types: single | double | logical | string | cell

Exclude — Observations to exclude
logical or numeric index vector

Observations to exclude from the fit, specified as the comma-separated pair consisting of 'Exclude' and a logical or numeric index vector indicating which observations to exclude from the fit.

For example, you can exclude observations 2 and 3 out of 6 using either of the following examples.

Example: 'Exclude',[2,3]
Example: 'Exclude',logical([0 1 1 0 0 0])
Data Types: single | double | logical

Intercept — Indicator for constant term
true (default) | false

Indicator for the constant term (intercept) in the fit, specified as the comma-separated pair consisting of 'Intercept' and either true to include or false to remove the constant term from the model. Use 'Intercept' only when specifying the model using a character vector or string scalar, not a formula or matrix.

Example: 'Intercept',false

PredictorVars — Predictor variables
string array | cell array of character vectors | logical or numeric index vector

Predictor variables to use in the fit, specified as the comma-separated pair consisting of 'PredictorVars' and either a string array or cell array of character vectors of the variable names in the table or dataset array tbl, or a logical or numeric index vector indicating which columns are predictor variables. The string values or character vectors should be among the names in tbl, or the names you specify using the 'VarNames' name-value pair argument. The default is all variables in X, or all variables in tbl except for ResponseVar.

For example, you can specify the second and third variables as the predictor variables using either of the following examples.

Example: 'PredictorVars',[2,3]
Example: 'PredictorVars',logical([0 1 1 0 0 0])
Data Types: single | double | logical | string | cell

ResponseVar — Response variable
last column in tbl (default) | character vector or string scalar containing variable name | logical or numeric index vector

Response variable to use in the fit, specified as the comma-separated pair consisting of 'ResponseVar' and either a character vector or string scalar containing the variable name in the table or dataset array tbl, or a logical or numeric index vector indicating which column is the


response variable. You typically need to use 'ResponseVar' when fitting a table or dataset array tbl.

For example, you can specify the fourth variable, say yield, as the response out of six variables, in one of the following ways.

Example: 'ResponseVar','yield'
Example: 'ResponseVar',[4]
Example: 'ResponseVar',logical([0 0 0 1 0 0])
Data Types: single | double | logical | char | string

RobustOpts — Indicator of robust fitting type
'off' (default) | 'on' | character vector | string scalar | structure

Indicator of the robust fitting type to use, specified as the comma-separated pair consisting of 'RobustOpts' and one of these values.

• 'off' — No robust fitting. LinearModel.fit uses ordinary least squares.
• 'on' — Robust fitting using the 'bisquare' weight function with the default tuning constant.
• Character vector or string scalar — Name of a robust fitting weight function from the following table. LinearModel.fit uses the corresponding default tuning constant specified in the table.
• Structure with the two fields RobustWgtFun and Tune.
  • The RobustWgtFun field contains the name of a robust fitting weight function from the following table or a function handle of a custom weight function.
  • The Tune field contains a tuning constant. If you do not set the Tune field, LinearModel.fit uses the corresponding default tuning constant.

Weight Function    Description                        Default Tuning Constant
'andrews'          w = (abs(r)<pi) .* sin(r) ./ r     1.339
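The bisquare weight function, the default when 'RobustOpts' is 'on', downweights moderate scaled residuals smoothly and zeroes out residuals beyond the cutoff. A Python sketch, where r is assumed to be the residual already divided by the tuning constant times a robust estimate of scale (for bisquare, MATLAB's default tuning constant is 4.685):

```python
def bisquare(r):
    """Bisquare (Tukey biweight) weight: w = (1 - r^2)^2 for |r| < 1, else 0."""
    return (1 - r * r) ** 2 if abs(r) < 1 else 0.0

assert bisquare(0.0) == 1.0    # small residuals keep full weight
assert bisquare(2.0) == 0.0    # large residuals are rejected outright
assert 0 < bisquare(0.5) < 1   # moderate residuals are smoothly downweighted
```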

'symmetric' — 2x – 1
'symmetricismax' — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
'symmetriclogit' — 2/(1 + e^(–x)) – 1

For a MATLAB function or a function you define, use its function handle for the score transform. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).
Example: 'ScoreTransform','logit'
Data Types: char | string | function_handle

Standardize — Flag to standardize predictors
false (default) | true
Flag to standardize the predictors, specified as the comma-separated pair consisting of 'Standardize' and true (1) or false (0).
If you set 'Standardize',true, then the software centers and scales each column of the predictor data (X) by the column mean and standard deviation, respectively.
The software does not standardize categorical predictors, and throws an error if all predictors are categorical.
You cannot simultaneously specify 'Standardize',1 and either of 'Scale' or 'Cov'.
It is good practice to standardize the predictor data.
Example: 'Standardize',true
Data Types: logical

Weights — Observation weights
numeric vector of positive values | name of variable in Tbl
Observation weights, specified as the comma-separated pair consisting of 'Weights' and a numeric vector of positive values or the name of a variable in Tbl. The software weighs the observations in each row of X or Tbl with the corresponding value in Weights. The size of Weights must equal the number of rows of X or Tbl.
If you specify the input data as a table Tbl, then Weights can be the name of a variable in Tbl that contains a numeric vector. In this case, you must specify Weights as a character vector or string scalar. For example, if the weights vector W is stored as Tbl.W, then specify it as 'W'. Otherwise, the software treats all columns of Tbl, including W, as predictors or the response when training the model.
The software normalizes Weights to sum up to the value of the prior probability in the respective class.
By default, Weights is ones(n,1), where n is the number of observations in X or Tbl.
Data Types: double | single | char | string

Cross Validation Options
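The score-transform and standardization options above can be combined in one call. The following is an illustrative sketch (not from the original page): it uses the shipped fisheriris data set, a made-up uniform weight vector w, and a function handle implementing the symmetric logit transform listed in the score-transform table.

```matlab
% Sketch: custom score transform via function handle, with standardized
% predictors and explicit observation weights (fisheriris ships with MATLAB).
load fisheriris
w = ones(size(meas,1),1);              % uniform weights, for illustration only
symlogit = @(s) 2./(1 + exp(-s)) - 1;  % matrix in, same-size matrix out

Mdl = fitcknn(meas,species, ...
    'Standardize',true, ...            % center/scale each predictor column
    'Weights',w, ...
    'ScoreTransform',symlogit);
```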

CrossVal — Cross-validation flag
'off' (default) | 'on'
Cross-validation flag, specified as the comma-separated pair consisting of 'Crossval' and 'on' or 'off'. If you specify 'on', then the software implements 10-fold cross-validation.
To override this cross-validation setting, use one of these name-value pair arguments: CVPartition, Holdout, KFold, or Leaveout. To create a cross-validated model, you can use only one cross-validation name-value pair argument at a time. Alternatively, cross-validate later by passing Mdl to crossval.
Example: 'CrossVal','on'

CVPartition — Cross-validation partition
[] (default) | cvpartition partition object
Cross-validation partition, specified as the comma-separated pair consisting of 'CVPartition' and a cvpartition partition object created by cvpartition. The partition object specifies the type of cross-validation and the indexing for the training and validation sets.
To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.
Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,'KFold',5). Then, you can specify the cross-validated model by using 'CVPartition',cvp.

Holdout — Fraction of data for holdout validation
scalar value in the range (0,1)
Fraction of the data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range (0,1). If you specify 'Holdout',p, then the software completes these steps:
1  Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.
2  Store the compact, trained model in the Trained property of the cross-validated model.
To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.
Example: 'Holdout',0.1
Data Types: double | single

KFold — Number of folds
10 (default) | positive integer value greater than 1
Number of folds to use in a cross-validated model, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1. If you specify 'KFold',k, then the software completes these steps:
1  Randomly partition the data into k sets.
2  For each set, reserve the set as validation data, and train the model using the other k – 1 sets.
3  Store the k compact, trained models in the cells of a k-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.
Example: 'KFold',5
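The KFold steps above can be sketched as follows; this is an illustrative example (not from the original page) using the shipped fisheriris data set.

```matlab
% Sketch: 5-fold cross-validated KNN model and its out-of-fold error.
load fisheriris
CVMdl = fitcknn(meas,species,'NumNeighbors',5,'KFold',5);
% CVMdl.Trained is a 5-by-1 cell array of compact trained models.
err = kfoldLoss(CVMdl);   % average misclassification rate over the folds
```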

Data Types: single | double

Leaveout — Leave-one-out cross-validation flag
'off' (default) | 'on'
Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of 'Leaveout' and 'on' or 'off'. If you specify 'Leaveout','on', then, for each of the n observations (where n is the number of observations, excluding missing observations, specified in the NumObservations property of the model), the software completes these steps:
1  Reserve the observation as validation data, and train the model using the other n – 1 observations.
2  Store the n compact, trained models in the cells of an n-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.
Example: 'Leaveout','on'

Hyperparameter Optimization Options

OptimizeHyperparameters — Parameters to optimize
'none' (default) | 'auto' | 'all' | string array or cell array of eligible parameter names | vector of optimizableVariable objects
Parameters to optimize, specified as the comma-separated pair consisting of 'OptimizeHyperparameters' and one of the following:
• 'none' — Do not optimize.
• 'auto' — Use {'Distance','NumNeighbors'}.
• 'all' — Optimize all eligible parameters.
• String array or cell array of eligible parameter names.
• Vector of optimizableVariable objects, typically the output of hyperparameters.
The optimization attempts to minimize the cross-validation loss (error) for fitcknn by varying the parameters. For information about cross-validation loss (albeit in a different context), see “Classification Loss” on page 32-2756. To control the cross-validation type and other aspects of the optimization, use the HyperparameterOptimizationOptions name-value pair.

Note 'OptimizeHyperparameters' values override any values you set using other name-value pair arguments. For example, setting 'OptimizeHyperparameters' to 'auto' causes the 'auto' values to apply.

The eligible parameters for fitcknn are:
• Distance — fitcknn searches among 'cityblock', 'chebychev', 'correlation', 'cosine', 'euclidean', 'hamming', 'jaccard', 'mahalanobis', 'minkowski', 'seuclidean', and 'spearman'.
• DistanceWeight — fitcknn searches among 'equal', 'inverse', and 'squaredinverse'.
• Exponent — fitcknn searches among positive real values, by default in the range [0.5,3].
• NumNeighbors — fitcknn searches among positive integer values, by default log-scaled in the range [1, max(2,round(NumObservations/2))].
• Standardize — fitcknn searches among the values 'true' and 'false'.
Set nondefault parameters by passing a vector of optimizableVariable objects that have nondefault values. For example,

load fisheriris
params = hyperparameters('fitcknn',meas,species);
params(1).Range = [1,20];

Pass params as the value of OptimizeHyperparameters.
By default, iterative display appears at the command line, and plots appear according to the number of hyperparameters in the optimization. For the optimization and plots, the objective function is log(1 + cross-validation loss) for regression and the misclassification rate for classification. To control the iterative display, set the Verbose field of the 'HyperparameterOptimizationOptions' name-value pair argument. To control the plots, set the ShowPlots field of the 'HyperparameterOptimizationOptions' name-value pair argument.
For an example, see “Optimize Fitted KNN Classifier” on page 32-1576.
Example: 'auto'

HyperparameterOptimizationOptions — Options for optimization
structure
Options for optimization, specified as the comma-separated pair consisting of 'HyperparameterOptimizationOptions' and a structure. This argument modifies the effect of the OptimizeHyperparameters name-value pair argument. All fields in the structure are optional.

• Optimizer — Default: 'bayesopt'
  • 'bayesopt' — Use Bayesian optimization. Internally, this setting calls bayesopt.
  • 'gridsearch' — Use grid search with NumGridDivisions values per dimension.
  • 'randomsearch' — Search at random among MaxObjectiveEvaluations points.
  'gridsearch' searches in a random order, using uniform sampling without replacement from the grid. After optimization, you can get a table in grid order by using the command sortrows(Mdl.HyperparameterOptimizationResults).
• AcquisitionFunctionName — Default: 'expected-improvement-per-second-plus'
  • 'expected-improvement-per-second-plus'
  • 'expected-improvement'
  • 'expected-improvement-plus'
  • 'expected-improvement-per-second'
  • 'lower-confidence-bound'
  • 'probability-of-improvement'
  Acquisition functions whose names include per-second do not yield reproducible results because the optimization depends on the runtime of the objective function. Acquisition functions whose names include plus modify their behavior when they are overexploiting an area. For more details, see “Acquisition Function Types” on page 10-3.
• MaxObjectiveEvaluations — Maximum number of objective function evaluations. Default: 30 for 'bayesopt' or 'randomsearch', and the entire grid for 'gridsearch'.
• MaxTime — Time limit, specified as a positive real. The time limit is in seconds, as measured by tic and toc. Run time can exceed MaxTime because MaxTime does not interrupt function evaluations. Default: Inf.
• NumGridDivisions — For 'gridsearch', the number of values in each dimension. The value can be a vector of positive integers giving the number of values for each dimension, or a scalar that applies to all dimensions. This field is ignored for categorical variables. Default: 10.
• ShowPlots — Logical value indicating whether to show plots. If true, this field plots the best objective function value against the iteration number. If there are one or two optimization parameters, and if Optimizer is 'bayesopt', then ShowPlots also plots a model of the objective function against the parameters. Default: true.
• SaveIntermediateResults — Logical value indicating whether to save results when Optimizer is 'bayesopt'. If true, this field overwrites a workspace variable named 'BayesoptResults' at each iteration. The variable is a BayesianOptimization object. Default: false.
• Verbose — Display to the command line. Default: 1.
  • 0 — No iterative display
  • 1 — Iterative display
  • 2 — Iterative display with extra information
  For details, see the bayesopt Verbose name-value pair argument.
• UseParallel — Logical value indicating whether to run Bayesian optimization in parallel, which requires Parallel Computing Toolbox. Due to the nonreproducibility of parallel timing, parallel Bayesian optimization does not necessarily yield reproducible results. For details, see “Parallel Bayesian Optimization” on page 10-7. Default: false.
• Repartition — Logical value indicating whether to repartition the cross-validation at every iteration. If false, the optimizer uses a single partition for the optimization. true usually gives the most robust results because this setting takes partitioning noise into account. However, for good results, true requires at least twice as many function evaluations. Default: false.
Use no more than one of the following three field names. If you do not specify any cross-validation field, the default is 'Kfold',5.
• CVPartition — A cvpartition object, as created by cvpartition.
• Holdout — A scalar in the range (0,1) representing the holdout fraction.
• Kfold — An integer greater than 1.

Example: 'HyperparameterOptimizationOptions',struct('MaxObjectiveEvaluations',60)
Data Types: struct
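Several of the structure fields above can be combined in one options struct. The following is an illustrative sketch (not from the original page), using the shipped fisheriris data set; the specific field values are arbitrary choices for demonstration.

```matlab
% Sketch: cap the optimization at 20 evaluations, suppress plots, and
% evaluate the objective with 3-fold cross-validation.
load fisheriris
rng(0)   % for reproducibility
Mdl = fitcknn(meas,species, ...
    'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions', ...
    struct('MaxObjectiveEvaluations',20,'ShowPlots',false,'Kfold',3));
```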

Output Arguments

Mdl — Trained k-nearest neighbor classification model
ClassificationKNN model object | ClassificationPartitionedModel cross-validated model object
Trained k-nearest neighbor classification model, returned as a ClassificationKNN model object or a ClassificationPartitionedModel cross-validated model object.
If you set any of the name-value pair arguments KFold, Holdout, CrossVal, or CVPartition, then Mdl is a ClassificationPartitionedModel cross-validated model object. Otherwise, Mdl is a ClassificationKNN model object.
To reference properties of Mdl, use dot notation. For example, to display the distance metric at the Command Window, enter Mdl.Distance.

More About

Prediction
ClassificationKNN predicts the classification of a point xnew using a procedure equivalent to this:
1  Find the NumNeighbors points in the training set X that are nearest to xnew.
2  Find the NumNeighbors response values Y to those nearest points.
3  Assign the classification label ynew that has the largest posterior probability among the values in Y.
For details, see “Posterior Probability” on page 32-4144 in the predict documentation.
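The three steps above can be sketched directly; this is an illustrative approximation (not from the original page) that uses knnsearch and an unweighted majority vote in place of predict, using the shipped fisheriris data set.

```matlab
% Sketch of the equivalent procedure for one query point xnew.
load fisheriris
k = 5;
xnew = meas(1,:);
idx = knnsearch(meas,xnew,'K',k);      % step 1: k nearest training points
nearestY = species(idx);               % step 2: their response values
ynew = mode(categorical(nearestY));    % step 3: most common label
```

Note that this sketch ignores distance weighting and tie-breaking, which predict handles according to the model's DistanceWeight and BreakTies settings.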

Tips
After training a model, you can generate C/C++ code that predicts labels for new data. Generating C/C++ code requires MATLAB Coder. For details, see “Introduction to Code Generation” on page 31-2.

Algorithms
• NaN and <undefined> values indicate missing observations. The following describes the behavior of fitcknn when the data set or weights contain missing observations.
  • If any value of Y or any weight is missing, then fitcknn removes those values from Y, the weights, and the corresponding rows of X from the data. The software renormalizes the weights to sum to 1.
  • If you specify to standardize predictors ('Standardize',1) or the standardized Euclidean distance ('Distance','seuclidean') without a scale, then fitcknn removes missing observations from individual predictors before computing the mean and standard deviation. In other words, the software implements nanmean and nanstd on each predictor.
  • If you specify the Mahalanobis distance ('Distance','mahalanobis') without its covariance matrix, then fitcknn removes rows of X that contain at least one missing value. In other words, the software implements nancov on the predictor matrix X.
• Suppose that you set 'Standardize',1.
  • If you also specify Prior or Weights, then the software takes the observation weights into account. Specifically, the weighted mean of predictor j is

      x̄_j = Σ_{k ∈ B_j} w_k x_jk

    and the weighted standard deviation is

      s_j = ( Σ_{k ∈ B_j} w_k (x_jk − x̄_j)² )^(1/2),

    where B_j is the set of indices k for which x_jk and w_k are not missing.
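The weighted mean and standard deviation above can be sketched numerically; this is an illustrative example (not from the original page) for a single predictor, with the weights assumed already normalized to sum to 1 as described under Weights.

```matlab
% Numeric sketch of the weighted mean and standard deviation formulas.
x = [2; 4; NaN; 8];                     % one predictor, one missing entry
w = [0.25; 0.25; 0.25; 0.25];           % normalized observation weights
B = ~isnan(x) & ~isnan(w);              % B_j: entries where x and w exist
xbar = sum(w(B).*x(B));                 % weighted mean
s = sqrt(sum(w(B).*(x(B) - xbar).^2));  % weighted standard deviation
```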


• If you also set 'Distance','mahalanobis' or 'Distance','seuclidean', then you cannot specify Scale or Cov. Instead, the software:
  1  Computes the means and standard deviations of each predictor.
  2  Standardizes the data using the results of step 1.
  3  Computes the distance parameter values using their respective defaults.

• If you specify Scale and either of Prior or Weights, then the software scales observed distances by the weighted standard deviations.
• If you specify Cov and either of Prior or Weights, then the software applies the weighted covariance matrix to the distances. In other words,

    Cov = ( Σ_B w_j / ( (Σ_B w_j)² − Σ_B w_j² ) ) Σ_B w_j (x_j − x̄)′(x_j − x̄),

  where B is the set of indices j for which the observation x_j does not have any missing values and w_j is not missing.

Alternatives
Although fitcknn can train a multiclass KNN classifier, you can reduce a multiclass learning problem to a series of KNN binary learners using fitcecoc.
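The fitcecoc reduction can be sketched as follows; this is an illustrative example (not from the original page) using the shipped fisheriris data set and a KNN learner template.

```matlab
% Sketch: reduce the 3-class problem to binary KNN learners via fitcecoc.
load fisheriris
t = templateKNN('NumNeighbors',5,'Standardize',true);
EcocMdl = fitcecoc(meas,species,'Learners',t);  % one-vs-one coding by default
```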

Extended Capabilities

Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.
To run in parallel, set the 'UseParallel' option to true. To perform parallel hyperparameter optimization, use the 'HyperparameterOptimizationOptions',struct('UseParallel',true) name-value pair argument in the call to this function.
For more information on parallel hyperparameter optimization, see “Parallel Bayesian Optimization” on page 10-7. For more general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).

See Also
ClassificationKNN | ClassificationPartitionedModel | fitcecoc | fitcensemble | predict | templateKNN

Topics
“Construct KNN Classifier” on page 18-27
“Modify KNN Classifier” on page 18-28
“Classification Using Nearest Neighbors” on page 18-11

Introduced in R2014a


fitclinear
Fit linear classification model to high-dimensional data

Syntax

Mdl = fitclinear(X,Y)
Mdl = fitclinear(X,Y,Name,Value)
[Mdl,FitInfo] = fitclinear( ___ )
[Mdl,FitInfo,HyperparameterOptimizationResults] = fitclinear( ___ )

Description

fitclinear trains linear classification models for two-class (binary) learning with high-dimensional, full or sparse predictor data. Available linear classification models include regularized support vector machines (SVM) and logistic regression models. fitclinear minimizes the objective function using techniques that reduce computing time (e.g., stochastic gradient descent).

For reduced computation time on a high-dimensional data set that includes many predictor variables, train a linear classification model by using fitclinear. For low- through medium-dimensional predictor data sets, see “Alternatives for Lower-Dimensional Data” on page 32-1625.

To train a linear classification model for multiclass learning by combining SVM or logistic regression binary classifiers using error-correcting output codes, see fitcecoc.

Mdl = fitclinear(X,Y) returns a trained linear classification model object that contains the results of fitting a binary support vector machine to the predictors X and class labels Y.

Mdl = fitclinear(X,Y,Name,Value) returns a trained linear classification model with additional options specified by one or more Name,Value pair arguments. For example, you can specify that the columns of the predictor matrix correspond to observations, implement logistic regression, or specify to cross-validate. It is good practice to cross-validate using the Kfold Name,Value pair argument. The cross-validation results determine how well the model generalizes.

[Mdl,FitInfo] = fitclinear( ___ ) also returns optimization details using any of the previous syntaxes. You cannot request FitInfo for cross-validated models.

[Mdl,FitInfo,HyperparameterOptimizationResults] = fitclinear( ___ ) also returns hyperparameter optimization details when you pass an OptimizeHyperparameters name-value pair.

Examples

Train Linear Classification Model
Train a binary, linear classification model using support vector machines, dual SGD, and ridge regularization.
Load the NLP data set.


load nlpdata

X is a sparse matrix of predictor data, and Y is a categorical vector of class labels. There are more than two classes in the data.
Identify the labels that correspond to the Statistics and Machine Learning Toolbox™ documentation web pages.

Ystats = Y == 'stats';

Train a binary, linear classification model that can identify whether the word counts in a documentation web page are from the Statistics and Machine Learning Toolbox™ documentation. Train the model using the entire data set. Determine how well the optimization algorithm fit the model to the data by extracting a fit summary.

rng(1); % For reproducibility
[Mdl,FitInfo] = fitclinear(X,Ystats)

Mdl =
  ClassificationLinear
      ResponseName: 'Y'
        ClassNames: [0 1]
    ScoreTransform: 'none'
              Beta: [34023x1 double]
              Bias: -1.0059
            Lambda: 3.1674e-05
           Learner: 'svm'

  Properties, Methods

FitInfo = struct with fields:
                    Lambda: 3.1674e-05
                 Objective: 5.3783e-04
                 PassLimit: 10
                 NumPasses: 10
                BatchLimit: []
             NumIterations: 238561
              GradientNorm: NaN
         GradientTolerance: 0
      RelativeChangeInBeta: 0.0562
             BetaTolerance: 1.0000e-04
             DeltaGradient: 1.4582
    DeltaGradientTolerance: 1
           TerminationCode: 0
         TerminationStatus: {'Iteration limit exceeded.'}
                     Alpha: [31572x1 double]
                   History: []
                   FitTime: 0.2391
                    Solver: {'dual'}

Mdl is a ClassificationLinear model. You can pass Mdl and the training or new data to loss to inspect the in-sample classification error. Or, you can pass Mdl and new predictor data to predict to predict class labels for new observations.
FitInfo is a structure array containing, among other things, the termination status (TerminationStatus) and how long the solver took to fit the model to the data (FitTime). It is good practice to use FitInfo to determine whether optimization-termination measurements are satisfactory. Because training time is small, you can try to retrain the model, but increase the number of passes through the data. This can improve measures like DeltaGradient.
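Retraining with more passes can be sketched as follows; this is an illustrative continuation (not from the original page) that assumes X and Ystats from the example above are still in the workspace.

```matlab
% Sketch: allow more passes through the data, then compare convergence
% measures against the first fit.
[Mdl2,FitInfo2] = fitclinear(X,Ystats,'PassLimit',20);
FitInfo2.DeltaGradient   % ideally closer to DeltaGradientTolerance
```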

Find Good Lasso Penalty Using Cross-Validation
To determine a good lasso-penalty strength for a linear classification model that uses a logistic regression learner, implement 5-fold cross-validation.
Load the NLP data set.

load nlpdata

X is a sparse matrix of predictor data, and Y is a categorical vector of class labels. There are more than two classes in the data.
The models should identify whether the word counts in a web page are from the Statistics and Machine Learning Toolbox™ documentation. So, identify the labels that correspond to the Statistics and Machine Learning Toolbox™ documentation web pages.

Ystats = Y == 'stats';

Create a set of 11 logarithmically spaced regularization strengths from 10^(−6) through 10^(−0.5).

Lambda = logspace(-6,-0.5,11);

Cross-validate the models. To increase execution speed, transpose the predictor data and specify that the observations are in columns. Estimate the coefficients using SpaRSA. Lower the tolerance on the gradient of the objective function to 1e-8.

X = X';
rng(10); % For reproducibility
CVMdl = fitclinear(X,Ystats,'ObservationsIn','columns','KFold',5,...
    'Learner','logistic','Solver','sparsa','Regularization','lasso',...
    'Lambda',Lambda,'GradientTolerance',1e-8)

CVMdl =
  classreg.learning.partition.ClassificationPartitionedLinear
    CrossValidatedModel: 'Linear'
           ResponseName: 'Y'
        NumObservations: 31572
                  KFold: 5
              Partition: [1×1 cvpartition]
             ClassNames: [0 1]
         ScoreTransform: 'none'

  Properties, Methods

numCLModels = numel(CVMdl.Trained)

numCLModels = 5


CVMdl is a ClassificationPartitionedLinear model. Because fitclinear implements 5-fold cross-validation, CVMdl contains 5 ClassificationLinear models that the software trains on each fold.
Display the first trained linear classification model.

Mdl1 = CVMdl.Trained{1}

Mdl1 = ClassificationLinear ResponseName: 'Y' ClassNames: [0 1] ScoreTransform: 'logit' Beta: [34023×11 double] Bias: [-13.2904 -13.2904 -13.2904 -13.2904 -9.9357 -7.0782 -5.4335 -4.5473 -3.4223 Lambda: [1.0000e-06 3.5481e-06 1.2589e-05 4.4668e-05 1.5849e-04 5.6234e-04 0.0020 0.0 Learner: 'logistic' Properties, Methods

Mdl1 is a ClassificationLinear model object. fitclinear constructed Mdl1 by training on the first four folds. Because Lambda is a sequence of regularization strengths, you can think of Mdl1 as 11 models, one for each regularization strength in Lambda.
Estimate the cross-validated classification error.

ce = kfoldLoss(CVMdl);

Because there are 11 regularization strengths, ce is a 1-by-11 vector of classification error rates.
Higher values of Lambda lead to predictor variable sparsity, which is a good quality of a classifier. For each regularization strength, train a linear classification model using the entire data set and the same options as when you cross-validated the models. Determine the number of nonzero coefficients per model.

Mdl = fitclinear(X,Ystats,'ObservationsIn','columns',...
    'Learner','logistic','Solver','sparsa','Regularization','lasso',...
    'Lambda',Lambda,'GradientTolerance',1e-8);
numNZCoeff = sum(Mdl.Beta~=0);

In the same figure, plot the cross-validated classification error rates and frequency of nonzero coefficients for each regularization strength. Plot all variables on the log scale.

figure;
[h,hL1,hL2] = plotyy(log10(Lambda),log10(ce),...
    log10(Lambda),log10(numNZCoeff));
hL1.Marker = 'o';
hL2.Marker = 'o';
ylabel(h(1),'log_{10} classification error')
ylabel(h(2),'log_{10} nonzero-coefficient frequency')
xlabel('log_{10} Lambda')
title('Test-Sample Statistics')
hold off


Choose the index of the regularization strength that balances predictor variable sparsity and low classification error. In this case, a value between 10^(−4) and 10^(−1) should suffice.

idxFinal = 7;

Select the model from Mdl with the chosen regularization strength.

MdlFinal = selectModels(Mdl,idxFinal);

MdlFinal is a ClassificationLinear model containing one regularization strength. To estimate labels for new observations, pass MdlFinal and the new data to predict.

Optimize Linear Classifier
This example shows how to minimize the cross-validation error in a linear classifier using fitclinear. The example uses the NLP data set.
Load the NLP data set.

load nlpdata

X is a sparse matrix of predictor data, and Y is a categorical vector of class labels. There are more than two classes in the data.


The models should identify whether the word counts in a web page are from the Statistics and Machine Learning Toolbox™ documentation. Identify the relevant labels.

X = X';
Ystats = Y == 'stats';

Optimize the classification using the 'auto' parameters. For reproducibility, set the random seed and use the 'expected-improvement-plus' acquisition function.

rng default
Mdl = fitclinear(X,Ystats,'ObservationsIn','columns','Solver','sparsa',...
    'OptimizeHyperparameters','auto','HyperparameterOptimizationOptions',...
    struct('AcquisitionFunctionName','expected-improvement-plus'))

[Iterative display: 30 rows of Bayesian optimization output, one per iteration, with columns Iter, Eval result, Objective, Objective runtime, BestSoFar (observed), BestSoFar (estim.), Lambda, and Learner. The objective (cross-validation error) falls from 0.041619 at iteration 1 to a best observed value of 0.00066515 at iteration 15, obtained with Lambda = 7.6994e-08 and the logistic learner.]

__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 173.5287 seconds.
Total objective function evaluation time: 153.4246

Best observed feasible point:
      Lambda      Learner
    __________    ________
    7.6994e-08    logistic

Observed objective function value = 0.00066515
Estimated objective function value = 0.00069859
Function evaluation time = 4.3717

Best estimated feasible point (according to models):
      Lambda      Learner
    __________    ________
    7.4324e-08    logistic

Estimated objective function value = 0.00069837
Estimated function evaluation time = 4.3951

Mdl =
  ClassificationLinear
      ResponseName: 'Y'
        ClassNames: [0 1]
    ScoreTransform: 'logit'
              Beta: [34023×1 double]
              Bias: -10.1723
            Lambda: 7.4324e-08
           Learner: 'logistic'

  Properties, Methods

Input Arguments

X — Predictor data
full matrix | sparse matrix
Predictor data, specified as an n-by-p full or sparse matrix.
The length of Y and the number of observations in X must be equal.
Note If you orient your predictor matrix so that observations correspond to columns and specify 'ObservationsIn','columns', then you might experience a significant reduction in optimization-execution time.
Data Types: single | double

Y — Class labels
categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors
Class labels to which the classification model is trained, specified as a categorical, character, or string array, logical or numeric vector, or cell array of character vectors.
• fitclinear only supports binary classification. Either Y must contain exactly two distinct classes, or you must specify two classes for training using the 'ClassNames' name-value pair argument. For multiclass learning, see fitcecoc.
• If Y is a character array, then each element must correspond to one row of the array.
• The length of Y and the number of observations in X must be equal.


• It is good practice to specify the class order using the ClassNames name-value pair argument.
Data Types: char | string | cell | categorical | logical | single | double

Note fitclinear removes missing observations, that is, observations with any of these characteristics:
• NaN, empty character vector (''), empty string (""), <missing>, and <undefined> elements in the response (Y or ValidationData{2})
• At least one NaN value in a predictor observation (row in X or ValidationData{1})
• NaN value or 0 weight (Weights or ValidationData{3})
For memory-usage economy, it is best practice to remove observations containing missing values from your training data manually before training.

Name-Value Pair Arguments
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Note You cannot use any cross-validation name-value pair argument along with the 'OptimizeHyperparameters' name-value pair argument. You can modify the cross-validation for 'OptimizeHyperparameters' only by using the 'HyperparameterOptimizationOptions' name-value pair argument.

Example: 'ObservationsIn','columns','Learner','logistic','CrossVal','on' specifies that the columns of the predictor matrix correspond to observations, implements logistic regression, and implements 10-fold cross-validation.

Linear Classification Options

Lambda — Regularization term strength
'auto' (default) | nonnegative scalar | vector of nonnegative values
Regularization term strength, specified as the comma-separated pair consisting of 'Lambda' and 'auto', a nonnegative scalar, or a vector of nonnegative values.
• For 'auto', Lambda = 1/n.
  • If you specify a cross-validation name-value pair argument (e.g., CrossVal), then n is the number of in-fold observations.
  • Otherwise, n is the training sample size.
• For a vector of nonnegative values, the software sequentially optimizes the objective function for each distinct value in Lambda in ascending order.
  • If Solver is 'sgd' or 'asgd' and Regularization is 'lasso', then the software does not use the previous coefficient estimates as a warm start on page 32-1625 for the next optimization iteration. Otherwise, the software uses warm starts.


  • If Regularization is 'lasso', then any coefficient estimate of 0 retains its value when the software optimizes using subsequent values in Lambda. The software returns coefficient estimates for all optimization iterations.
Example: 'Lambda',10.^(-(10:-2:2))
Data Types: char | string | double | single

Learner — Linear classification model type
'svm' (default) | 'logistic'
Linear classification model type, specified as the comma-separated pair consisting of 'Learner' and 'svm' or 'logistic'. In this table, f(x) = xβ + b.
• β is a vector of p coefficients.
• x is an observation from p predictor variables.
• b is the scalar bias.

Value        Algorithm               Response Range                     Loss Function
'svm'        Support vector machine  y ∊ {–1,1}; 1 for the positive     Hinge: ℓ[y,f(x)] = max[0, 1 − y·f(x)]
                                     class and –1 otherwise
'logistic'   Logistic regression     Same as 'svm'                      Deviance (logistic): ℓ[y,f(x)] = log{1 + exp[−y·f(x)]}

Example: 'Learner','logistic'

ObservationsIn — Predictor data observation dimension
'rows' (default) | 'columns'

Predictor data observation dimension, specified as the comma-separated pair consisting of 'ObservationsIn' and 'columns' or 'rows'.

Note If you orient your predictor matrix so that observations correspond to columns and specify 'ObservationsIn','columns', then you might experience a significant reduction in optimization execution time.

Regularization — Complexity penalty type
'lasso' | 'ridge'

Complexity penalty type, specified as the comma-separated pair consisting of 'Regularization' and 'lasso' or 'ridge'. The software composes the objective function for minimization from the sum of the average loss function (see Learner) and the regularization term in this table.


fitclinear

• 'lasso': Lasso (L1) penalty: λ ∑_{j=1}^{p} |β_j|
• 'ridge': Ridge (L2) penalty: (λ/2) ∑_{j=1}^{p} β_j^2

To specify the regularization term strength, which is λ in the expressions, use Lambda. The software excludes the bias term (β0) from the regularization penalty.

If Solver is 'sparsa', then the default value of Regularization is 'lasso'. Otherwise, the default is 'ridge'.

Tip
• For predictor variable selection, specify 'lasso'. For more on variable selection, see “Introduction to Feature Selection” on page 15-49.
• For optimization accuracy, specify 'ridge'.
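A minimal sketch of the tip above, choosing a lasso penalty for predictor selection (X and Y are assumed workspace variables, not part of the original text):

```matlab
% Sketch: lasso penalty with the SpaRSA solver; coefficients driven
% to exactly zero indicate predictors that the fit discards.
Mdl = fitclinear(X,Y,'Regularization','lasso','Solver','sparsa');
selected = find(Mdl.Beta ~= 0);   % indices of retained predictors
```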

Example: 'Regularization','lasso'

Solver — Objective function minimization technique
'sgd' | 'asgd' | 'dual' | 'bfgs' | 'lbfgs' | 'sparsa' | string array | cell array of character vectors

Objective function minimization technique, specified as the comma-separated pair consisting of 'Solver' and a character vector or string scalar, a string array, or a cell array of character vectors with values from this table.

• 'sgd': Stochastic gradient descent (SGD) [5][3]
• 'asgd': Average stochastic gradient descent (ASGD) [8]
• 'dual': Dual SGD for SVM [2][7]. Restriction: Regularization must be 'ridge' and Learner must be 'svm'.
• 'bfgs': Broyden-Fletcher-Goldfarb-Shanno quasi-Newton algorithm (BFGS) [4]. Restriction: inefficient if X is very high-dimensional.
• 'lbfgs': Limited-memory BFGS (LBFGS) [4]. Restriction: Regularization must be 'ridge'.
• 'sparsa': Sparse Reconstruction by Separable Approximation (SpaRSA) [6]. Restriction: Regularization must be 'lasso'.

If you specify:



• A ridge penalty (see Regularization) and X contains 100 or fewer predictor variables, then the default solver is 'bfgs'.
• An SVM model (see Learner), a ridge penalty, and X contains more than 100 predictor variables, then the default solver is 'dual'.
• A lasso penalty and X contains 100 or fewer predictor variables, then the default solver is 'sparsa'.

Otherwise, the default solver is 'sgd'.

If you specify a string array or cell array of solver names, then the software uses all solvers in the specified order for each Lambda. For more details on which solver to choose, see “Tips” on page 32-1626.

Example: 'Solver',{'sgd','lbfgs'}

Beta — Initial linear coefficient estimates
zeros(p,1) (default) | numeric vector | numeric matrix

Initial linear coefficient estimates (β), specified as the comma-separated pair consisting of 'Beta' and a p-dimensional numeric vector or a p-by-L numeric matrix. p is the number of predictor variables in X and L is the number of regularization-strength values (for more details, see Lambda).

• If you specify a p-dimensional vector, then the software optimizes the objective function L times using this process.
  1 The software optimizes using Beta as the initial value and the minimum value of Lambda as the regularization strength.
  2 The software optimizes again using the resulting estimate from the previous optimization as a warm start on page 32-1625, and the next smallest value in Lambda as the regularization strength.
  3 The software implements step 2 until it exhausts all values in Lambda.

• If you specify a p-by-L matrix, then the software optimizes the objective function L times. At iteration j, the software uses Beta(:,j) as the initial value and, after it sorts Lambda in ascending order, uses Lambda(j) as the regularization strength.

If you set 'Solver','dual', then the software ignores Beta.

Data Types: single | double

Bias — Initial intercept estimate
numeric scalar | numeric vector

Initial intercept estimate (b), specified as the comma-separated pair consisting of 'Bias' and a numeric scalar or an L-dimensional numeric vector. L is the number of regularization-strength values (for more details, see Lambda).

• If you specify a scalar, then the software optimizes the objective function L times using this process.
  1 The software optimizes using Bias as the initial value and the minimum value of Lambda as the regularization strength.
  2 The software uses the resulting estimate as a warm start to the next optimization iteration, and uses the next smallest value in Lambda as the regularization strength.
  3 The software implements step 2 until it exhausts all values in Lambda.

• If you specify an L-dimensional vector, then the software optimizes the objective function L times. At iteration j, the software uses Bias(j) as the initial value and, after it sorts Lambda in ascending order, uses Lambda(j) as the regularization strength.
• By default:
  • If Learner is 'logistic', then let gj be 1 if Y(j) is the positive class, and –1 otherwise. Bias is the weighted average of the g for training or, for cross-validation, in-fold observations.
  • If Learner is 'svm', then Bias is 0.

Data Types: single | double

FitBias — Linear model intercept inclusion flag
true (default) | false

Linear model intercept inclusion flag, specified as the comma-separated pair consisting of 'FitBias' and true or false.

• true: The software includes the bias term b in the linear model, and then estimates it.
• false: The software sets b = 0 during estimation.

Example: 'FitBias',false
Data Types: logical

PostFitBias — Flag to fit linear model intercept after optimization
false (default) | true

Flag to fit the linear model intercept after optimization, specified as the comma-separated pair consisting of 'PostFitBias' and true or false.

• false: The software estimates the bias term b and the coefficients β during optimization.
• true: To estimate b, the software:
  1 Estimates β and b using the model
  2 Estimates classification scores
  3 Refits b by placing the threshold on the classification scores that attains maximum accuracy

If you specify true, then FitBias must be true.

Example: 'PostFitBias',true
Data Types: logical

Verbose — Verbosity level
0 (default) | nonnegative integer



Verbosity level, specified as the comma-separated pair consisting of 'Verbose' and a nonnegative integer. Verbose controls the amount of diagnostic information fitclinear displays at the command line.

• 0: fitclinear does not display diagnostic information.
• 1: fitclinear periodically displays and stores the value of the objective function, gradient magnitude, and other diagnostic information. FitInfo.History contains the diagnostic information.
• Any other positive integer: fitclinear displays and stores diagnostic information at each optimization iteration. FitInfo.History contains the diagnostic information.

Example: 'Verbose',1
Data Types: double | single

SGD and ASGD Solver Options

BatchSize — Mini-batch size
positive integer

Mini-batch size, specified as the comma-separated pair consisting of 'BatchSize' and a positive integer. At each iteration, the software estimates the subgradient using BatchSize observations from the training data.

• If X is a numeric matrix, then the default value is 10.
• If X is a sparse matrix, then the default value is max([10,ceil(sqrt(ff))]), where ff = numel(X)/nnz(X) (the fullness factor of X).

Example: 'BatchSize',100
Data Types: single | double

LearnRate — Learning rate
positive scalar

Learning rate, specified as the comma-separated pair consisting of 'LearnRate' and a positive scalar. LearnRate specifies how many steps to take per iteration. At each iteration, the gradient specifies the direction and magnitude of each step.

• If Regularization is 'ridge', then LearnRate specifies the initial learning rate γ0. The software determines the learning rate for iteration t, γt, using

  γt = γ0 / (1 + λγ0t)^c.

  • λ is the value of Lambda.
  • If Solver is 'sgd', then c = 1.


  • If Solver is 'asgd', then c is 0.75 [7].
• If Regularization is 'lasso', then, for all iterations, LearnRate is constant.

By default, LearnRate is 1/sqrt(1+max((sum(X.^2,obsDim)))), where obsDim is 1 if the observations compose the columns of the predictor data X, and 2 otherwise.

Example: 'LearnRate',0.01
Data Types: single | double

OptimizeLearnRate — Flag to decrease learning rate
true (default) | false

Flag to decrease the learning rate when the software detects divergence (that is, over-stepping the minimum), specified as the comma-separated pair consisting of 'OptimizeLearnRate' and true or false.

If OptimizeLearnRate is true, then:
1 For the first few optimization iterations, the software starts optimization using LearnRate as the learning rate.
2 If the value of the objective function increases, then the software restarts and uses half of the current value of the learning rate.
3 The software iterates step 2 until the objective function decreases.

Example: 'OptimizeLearnRate',true
Data Types: logical

TruncationPeriod — Number of mini-batches between lasso truncation runs
10 (default) | positive integer

Number of mini-batches between lasso truncation runs, specified as the comma-separated pair consisting of 'TruncationPeriod' and a positive integer.

After a truncation run, the software applies a soft threshold to the linear coefficients. That is, after processing k = TruncationPeriod mini-batches, the software truncates the estimated coefficient j using

  β*_j = β_j − u_t   if β_j > u_t,
         0           if |β_j| ≤ u_t,
         β_j + u_t   if β_j < −u_t.

• For SGD, β_j is the estimate of coefficient j after processing k mini-batches. u_t = kγ_tλ. γ_t is the learning rate at iteration t. λ is the value of Lambda.
• For ASGD, β_j is the averaged estimate of coefficient j after processing k mini-batches. u_t = kλ.

If Regularization is 'ridge', then the software ignores TruncationPeriod.

Example: 'TruncationPeriod',100
Data Types: single | double
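The SGD options above can be combined as in this sketch. X and Y are assumed workspace variables, and the values are illustrative only, not recommendations:

```matlab
% Sketch: SGD with a lasso penalty, custom mini-batch size,
% initial learning rate, and truncation period.
Mdl = fitclinear(X,Y,'Regularization','lasso','Solver','sgd', ...
    'BatchSize',50,'LearnRate',0.1,'TruncationPeriod',100);
```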



Other Classification Options

ClassNames — Names of classes to use for training
categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors

Names of classes to use for training, specified as the comma-separated pair consisting of 'ClassNames' and a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors. ClassNames must have the same data type as Y.

If ClassNames is a character array, then each element must correspond to one row of the array.

Use 'ClassNames' to:

• Order the classes during training.
• Specify the order of any input or output argument dimension that corresponds to the class order. For example, use 'ClassNames' to specify the order of the dimensions of Cost or the column order of classification scores returned by predict.
• Select a subset of classes for training. For example, suppose that the set of all distinct class names in Y is {'a','b','c'}. To train the model using observations from classes 'a' and 'c' only, specify 'ClassNames',{'a','c'}.

The default value for ClassNames is the set of all distinct class names in Y.

Example: 'ClassNames',{'b','g'}
Data Types: categorical | char | string | logical | single | double | cell

Cost — Misclassification cost
square matrix | structure array

Misclassification cost, specified as the comma-separated pair consisting of 'Cost' and a square matrix or structure.

• If you specify the square matrix cost ('Cost',cost), then cost(i,j) is the cost of classifying a point into class j if its true class is i. That is, the rows correspond to the true class, and the columns correspond to the predicted class. To specify the class order for the corresponding rows and columns of cost, use the ClassNames name-value pair argument.
• If you specify the structure S ('Cost',S), then it must have two fields:
  • S.ClassNames, which contains the class names as a variable of the same data type as Y
  • S.ClassificationCosts, which contains the cost matrix with rows and columns ordered as in S.ClassNames

The default value for Cost is ones(K) – eye(K), where K is the number of distinct classes.

fitclinear uses Cost to adjust the prior class probabilities specified in Prior. Then, fitclinear uses the adjusted prior probabilities for training and resets the cost matrix to its default.

Example: 'Cost',[0 2; 1 0]
Data Types: single | double | struct

Prior — Prior probabilities
'empirical' (default) | 'uniform' | numeric vector | structure array


Prior probabilities for each class, specified as the comma-separated pair consisting of 'Prior' and 'empirical', 'uniform', a numeric vector, or a structure array. This table summarizes the available options for setting prior probabilities.

• 'empirical': The class prior probabilities are the class relative frequencies in Y.
• 'uniform': All class prior probabilities are equal to 1/K, where K is the number of classes.
• numeric vector: Each element is a class prior probability. Order the elements according to their order in Y. If you specify the order using the 'ClassNames' name-value pair argument, then order the elements accordingly.
• structure array: A structure S with two fields:
  • S.ClassNames contains the class names as a variable of the same type as Y.
  • S.ClassProbs contains a vector of corresponding prior probabilities.
fitclinear normalizes the prior probabilities in Prior to sum to 1.

Example: 'Prior',struct('ClassNames',{{'setosa','versicolor'}},'ClassProbs',1:2)
Data Types: char | string | double | single | struct

ScoreTransform — Score transformation
'none' (default) | 'doublelogit' | 'invlogit' | 'ismax' | 'logit' | function handle | ...

Score transformation, specified as the comma-separated pair consisting of 'ScoreTransform' and a character vector, string scalar, or function handle. This table summarizes the available character vectors and string scalars.

• 'doublelogit': 1/(1 + e^(–2x))
• 'invlogit': log(x / (1 – x))
• 'ismax': Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
• 'logit': 1/(1 + e^(–x))
• 'none' or 'identity': x (no transformation)
• 'sign': –1 for x < 0, 0 for x = 0, 1 for x > 0
• 'symmetric': 2x – 1




• 'symmetricismax': Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
• 'symmetriclogit': 2/(1 + e^(–x)) – 1

For a MATLAB function or a function you define, use its function handle for the score transform. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).

Example: 'ScoreTransform','logit'
Data Types: char | string | function_handle

Weights — Observation weights
numeric vector of positive values

Observation weights, specified as the comma-separated pair consisting of 'Weights' and a numeric vector of positive values. fitclinear weighs the observations in X with the corresponding value in Weights. The size of Weights must equal the number of observations in X. fitclinear normalizes Weights to sum up to the value of the prior probability in the respective class.

By default, Weights is ones(n,1), where n is the number of observations in X.

Data Types: double | single

Cross-Validation Options
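The cross-validation options described below can be sketched as follows (X and Y are assumed workspace variables, not part of the original text):

```matlab
% Sketch: 5-fold cross-validation via an explicit cvpartition object.
cvp = cvpartition(Y,'KFold',5);
CVMdl = fitclinear(X,Y,'CVPartition',cvp);
err = kfoldLoss(CVMdl);   % average out-of-fold misclassification rate
```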

CrossVal — Cross-validation flag
'off' (default) | 'on'

Cross-validation flag, specified as the comma-separated pair consisting of 'Crossval' and 'on' or 'off'. If you specify 'on', then the software implements 10-fold cross-validation.

To override this cross-validation setting, use one of these name-value pair arguments: CVPartition, Holdout, or KFold. To create a cross-validated model, you can use one cross-validation name-value pair argument at a time only.

Example: 'Crossval','on'

CVPartition — Cross-validation partition
[] (default) | cvpartition partition object

Cross-validation partition, specified as the comma-separated pair consisting of 'CVPartition' and a cvpartition partition object as created by cvpartition. The partition object specifies the type of cross-validation, and also the indexing for training and validation sets.

To create a cross-validated model, you can use only one of these options: 'CVPartition', 'Holdout', or 'KFold'.

Holdout — Fraction of data for holdout validation
scalar value in the range (0,1)


Fraction of data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range (0,1). If you specify 'Holdout',p, then the software:

1 Randomly reserves p*100% of the data as validation data, and trains the model using the rest of the data
2 Stores the compact, trained model in the Trained property of the cross-validated model.

To create a cross-validated model, you can use only one of these options: 'CVPartition', 'Holdout', or 'KFold'.

Example: 'Holdout',0.1
Data Types: double | single

KFold — Number of folds
10 (default) | positive integer value greater than 1

Number of folds to use in a cross-validated classifier, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1. If you specify, e.g., 'KFold',k, then the software:

1 Randomly partitions the data into k sets
2 For each set, reserves the set as validation data, and trains the model using the other k – 1 sets
3 Stores the k compact, trained models in the cells of a k-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can use only one of these options: 'CVPartition', 'Holdout', or 'KFold'.

Example: 'KFold',8
Data Types: single | double

SGD and ASGD Convergence Controls

BatchLimit — Maximal number of batches
positive integer

Maximal number of batches to process, specified as the comma-separated pair consisting of 'BatchLimit' and a positive integer. When the software processes BatchLimit batches, it terminates optimization.

• By default:
  • The software passes through the data PassLimit times.
  • If you specify multiple solvers, and use (A)SGD to get an initial approximation for the next solver, then the default value is ceil(1e6/BatchSize). BatchSize is the value of the 'BatchSize' name-value pair argument.
• If you specify 'BatchLimit' and 'PassLimit', then the software chooses the argument that results in processing the fewest observations.
• If you specify 'BatchLimit' but not 'PassLimit', then the software processes enough batches to complete up to one entire pass through the data.

Example: 'BatchLimit',100
Data Types: single | double
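A sketch combining the convergence caps in this group (illustrative limits; X and Y are assumed workspace variables):

```matlab
% Sketch: cap the SGD run by passes and batches; the software stops
% at whichever limit processes fewer observations.
Mdl = fitclinear(X,Y,'Solver','sgd','PassLimit',5,'BatchLimit',500);
```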



BetaTolerance — Relative tolerance on linear coefficients and bias term
1e-4 (default) | nonnegative scalar

Relative tolerance on the linear coefficients and the bias term (intercept), specified as the comma-separated pair consisting of 'BetaTolerance' and a nonnegative scalar.

Let B_t = [β_t′ b_t], that is, the vector of the coefficients and the bias term at optimization iteration t. If ‖B_t − B_{t−1}‖_2 / ‖B_t‖_2 < BetaTolerance, then optimization terminates.

If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver.

Example: 'BetaTolerance',1e-6
Data Types: single | double

NumCheckConvergence — Number of batches to process before next convergence check
positive integer

Number of batches to process before next convergence check, specified as the comma-separated pair consisting of 'NumCheckConvergence' and a positive integer. To specify the batch size, see BatchSize. The software checks for convergence about 10 times per pass through the entire data set by default.

Example: 'NumCheckConvergence',100
Data Types: single | double

PassLimit — Maximal number of passes
1 (default) | positive integer

Maximal number of passes through the data, specified as the comma-separated pair consisting of 'PassLimit' and a positive integer. fitclinear processes all observations when it completes one pass through the data. When fitclinear passes through the data PassLimit times, it terminates optimization.

If you specify 'BatchLimit' and 'PassLimit', then fitclinear chooses the argument that results in processing the fewest observations.

Example: 'PassLimit',5
Data Types: single | double

ValidationData — Validation data for optimization convergence detection
cell array

Data for optimization convergence detection, specified as the comma-separated pair consisting of 'ValidationData' and a cell array. During optimization, the software periodically estimates the loss of ValidationData.
If the validation-data loss increases, then the software terminates optimization. For more details, see “Algorithms” on page 32-1627. To optimize hyperparameters using cross-validation, see cross-validation options such as CrossVal.


• ValidationData{1} must contain an m-by-p or p-by-m full or sparse matrix of predictor data that has the same orientation as X. The predictor variables in the training data X and ValidationData{1} must correspond. The number of observations in both sets can vary.
• ValidationData{2} and Y must be the same data type. The set of all distinct labels of ValidationData{2} must be a subset of all distinct labels of Y.
• Optionally, ValidationData{3} can contain an m-dimensional numeric vector of observation weights. The software normalizes the weights with the validation data so that they sum to 1.

If you specify ValidationData, then, to display validation loss at the command line, specify a value larger than 0 for Verbose.

If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver.

By default, the software does not detect convergence by monitoring validation-data loss.

Dual SGD Convergence Controls
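The ValidationData mechanism described above can be sketched as follows. The 80/20 split is illustrative; X and Y are assumed workspace variables:

```matlab
% Sketch: hold out 20% of the data; optimization stops early if the
% validation loss starts increasing.
n = size(X,1);
idx = rand(n,1) < 0.8;                      % random ~80% for training
Mdl = fitclinear(X(idx,:),Y(idx), ...
    'ValidationData',{X(~idx,:),Y(~idx)},'Verbose',1);
```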

BetaTolerance — Relative tolerance on linear coefficients and bias term
1e-4 (default) | nonnegative scalar

Relative tolerance on the linear coefficients and the bias term (intercept), specified as the comma-separated pair consisting of 'BetaTolerance' and a nonnegative scalar.

Let B_t = [β_t′ b_t], that is, the vector of the coefficients and the bias term at optimization iteration t. If ‖B_t − B_{t−1}‖_2 / ‖B_t‖_2 < BetaTolerance, then optimization terminates.

If you also specify DeltaGradientTolerance, then optimization terminates when the software satisfies either stopping criterion.

If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver.

Example: 'BetaTolerance',1e-6
Data Types: single | double

DeltaGradientTolerance — Gradient-difference tolerance
1 (default) | nonnegative scalar

Gradient-difference tolerance between upper and lower pool Karush-Kuhn-Tucker (KKT) complementarity conditions on page 32-1690 violators, specified as the comma-separated pair consisting of 'DeltaGradientTolerance' and a nonnegative scalar.

• If the magnitude of the KKT violators is less than DeltaGradientTolerance, then the software terminates optimization.
• If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver.

Example: 'DeltaGradientTolerance',1e-2
Data Types: double | single
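Because DeltaGradientTolerance applies to the dual SGD solver, a usage sketch (X and Y are assumed workspace variables; the tolerance value is illustrative):

```matlab
% Sketch: the 'dual' solver requires an SVM learner with a ridge
% penalty; a looser KKT tolerance trades accuracy for speed.
Mdl = fitclinear(X,Y,'Learner','svm','Regularization','ridge', ...
    'Solver','dual','DeltaGradientTolerance',1e-2);
```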



NumCheckConvergence — Number of passes through entire data set to process before next convergence check
5 (default) | positive integer

Number of passes through the entire data set to process before the next convergence check, specified as the comma-separated pair consisting of 'NumCheckConvergence' and a positive integer.

Example: 'NumCheckConvergence',100
Data Types: single | double

PassLimit — Maximal number of passes
10 (default) | positive integer

Maximal number of passes through the data, specified as the comma-separated pair consisting of 'PassLimit' and a positive integer. When the software completes one pass through the data, it has processed all observations. When the software passes through the data PassLimit times, it terminates optimization.

Example: 'PassLimit',5
Data Types: single | double

ValidationData — Validation data for optimization convergence detection
cell array

Data for optimization convergence detection, specified as the comma-separated pair consisting of 'ValidationData' and a cell array. During optimization, the software periodically estimates the loss of ValidationData. If the validation-data loss increases, then the software terminates optimization. For more details, see “Algorithms” on page 32-1627. To optimize hyperparameters using cross-validation, see cross-validation options such as CrossVal.

• ValidationData{1} must contain an m-by-p or p-by-m full or sparse matrix of predictor data that has the same orientation as X. The predictor variables in the training data X and ValidationData{1} must correspond. The number of observations in both sets can vary.
• ValidationData{2} and Y must be the same data type. The set of all distinct labels of ValidationData{2} must be a subset of all distinct labels of Y.
• Optionally, ValidationData{3} can contain an m-dimensional numeric vector of observation weights. The software normalizes the weights with the validation data so that they sum to 1.
If you specify ValidationData, then, to display validation loss at the command line, specify a value larger than 0 for Verbose.

If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver.

By default, the software does not detect convergence by monitoring validation-data loss.

BFGS, LBFGS, and SpaRSA Convergence Controls

BetaTolerance — Relative tolerance on linear coefficients and bias term
1e-4 (default) | nonnegative scalar


Relative tolerance on the linear coefficients and the bias term (intercept), specified as the comma-separated pair consisting of 'BetaTolerance' and a nonnegative scalar.

Let B_t = [β_t′ b_t], that is, the vector of the coefficients and the bias term at optimization iteration t. If ‖B_t − B_{t−1}‖_2 / ‖B_t‖_2 < BetaTolerance, then optimization terminates.

If you also specify GradientTolerance, then optimization terminates when the software satisfies either stopping criterion.

If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver.

Example: 'BetaTolerance',1e-6
Data Types: single | double

GradientTolerance — Absolute gradient tolerance
1e-6 (default) | nonnegative scalar

Absolute gradient tolerance, specified as the comma-separated pair consisting of 'GradientTolerance' and a nonnegative scalar.

Let ∇ℒ_t be the gradient vector of the objective function with respect to the coefficients and bias term at optimization iteration t. If ‖∇ℒ_t‖_∞ = max_j |[∇ℒ_t]_j| < GradientTolerance, then optimization terminates.

If you also specify BetaTolerance, then optimization terminates when the software satisfies either stopping criterion.

If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver.

Example: 'GradientTolerance',1e-5
Data Types: single | double

HessianHistorySize — Size of history buffer for Hessian approximation
15 (default) | positive integer

Size of history buffer for Hessian approximation, specified as the comma-separated pair consisting of 'HessianHistorySize' and a positive integer. That is, at each iteration, the software composes the Hessian using statistics from the latest HessianHistorySize iterations.

The software does not support 'HessianHistorySize' for SpaRSA.
Example: 'HessianHistorySize',10
Data Types: single | double

IterationLimit — Maximal number of optimization iterations
1000 (default) | positive integer

Maximal number of optimization iterations, specified as the comma-separated pair consisting of 'IterationLimit' and a positive integer. IterationLimit applies to these values of Solver: 'bfgs', 'lbfgs', and 'sparsa'.

Example: 'IterationLimit',500
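A sketch of the (L)BFGS controls above (illustrative values; X and Y are assumed workspace variables):

```matlab
% Sketch: LBFGS with a ridge penalty and explicit stopping criteria.
Mdl = fitclinear(X,Y,'Solver','lbfgs','Regularization','ridge', ...
    'GradientTolerance',1e-5,'BetaTolerance',1e-6, ...
    'HessianHistorySize',10,'IterationLimit',500);
```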



Data Types: single | double

ValidationData — Validation data for optimization convergence detection
cell array

Data for optimization convergence detection, specified as the comma-separated pair consisting of 'ValidationData' and a cell array. During optimization, the software periodically estimates the loss of ValidationData. If the validation-data loss increases, then the software terminates optimization. For more details, see “Algorithms” on page 32-1627. To optimize hyperparameters using cross-validation, see cross-validation options such as CrossVal.

• ValidationData{1} must contain an m-by-p or p-by-m full or sparse matrix of predictor data that has the same orientation as X. The predictor variables in the training data X and ValidationData{1} must correspond. The number of observations in both sets can vary.
• ValidationData{2} and Y must be the same data type. The set of all distinct labels of ValidationData{2} must be a subset of all distinct labels of Y.
• Optionally, ValidationData{3} can contain an m-dimensional numeric vector of observation weights. The software normalizes the weights with the validation data so that they sum to 1.

If you specify ValidationData, then, to display validation loss at the command line, specify a value larger than 0 for Verbose.

If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver.

By default, the software does not detect convergence by monitoring validation-data loss.

Hyperparameter Optimization

OptimizeHyperparameters — Parameters to optimize
'none' (default) | 'auto' | 'all' | string array or cell array of eligible parameter names | vector of optimizableVariable objects

Parameters to optimize, specified as the comma-separated pair consisting of 'OptimizeHyperparameters' and one of the following:

• 'none' — Do not optimize.
• 'auto' — Use {'Lambda','Learner'}.
• 'all' — Optimize all eligible parameters.
• String array or cell array of eligible parameter names.
• Vector of optimizableVariable objects, typically the output of hyperparameters.

The optimization attempts to minimize the cross-validation loss (error) for fitclinear by varying the parameters. For information about cross-validation loss (albeit in a different context), see “Classification Loss” on page 32-2756. To control the cross-validation type and other aspects of the optimization, use the HyperparameterOptimizationOptions name-value pair.

Note 'OptimizeHyperparameters' values override any values you set using other name-value pair arguments. For example, setting 'OptimizeHyperparameters' to 'auto' causes the 'auto' values to apply.


The eligible parameters for fitclinear are:

• Lambda — fitclinear searches among positive values, by default log-scaled in the range [1e-5/NumObservations,1e5/NumObservations].
• Learner — fitclinear searches among 'svm' and 'logistic'.
• Regularization — fitclinear searches among 'ridge' and 'lasso'.

Set nondefault parameters by passing a vector of optimizableVariable objects that have nondefault values. For example,

load fisheriris
params = hyperparameters('fitclinear',meas,species);
params(1).Range = [1e-4,1e6];

Pass params as the value of OptimizeHyperparameters.

By default, iterative display appears at the command line, and plots appear according to the number of hyperparameters in the optimization. For the optimization and plots, the objective function is log(1 + cross-validation loss) for regression and the misclassification rate for classification. To control the iterative display, set the Verbose field of the 'HyperparameterOptimizationOptions' name-value pair argument. To control the plots, set the ShowPlots field of the 'HyperparameterOptimizationOptions' name-value pair argument.

For an example, see “Optimize Linear Classifier” on page 32-1600.

Example: 'OptimizeHyperparameters','auto'

HyperparameterOptimizationOptions — Options for optimization
structure

Options for optimization, specified as the comma-separated pair consisting of 'HyperparameterOptimizationOptions' and a structure. This argument modifies the effect of the OptimizeHyperparameters name-value pair argument. All fields in the structure are optional.

Optimizer — Default: 'bayesopt'
• 'bayesopt' — Use Bayesian optimization. Internally, this setting calls bayesopt.
• 'gridsearch' — Use grid search with NumGridDivisions values per dimension.
• 'randomsearch' — Search at random among MaxObjectiveEvaluations points.
'gridsearch' searches in a random order, using uniform sampling without replacement from the grid. After optimization, you can get a table in grid order by using the command sortrows(Mdl.HyperparameterOptimizationResults).

AcquisitionFunctionName — Default: 'expected-improvement-per-second-plus'
• 'expected-improvement-per-second-plus'
• 'expected-improvement'
• 'expected-improvement-plus'
• 'expected-improvement-per-second'
• 'lower-confidence-bound'
• 'probability-of-improvement'
Acquisition functions whose names include per-second do not yield reproducible results because the optimization depends on the runtime of the objective function. Acquisition functions whose names include plus modify their behavior when they are overexploiting an area. For more details, see “Acquisition Function Types” on page 10-3.

MaxObjectiveEvaluations — Maximum number of objective function evaluations. Default: 30 for 'bayesopt' or 'randomsearch', and the entire grid for 'gridsearch'.

MaxTime — Time limit, specified as a positive real. The time limit is in seconds, as measured by tic and toc. Run time can exceed MaxTime because MaxTime does not interrupt function evaluations. Default: Inf.

NumGridDivisions — For 'gridsearch', the number of values in each dimension. The value can be a vector of positive integers giving the number of values for each dimension, or a scalar that applies to all dimensions. This field is ignored for categorical variables. Default: 10.

ShowPlots — Logical value indicating whether to show plots. If true, this field plots the best objective function value against the iteration number. If there are one or two optimization parameters, and if Optimizer is 'bayesopt', then ShowPlots also plots a model of the objective function against the parameters. Default: true.

SaveIntermediateResults — Logical value indicating whether to save results when Optimizer is 'bayesopt'. If true, this field overwrites a workspace variable named 'BayesoptResults' at each iteration. The variable is a BayesianOptimization object. Default: false.

Verbose — Display to the command line. Default: 1.
• 0 — No iterative display
• 1 — Iterative display
• 2 — Iterative display with extra information
For details, see the bayesopt Verbose name-value pair argument.

UseParallel — Logical value indicating whether to run Bayesian optimization in parallel, which requires Parallel Computing Toolbox. Due to the nonreproducibility of parallel timing, parallel Bayesian optimization does not necessarily yield reproducible results. For details, see “Parallel Bayesian Optimization” on page 10-7. Default: false.

Repartition — Logical value indicating whether to repartition the cross-validation at every iteration. If false, the optimizer uses a single partition for the optimization. true usually gives the most robust results because this setting takes partitioning noise into account. However, for good results, true requires at least twice as many function evaluations. Default: false.

Use no more than one of the following three field names.

CVPartition — A cvpartition object, as created by cvpartition.
Holdout — A scalar in the range (0,1) representing the holdout fraction.
Kfold — An integer greater than 1.
Default: 'Kfold',5 if you do not specify any cross-validation field.

Example: 'HyperparameterOptimizationOptions',struct('MaxObjectiveEvaluations',60)
Data Types: struct
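As a further hedged sketch combining several of the fields above in one call, with option values that are arbitrary choices for illustration:

```matlab
% Sketch: random search, 40 evaluations, 25% holdout instead of 5-fold CV
load fisheriris
Mdl = fitclinear(meas',species,'ObservationsIn','columns', ...
    'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions', ...
    struct('Optimizer','randomsearch', ...
           'MaxObjectiveEvaluations',40, ...
           'Holdout',0.25));
```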

Output Arguments

Mdl — Trained linear classification model
ClassificationLinear model object | ClassificationPartitionedLinear cross-validated model object

Trained linear classification model, returned as a ClassificationLinear model object or ClassificationPartitionedLinear cross-validated model object.

If you set any of the name-value pair arguments KFold, Holdout, CrossVal, or CVPartition, then Mdl is a ClassificationPartitionedLinear cross-validated model object. Otherwise, Mdl is a ClassificationLinear model object.


To reference properties of Mdl, use dot notation. For example, enter Mdl.Beta in the Command Window to display the vector or matrix of estimated coefficients.

Note Unlike other classification models, and for economical memory usage, ClassificationLinear and ClassificationPartitionedLinear model objects do not store the training data or training process details (for example, convergence history).

FitInfo — Optimization details
structure array

Optimization details, returned as a structure array. Fields specify final values or name-value pair argument specifications, for example, Objective is the value of the objective function when optimization terminates. Rows of multidimensional fields correspond to values of Lambda and columns correspond to values of Solver. This table describes some notable fields.

• TerminationStatus — Reason for optimization termination; corresponds to a value in TerminationCode.
• FitTime — Elapsed, wall-clock time in seconds.
• History — A structure array of optimization information for each iteration. The field Solver stores solver types using integer coding:

    1 — SGD
    2 — ASGD
    3 — Dual SGD for SVM
    4 — LBFGS
    5 — BFGS
    6 — SpaRSA

To access fields, use dot notation. For example, to access the vector of objective function values for each iteration, enter FitInfo.History.Objective.

It is good practice to examine FitInfo to assess whether convergence is satisfactory.

HyperparameterOptimizationResults — Cross-validation optimization of hyperparameters
BayesianOptimization object | table of hyperparameters and associated values

Cross-validation optimization of hyperparameters, returned as a BayesianOptimization object or a table of hyperparameters and associated values. The output is nonempty when the value of 'OptimizeHyperparameters' is not 'none'. The output value depends on the Optimizer field value of the 'HyperparameterOptimizationOptions' name-value pair argument:


• 'bayesopt' (default) — HyperparameterOptimizationResults is an object of class BayesianOptimization.
• 'gridsearch' or 'randomsearch' — HyperparameterOptimizationResults is a table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst).
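A hedged sketch of requesting all three output arguments from an optimizing call; the data set and option choices are illustrative:

```matlab
% Sketch: capture the model, fit details, and optimization results together
load fisheriris
[Mdl,FitInfo,HyperparameterOptimizationResults] = fitclinear( ...
    meas',species,'ObservationsIn','columns', ...
    'OptimizeHyperparameters','auto');
FitInfo.TerminationStatus            % why the final solver stopped
HyperparameterOptimizationResults    % BayesianOptimization object (default Optimizer)
```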

More About

Warm Start
A warm start is initial estimates of the beta coefficients and bias term supplied to an optimization routine for quicker convergence.

Alternatives for Lower-Dimensional Data
High-dimensional linear classification and regression models minimize objective functions relatively quickly, but at the cost of some accuracy, a restriction to numeric-only predictor variables, and the requirement that the model be linear with respect to the parameters. If your predictor data set is low- through medium-dimensional, or contains heterogeneous variables, then you should use the appropriate classification or regression fitting function. To help you decide which fitting function is appropriate for your low-dimensional data set, use this table.

SVM
• Binary classification: fitcsvm
• Multiclass classification: fitcecoc
• Regression: fitrsvm
Notable algorithmic differences:
• Computes the Gram matrix of the predictor variables, which is convenient for nonlinear kernel transformations.
• Solves the dual problem using SMO, ISDA, or L1 minimization via quadratic programming using quadprog.

Linear regression
• Least-squares without regularization: fitlm
• Regularized least-squares using a lasso penalty: lasso
• Ridge regression: ridge or lasso
Notable algorithmic differences:
• lasso implements cyclic coordinate descent.

Logistic regression
• Logistic regression without regularization: fitglm
• Regularized logistic regression using a lasso penalty: lassoglm
Notable algorithmic differences:
• fitglm implements iteratively reweighted least squares.
• lassoglm implements cyclic coordinate descent.


Tips
• It is a best practice to orient your predictor matrix so that observations correspond to columns and to specify 'ObservationsIn','columns'. As a result, you can experience a significant reduction in optimization-execution time.
• For better optimization accuracy if X is high-dimensional and Regularization is 'ridge', set any of these combinations for Solver:
  • 'sgd'
  • 'asgd'
  • 'dual' if Learner is 'svm'
  • {'sgd','lbfgs'}
  • {'asgd','lbfgs'}
  • {'dual','lbfgs'} if Learner is 'svm'
  Other combinations can result in poor optimization accuracy.
• For better optimization accuracy if X is moderate- through low-dimensional and Regularization is 'ridge', set Solver to 'bfgs'.
• If Regularization is 'lasso', set any of these combinations for Solver:
  • 'sgd'
  • 'asgd'
  • 'sparsa'
  • {'sgd','sparsa'}
  • {'asgd','sparsa'}
• When choosing between SGD and ASGD, consider that:
  • SGD takes less time per iteration, but requires more iterations to converge.
  • ASGD requires fewer iterations to converge, but takes more time per iteration.
• If X has few observations, but many predictor variables, then:
  • Specify 'PostFitBias',true.
  • For SGD or ASGD solvers, set PassLimit to a positive integer that is greater than 1, for example, 5 or 10. This setting often results in better accuracy.
• For SGD and ASGD solvers, BatchSize affects the rate of convergence.
  • If BatchSize is too small, then fitclinear achieves the minimum in many iterations, but computes the gradient per iteration quickly.
  • If BatchSize is too large, then fitclinear achieves the minimum in fewer iterations, but computes the gradient per iteration slowly.
• Large learning rates (see LearnRate) speed up convergence to the minimum, but can lead to divergence (that is, over-stepping the minimum). Small learning rates ensure convergence to the minimum, but can lead to slow termination.
• When using lasso penalties, experiment with various values of TruncationPeriod. For example, set TruncationPeriod to 1, 10, and then 100.
• For efficiency, fitclinear does not standardize predictor data. To standardize X, enter


X = bsxfun(@rdivide,bsxfun(@minus,X,mean(X,2)),std(X,0,2));

The code requires that you orient the predictors and observations as the rows and columns of X, respectively. Also, for memory-usage economy, the code replaces the original predictor data with the standardized data.
• After training a model, you can generate C/C++ code that predicts labels for new data. Generating C/C++ code requires MATLAB Coder. For details, see “Introduction to Code Generation” on page 31-2.
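Several of the tips above can be combined in a single call. The following is a hedged sketch that assumes a high-dimensional, columns-oriented X with labels Y; the solver sequence mirrors one of the recommended ridge combinations:

```matlab
% Sketch: ridge penalty, an SGD pass followed by LBFGS refinement,
% observations in columns, post-fit bias correction
Mdl = fitclinear(X,Y,'ObservationsIn','columns', ...
    'Regularization','ridge', ...
    'Solver',{'sgd','lbfgs'}, ...
    'PostFitBias',true);
```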

Algorithms
• If you specify ValidationData, then, during objective-function optimization:
  • fitclinear estimates the validation loss of ValidationData periodically using the current model, and tracks the minimal estimate.
  • When fitclinear estimates a validation loss, it compares the estimate to the minimal estimate.
  • When subsequent validation loss estimates exceed the minimal estimate five times, fitclinear terminates optimization.
• If you specify ValidationData and implement a cross-validation routine (CrossVal, CVPartition, Holdout, or KFold), then:
  1. fitclinear randomly partitions X and Y according to the cross-validation routine that you choose.
  2. fitclinear trains the model using the training-data partition. During objective-function optimization, fitclinear uses ValidationData as another possible way to terminate optimization (for details, see the previous bullet).
  3. Once fitclinear satisfies a stopping criterion, it constructs a trained model based on the optimized linear coefficients and intercept.
  4. a. If you implement k-fold cross-validation, and fitclinear has not exhausted all training-set folds, then fitclinear returns to Step 2 to train using the next training-set fold.
     b. Otherwise, fitclinear terminates training, and then returns the cross-validated model.

You can determine the quality of the cross-validated model. For example: • To determine the validation loss using the holdout or out-of-fold data from step 1, pass the cross-validated model to kfoldLoss. • To predict observations on the holdout or out-of-fold data from step 1, pass the crossvalidated model to kfoldPredict.
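A hedged sketch of the cross-validation workflow just described, combining KFold with ValidationData; XT, YT, XV, and YV denote illustrative training and separately held-out validation sets:

```matlab
% Sketch: 5-fold cross-validation with validation-based early stopping
CVMdl = fitclinear(XT,YT,'ObservationsIn','columns','KFold',5, ...
    'ValidationData',{XV,YV});
L = kfoldLoss(CVMdl)             % out-of-fold classification loss
labels = kfoldPredict(CVMdl);    % out-of-fold predicted labels
```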

References
[1] Hsieh, C. J., K. W. Chang, C. J. Lin, S. S. Keerthi, and S. Sundararajan. “A Dual Coordinate Descent Method for Large-Scale Linear SVM.” Proceedings of the 25th International Conference on Machine Learning, ICML ’08, 2008, pp. 408–415.
[2] Langford, J., L. Li, and T. Zhang. “Sparse Online Learning Via Truncated Gradient.” J. Mach. Learn. Res., Vol. 10, 2009, pp. 777–801.
[3] Nocedal, J. and S. J. Wright. Numerical Optimization, 2nd ed., New York: Springer, 2006.


[4] Shalev-Shwartz, S., Y. Singer, and N. Srebro. “Pegasos: Primal Estimated Sub-Gradient Solver for SVM.” Proceedings of the 24th International Conference on Machine Learning, ICML ’07, 2007, pp. 807–814.
[5] Wright, S. J., R. D. Nowak, and M. A. T. Figueiredo. “Sparse Reconstruction by Separable Approximation.” Trans. Sig. Proc., Vol. 57, No. 7, 2009, pp. 2479–2493.
[6] Xiao, Lin. “Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization.” J. Mach. Learn. Res., Vol. 11, 2010, pp. 2543–2596.
[7] Xu, Wei. “Towards Optimal One Pass Large Scale Learning with Averaged Stochastic Gradient Descent.” CoRR, abs/1107.2490, 2011.

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

Usage notes and limitations:
• Some name-value pair arguments have different defaults compared to the default values for the in-memory fitclinear function. Supported name-value pair arguments, and any differences, are:
  • 'ObservationsIn' — Supports only 'rows'.
  • 'Lambda' — Can be 'auto' (default) or a scalar.
  • 'Learner'
  • 'Regularization' — Supports only 'ridge'.
  • 'Solver' — Supports only 'lbfgs'.
  • 'FitBias' — Supports only true.
  • 'Verbose' — Default value is 1.
  • 'Beta'
  • 'Bias'
  • 'ClassNames'
  • 'Cost'
  • 'Prior'
  • 'Weights' — Value must be a tall array.
  • 'HessianHistorySize'
  • 'BetaTolerance' — Default value is relaxed to 1e-3.
  • 'GradientTolerance' — Default value is relaxed to 1e-3.
  • 'IterationLimit' — Default value is relaxed to 20.
  • 'OptimizeHyperparameters' — Value of 'Regularization' parameter must be 'ridge'.
  • 'HyperparameterOptimizationOptions' — For cross-validation, tall optimization supports only 'Holdout' validation. For example, you can specify fitclinear(X,Y,'OptimizeHyperparameters','auto','HyperparameterOptimizationOptions',struct('Holdout',0.2)).


• For tall arrays, fitclinear implements LBFGS by distributing the calculation of the loss and gradient among different parts of the tall array at each iteration. Other solvers are not available for tall arrays. When initial values for Beta and Bias are not given, fitclinear refines the initial estimates of the parameters by fitting the model locally to parts of the data and combining the coefficients by averaging. For more information, see “Tall Arrays” (MATLAB).

Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, set the 'UseParallel' option to true. To perform parallel hyperparameter optimization, use the 'HyperparameterOptimizationOptions',struct('UseParallel',true) name-value pair argument in the call to this function. For more information on parallel hyperparameter optimization, see “Parallel Bayesian Optimization” on page 10-7. For more general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).
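As a hedged sketch of the parallel-optimization option above (requires Parallel Computing Toolbox; X and Y are placeholders for your data):

```matlab
% Sketch: run Bayesian hyperparameter optimization in parallel
Mdl = fitclinear(X,Y,'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions',struct('UseParallel',true));
```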

See Also
ClassificationLinear | ClassificationPartitionedLinear | fitcecoc | fitckernel | fitcsvm | fitglm | fitrlinear | kfoldLoss | kfoldPredict | lassoglm | predict | templateLinear | testcholdout

Introduced in R2016a


fitcnb
Train multiclass naive Bayes model

Syntax

Mdl = fitcnb(Tbl,ResponseVarName)
Mdl = fitcnb(Tbl,formula)
Mdl = fitcnb(Tbl,Y)
Mdl = fitcnb(X,Y)
Mdl = fitcnb( ___ ,Name,Value)

Description

Mdl = fitcnb(Tbl,ResponseVarName) returns a multiclass naive Bayes model (Mdl), trained by the predictors in table Tbl and class labels in the variable Tbl.ResponseVarName.

Mdl = fitcnb(Tbl,formula) returns a multiclass naive Bayes model (Mdl), trained by the predictors in table Tbl. formula is an explanatory model of the response and a subset of predictor variables in Tbl used to fit Mdl.

Mdl = fitcnb(Tbl,Y) returns a multiclass naive Bayes model (Mdl), trained by the predictors in the table Tbl and class labels in the array Y.

Mdl = fitcnb(X,Y) returns a multiclass naive Bayes model (Mdl), trained by predictors X and class labels Y.

Mdl = fitcnb( ___ ,Name,Value) returns a naive Bayes classifier with additional options specified by one or more Name,Value pair arguments, using any of the previous syntaxes. For example, you can specify a distribution to model the data, prior probabilities for the classes, or the kernel smoothing window bandwidth.

Examples

Train a Naive Bayes Classifier

Load Fisher's iris data set.

load fisheriris
X = meas(:,3:4);
Y = species;
tabulate(Y)

       Value    Count   Percent
      setosa       50     33.33%
  versicolor       50     33.33%
   virginica       50     33.33%

The software can classify data with more than two classes using naive Bayes methods.


Train a naive Bayes classifier. It is good practice to specify the class order.

Mdl = fitcnb(X,Y,...
    'ClassNames',{'setosa','versicolor','virginica'})

Mdl = 
  ClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'setosa'  'versicolor'  'virginica'}
            ScoreTransform: 'none'
           NumObservations: 150
         DistributionNames: {'normal'  'normal'}
    DistributionParameters: {3x2 cell}

  Properties, Methods

Mdl is a trained ClassificationNaiveBayes classifier. By default, the software models the predictor distribution within each class using a Gaussian distribution having some mean and standard deviation.

Use dot notation to display the parameters of a particular Gaussian fit, e.g., display the fit for the first feature within setosa.

setosaIndex = strcmp(Mdl.ClassNames,'setosa');
estimates = Mdl.DistributionParameters{setosaIndex,1}

estimates = 2×1

    1.4620
    0.1737

The mean is 1.4620 and the standard deviation is 0.1737.

Plot the Gaussian contours.

figure
gscatter(X(:,1),X(:,2),Y);
h = gca;
cxlim = h.XLim;
cylim = h.YLim;
hold on
Params = cell2mat(Mdl.DistributionParameters);
Mu = Params(2*(1:3)-1,1:2); % Extract the means
Sigma = zeros(2,2,3);
for j = 1:3
    Sigma(:,:,j) = diag(Params(2*j,:)).^2; % Create diagonal covariance matrix
    xlim = Mu(j,1) + 4*[-1 1]*sqrt(Sigma(1,1,j));
    ylim = Mu(j,2) + 4*[-1 1]*sqrt(Sigma(2,2,j));
    f = @(x1,x2)reshape(mvnpdf([x1(:),x2(:)],Mu(j,:),Sigma(:,:,j)),size(x1));
    fcontour(f,[xlim ylim]) % Draw contours for the multivariate normal distributions
end
h.XLim = cxlim;
h.YLim = cylim;
title('Naive Bayes Classifier -- Fisher''s Iris Data')
xlabel('Petal Length (cm)')


ylabel('Petal Width (cm)')
legend('setosa','versicolor','virginica')
hold off

You can change the default distribution using the name-value pair argument 'DistributionNames'. For example, if some predictors are categorical, then you can specify that they are multivariate, multinomial random variables using 'DistributionNames','mvmn'.

Specify Prior Probabilities When Training Naive Bayes Classifiers

Construct a naive Bayes classifier for Fisher's iris data set. Also, specify prior probabilities during training.

Load Fisher's iris data set.

load fisheriris
X = meas;
Y = species;
classNames = {'setosa','versicolor','virginica'}; % Class order

X is a numeric matrix that contains four petal measurements for 150 irises. Y is a cell array of character vectors that contains the corresponding iris species.

By default, the prior class probability distribution is the relative frequency distribution of the classes in the data set, which in this case is 33% for each species. However, suppose you know that in the population 50% of the irises are setosa, 20% are versicolor, and 30% are virginica. You can incorporate this information by specifying this distribution as a prior probability during training.

Train a naive Bayes classifier. Specify the class order and prior class probability distribution.

prior = [0.5 0.2 0.3];
Mdl = fitcnb(X,Y,'ClassNames',classNames,'Prior',prior)

Mdl = 
  ClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'setosa'  'versicolor'  'virginica'}
            ScoreTransform: 'none'
           NumObservations: 150
         DistributionNames: {'normal'  'normal'  'normal'  'normal'}
    DistributionParameters: {3x4 cell}

  Properties, Methods

Mdl is a trained ClassificationNaiveBayes classifier, and some of its properties appear in the Command Window. The software treats the predictors as independent given a class, and, by default, fits them using normal distributions.

The naive Bayes algorithm does not use the prior class probabilities during training. Therefore, you can specify prior class probabilities after training using dot notation. For example, suppose that you want to see the difference in performance between a model that uses the default prior class probabilities and a model that uses prior.

Create a new naive Bayes model based on Mdl, and specify that the prior class probability distribution is an empirical class distribution.

defaultPriorMdl = Mdl;
FreqDist = cell2table(tabulate(Y));
defaultPriorMdl.Prior = FreqDist{:,3};

The software normalizes the prior class probabilities to sum to 1.

Estimate the cross-validation error for both models using 10-fold cross-validation.

rng(1); % For reproducibility
defaultCVMdl = crossval(defaultPriorMdl);
defaultLoss = kfoldLoss(defaultCVMdl)

defaultLoss = 0.0533

CVMdl = crossval(Mdl);
Loss = kfoldLoss(CVMdl)

Loss = 0.0340

Mdl performs better than defaultPriorMdl.


Specify Predictor Distributions for Naive Bayes Classifiers

Load Fisher's iris data set.

load fisheriris
X = meas;
Y = species;

Train a naive Bayes classifier using every predictor. It is good practice to specify the class order.

Mdl1 = fitcnb(X,Y,...
    'ClassNames',{'setosa','versicolor','virginica'})

Mdl1 = 
  ClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'setosa'  'versicolor'  'virginica'}
            ScoreTransform: 'none'
           NumObservations: 150
         DistributionNames: {'normal'  'normal'  'normal'  'normal'}
    DistributionParameters: {3x4 cell}

  Properties, Methods

Mdl1.DistributionParameters

ans=3×4 cell array
    {2x1 double}    {2x1 double}    {2x1 double}    {2x1 double}
    {2x1 double}    {2x1 double}    {2x1 double}    {2x1 double}
    {2x1 double}    {2x1 double}    {2x1 double}    {2x1 double}

Mdl1.DistributionParameters{1,2}

ans = 2×1

    3.4280
    0.3791

By default, the software models the predictor distribution within each class as a Gaussian with some mean and standard deviation. There are four predictors and three class levels. Each cell in Mdl1.DistributionParameters corresponds to a numeric vector containing the mean and standard deviation of each distribution, e.g., the mean and standard deviation for setosa iris sepal widths are 3.4280 and 0.3791, respectively.

Estimate the confusion matrix for Mdl1.

isLabels1 = resubPredict(Mdl1);
ConfusionMat1 = confusionchart(Y,isLabels1);


Element (j, k) of the confusion matrix chart represents the number of observations that the software classifies as k, but are truly in class j according to the data.

Retrain the classifier using the Gaussian distribution for predictors 1 and 2 (the sepal lengths and widths), and the default normal kernel density for predictors 3 and 4 (the petal lengths and widths).

Mdl2 = fitcnb(X,Y,...
    'DistributionNames',{'normal','normal','kernel','kernel'},...
    'ClassNames',{'setosa','versicolor','virginica'});
Mdl2.DistributionParameters{1,2}

ans = 2×1

    3.4280
    0.3791

The software does not train parameters to the kernel density. Rather, the software chooses an optimal width. However, you can specify a width using the 'Width' name-value pair argument.

Estimate the confusion matrix for Mdl2.

isLabels2 = resubPredict(Mdl2);
ConfusionMat2 = confusionchart(Y,isLabels2);


Based on the confusion matrices, the two classifiers perform similarly in the training sample.

Compare Classifiers Using Cross-Validation

Load Fisher's iris data set.

load fisheriris
X = meas;
Y = species;
rng(1); % For reproducibility

Train and cross-validate a naive Bayes classifier using the default options and k-fold cross-validation. It is good practice to specify the class order.

CVMdl1 = fitcnb(X,Y,...
    'ClassNames',{'setosa','versicolor','virginica'},...
    'CrossVal','on');

By default, the software models the predictor distribution within each class as a Gaussian with some mean and standard deviation. CVMdl1 is a ClassificationPartitionedModel model.

Create a default naive Bayes binary classifier template, and train an error-correcting, output codes multiclass model.


t = templateNaiveBayes();
CVMdl2 = fitcecoc(X,Y,'CrossVal','on','Learners',t);

CVMdl2 is a ClassificationPartitionedECOC model. You can specify options for the naive Bayes binary learners using the same name-value pair arguments as for fitcnb.

Compare the out-of-sample k-fold classification error (proportion of misclassified observations).

classErr1 = kfoldLoss(CVMdl1,'LossFun','ClassifErr')

classErr1 = 0.0533

classErr2 = kfoldLoss(CVMdl2,'LossFun','ClassifErr')

classErr2 = 0.0467

CVMdl2 has a lower generalization error.

Train Naive Bayes Classifiers Using Multinomial Predictors

Some spam filters classify an incoming email as spam based on how many times a word or punctuation (called tokens) occurs in an email. The predictors are the frequencies of particular words or punctuations in an email. Therefore, the predictors compose multinomial random variables.

This example illustrates classification using naive Bayes and multinomial predictors.

Create Training Data

Suppose you observed 1000 emails and classified them as spam or not spam. Do this by randomly assigning -1 or 1 to y for each email.

n = 1000;                          % Sample size
rng(1);                            % For reproducibility
Y = randsample([-1 1],n,true);     % Random labels

To build the predictor data, suppose that there are five tokens in the vocabulary, and 20 observed tokens per email. Generate predictor data from the five tokens by drawing random, multinomial deviates. The relative frequencies for tokens corresponding to spam emails should differ from emails that are not spam.

tokenProbs = [0.2 0.3 0.1 0.15 0.25;...
              0.4 0.1 0.3 0.05 0.15];  % Token relative frequencies
tokensPerEmail = 20;                   % Fixed for convenience
X = zeros(n,5);
X(Y == 1,:) = mnrnd(tokensPerEmail,tokenProbs(1,:),sum(Y == 1));
X(Y == -1,:) = mnrnd(tokensPerEmail,tokenProbs(2,:),sum(Y == -1));

Train the Classifier

Train a naive Bayes classifier. Specify that the predictors are multinomial.

Mdl = fitcnb(X,Y,'DistributionNames','mn');

Mdl is a trained ClassificationNaiveBayes classifier.

Assess the in-sample performance of Mdl by estimating the misclassification error.


isGenRate = resubLoss(Mdl,'LossFun','ClassifErr')

isGenRate = 0.0200

The in-sample misclassification rate is 2%.

Create New Data

Randomly generate deviates that represent a new batch of emails.

newN = 500;
newY = randsample([-1 1],newN,true);
newX = zeros(newN,5);
newX(newY == 1,:) = mnrnd(tokensPerEmail,tokenProbs(1,:),...
    sum(newY == 1));
newX(newY == -1,:) = mnrnd(tokensPerEmail,tokenProbs(2,:),...
    sum(newY == -1));

Assess Classifier Performance

Classify the new emails using the trained naive Bayes classifier Mdl, and determine whether the algorithm generalizes.

oosGenRate = loss(Mdl,newX,newY)

oosGenRate = 0.0261

The out-of-sample misclassification rate is 2.6%, indicating that the classifier generalizes fairly well.

Optimize Naive Bayes Classifier

This example shows how to use the OptimizeHyperparameters name-value pair to minimize cross-validation loss in a naive Bayes classifier using fitcnb. The example uses Fisher's iris data.

Load Fisher's iris data.

load fisheriris
X = meas;
Y = species;
classNames = {'setosa','versicolor','virginica'};

Optimize the classification using the 'auto' parameters. For reproducibility, set the random seed and use the 'expected-improvement-plus' acquisition function.

rng default
Mdl = fitcnb(X,Y,'ClassNames',classNames,'OptimizeHyperparameters','auto',...
    'HyperparameterOptimizationOptions',struct('AcquisitionFunctionName',...
    'expected-improvement-plus'))

Warning: It is recommended that you first standardize all numeric predictors when optimizing the

|======================================================================================================|
| Iter | Eval   | Objective | Objective | BestSoFar  | BestSoFar | Distribution- | Width               |
|      | result |           | runtime   | (observed) | (estim.)  | Names         |                     |
|======================================================================================================|
|    1 | Best   |  0.053333 |   0.76698 |   0.053333 |  0.053333 |        normal |                   - |

[Iterative display of 30 objective evaluations abridged; most later iterations report Accept results with an observed objective of 0.046667 for kernel distributions.]

__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 106.4842 seconds.
Total objective function evaluation time: 23.4981

Best observed feasible point:
    DistributionNames     Width
    _________________    _______
         kernel          0.11903

Observed objective function value = 0.046667
Estimated objective function value = 0.046667
Function evaluation time = 1.1491

Best estimated feasible point (according to models):
    DistributionNames     Width
    _________________    _______
         kernel          0.25613

Estimated objective function value = 0.046628
Estimated function evaluation time = 0.78985

Mdl = 
  ClassificationNaiveBayes
                         ResponseName: 'Y'
                CategoricalPredictors: []
                           ClassNames: {'setosa'  'versicolor'  'virginica'}
                       ScoreTransform: 'none'
                      NumObservations: 150
    HyperparameterOptimizationResults: [1x1 BayesianOptimization]
                    DistributionNames: {1x4 cell}
               DistributionParameters: {3x4 cell}
                               Kernel: {1x4 cell}
                              Support: {1x4 cell}
                                Width: [3x4 double]

  Properties, Methods

Input Arguments

Tbl — Sample data
table

Sample data used to train the model, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one predictor variable. Optionally, Tbl can contain one additional column for the response variable. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.

• If Tbl contains the response variable, and you want to use all remaining variables in Tbl as predictors, then specify the response variable by using ResponseVarName.
• If Tbl contains the response variable, and you want to use only a subset of the remaining variables in Tbl as predictors, then specify a formula by using formula.
• If Tbl does not contain the response variable, then specify a response variable by using Y. The length of the response variable and the number of rows in Tbl must be equal.

Data Types: table

ResponseVarName — Response variable name
name of variable in Tbl

Response variable name, specified as the name of a variable in Tbl. You must specify ResponseVarName as a character vector or string scalar. For example, if the response variable Y is stored as Tbl.Y, then specify it as 'Y'. Otherwise, the software treats all columns of Tbl, including Y, as predictors when training the model.

The response variable must be a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors. If Y is a character array, then each element of the response variable must correspond to one row of the array.


A good practice is to specify the order of the classes by using the ClassNames name-value pair argument.

Data Types: char | string

formula — Explanatory model of response variable and subset of predictor variables
character vector | string scalar

Explanatory model of the response variable and a subset of the predictor variables, specified as a character vector or string scalar in the form 'Y~X1+X2+X3'. In this form, Y represents the response variable, and X1, X2, and X3 represent the predictor variables.

To specify a subset of variables in Tbl as predictors for training the model, use a formula. If you specify a formula, then the software does not use any variables in Tbl that do not appear in formula.

The variable names in the formula must be both variable names in Tbl (Tbl.Properties.VariableNames) and valid MATLAB identifiers. You can verify the variable names in Tbl by using the isvarname function. The following code returns logical 1 (true) for each variable that has a valid variable name.

cellfun(@isvarname,Tbl.Properties.VariableNames)

If the variable names in Tbl are not valid, then convert them by using the matlab.lang.makeValidName function.

Tbl.Properties.VariableNames = matlab.lang.makeValidName(Tbl.Properties.VariableNames);

Data Types: char | string

Y — Class labels
categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors

Class labels to which the naive Bayes classifier is trained, specified as a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors. Each element of Y defines the class membership of the corresponding row of X. Y supports K class levels.

If Y is a character array, then each row must correspond to one class label.

The length of Y and the number of rows of X must be equal.

Data Types: categorical | char | string | logical | single | double | cell

X — Predictor data
numeric matrix

Predictor data, specified as a numeric matrix. Each row of X corresponds to one observation (also known as an instance or example), and each column corresponds to one variable (also known as a feature).

The length of Y and the number of rows of X must be equal.

Data Types: double
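The input forms above can be combined in a short, hedged sketch. It uses the fisheriris sample data (also used later on this page); the table and variable names are illustrative rather than required by the syntax.

```matlab
% Sketch: the three ways to supply training data to fitcnb.
% Uses fisheriris; the table and variable names are illustrative.
load fisheriris                      % meas: 150-by-4 double, species: class labels

% (1) Numeric matrix X and class labels Y
Mdl1 = fitcnb(meas,species);

% (2) Table plus ResponseVarName
tbl = array2table(meas,'VariableNames',{'SL','SW','PL','PW'});
tbl.Species = species;
Mdl2 = fitcnb(tbl,'Species');

% (3) Table plus a formula that selects a predictor subset
Mdl3 = fitcnb(tbl,'Species ~ SL + SW');
```

Each call returns a ClassificationNaiveBayes object; you can inspect its predictions with predict, for example predict(Mdl1,meas).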


Note: The software treats NaN, empty character vector (''), empty string (""), <missing>, and <undefined> elements as missing data values.

• If Y contains missing values, then the software removes them and the corresponding rows of X.
• If X contains any rows composed entirely of missing values, then the software removes those rows and the corresponding elements of Y.
• If X contains missing values and you set 'DistributionNames','mn', then the software removes those rows of X and the corresponding elements of Y.
• If a predictor is not represented in a class, that is, if all of its values are NaN within a class, then the software returns an error.

Removing rows of X and corresponding elements of Y decreases the effective training or cross-validation sample size.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Note: You cannot use any cross-validation name-value pair argument along with the 'OptimizeHyperparameters' name-value pair argument. You can modify the cross-validation for 'OptimizeHyperparameters' only by using the 'HyperparameterOptimizationOptions' name-value pair argument.

Example: 'DistributionNames','mn','Prior','uniform','KSWidth',0.5 specifies that the data distribution is multinomial, the prior probabilities for all classes are equal, and the kernel smoothing window bandwidth for all classes is 0.5 units.

Naive Bayes Options

DistributionNames — Data distributions 'kernel' | 'mn' | 'mvmn' | 'normal' | string array | cell array of character vectors Data distributions fitcnb uses to model the data, specified as the comma-separated pair consisting of 'DistributionNames' and a character vector or string scalar, a string array, or a cell array of character vectors with values from this table.


Value       Description
'kernel'    Kernel smoothing density estimate.
'mn'        Multinomial distribution. If you specify 'mn', then all features are components of a multinomial distribution. Therefore, you cannot include 'mn' as an element of a string array or a cell array of character vectors. For details, see "Algorithms" on page 32-1657.
'mvmn'      Multivariate multinomial distribution. For details, see "Algorithms" on page 32-1657.
'normal'    Normal (Gaussian) distribution.

If you specify a character vector or string scalar, then the software models all the features using that distribution. If you specify a 1-by-P string array or cell array of character vectors, then the software models feature j using the distribution in element j of the array.

By default, the software sets all predictors specified as categorical predictors (using the CategoricalPredictors name-value pair argument) to 'mvmn'. Otherwise, the default distribution is 'normal'.

You must specify that at least one predictor has distribution 'kernel' to additionally specify Kernel, Support, or Width.

Example: 'DistributionNames','mn'
Example: 'DistributionNames',{'kernel','normal','kernel'}

Kernel — Kernel smoother type
'normal' (default) | 'box' | 'epanechnikov' | 'triangle' | string array | cell array of character vectors

Kernel smoother type, specified as the comma-separated pair consisting of 'Kernel' and a character vector or string scalar, a string array, or a cell array of character vectors. This table summarizes the available kernel smoother types. Let I{u} denote the indicator function.

Value            Kernel          Formula
'box'            Box (uniform)   f(x) = 0.5 I{|x| ≤ 1}
'epanechnikov'   Epanechnikov    f(x) = 0.75(1 − x^2) I{|x| ≤ 1}
'normal'         Gaussian        f(x) = (1/sqrt(2π)) exp(−0.5x^2)
'triangle'       Triangular      f(x) = (1 − |x|) I{|x| ≤ 1}
If you specify a 1-by-P string array or cell array, with each element of the array containing any value in the table, then the software trains the classifier using the kernel smoother type in element j for feature j in X. The software ignores elements of Kernel not corresponding to a predictor whose distribution is 'kernel'.

You must specify that at least one predictor has distribution 'kernel' to additionally specify Kernel, Support, or Width.

Example: 'Kernel',{'epanechnikov','normal'}

Support — Kernel smoothing density support
'unbounded' (default) | 'positive' | string array | cell array | numeric row vector

Kernel smoothing density support, specified as the comma-separated pair consisting of 'Support' and 'positive', 'unbounded', a string array, a cell array, or a numeric row vector. The software applies the kernel smoothing density to the specified region. This table summarizes the available options for setting the kernel smoothing density region.

Value                       Description
1-by-2 numeric row vector   For example, [L,U], where L and U are the finite lower and upper bounds, respectively, for the density support.
'positive'                  The density support is all positive real values.
'unbounded'                 The density support is all real values.

If you specify a 1-by-P string array or cell array, with each element in the string array containing any text value in the table and each element in the cell array containing any value in the table, then the software trains the classifier using the kernel support in element j for feature j in X. The software ignores elements of Kernel not corresponding to a predictor whose distribution is 'kernel'.

You must specify that at least one predictor has distribution 'kernel' to additionally specify Kernel, Support, or Width.

Example: 'KSSupport',{[-10,20],'unbounded'}

Data Types: char | string | cell | double

Width — Kernel smoothing window width
matrix of numeric values | numeric column vector | numeric row vector | scalar

Kernel smoothing window width, specified as the comma-separated pair consisting of 'Width' and a matrix of numeric values, a numeric column vector, a numeric row vector, or a scalar. Suppose there are K class levels and P predictors. This table summarizes the available options for setting the kernel smoothing window width.

Value                             Description
K-by-P matrix of numeric values   Element (k,j) specifies the width for predictor j in class k.
K-by-1 numeric column vector      Element k specifies the width for all predictors in class k.
1-by-P numeric row vector         Element j specifies the width in all class levels for predictor j.
scalar                            Specifies the bandwidth for all features in all classes.

By default, the software selects a default width automatically for each combination of predictor and class by using a value that is optimal for a Gaussian distribution. If you specify Width and it contains NaNs, then the software selects widths for the elements containing NaNs.

You must specify that at least one predictor has distribution 'kernel' to additionally specify Kernel, Support, or Width.

Example: 'Width',[NaN NaN]

Data Types: double | struct

Cross-Validation Options

CrossVal — Cross-validation flag
'off' (default) | 'on'

Cross-validation flag, specified as the comma-separated pair consisting of 'Crossval' and 'on' or 'off'. If you specify 'on', then the software implements 10-fold cross-validation.


To override this cross-validation setting, use one of these name-value pair arguments: CVPartition, Holdout, KFold, or Leaveout. To create a cross-validated model, you can use only one cross-validation name-value pair argument at a time. Alternatively, cross-validate later by passing Mdl to crossval.

Example: 'CrossVal','on'

CVPartition — Cross-validation partition
[] (default) | cvpartition partition object

Cross-validation partition, specified as the comma-separated pair consisting of 'CVPartition' and a cvpartition partition object created by cvpartition. The partition object specifies the type of cross-validation and the indexing for the training and validation sets.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,'KFold',5). Then, you can specify the cross-validated model by using 'CVPartition',cvp.

Holdout — Fraction of data for holdout validation
scalar value in the range (0,1)

Fraction of the data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range (0,1). If you specify 'Holdout',p, then the software completes these steps:

1 Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.
2 Store the compact, trained model in the Trained property of the cross-validated model.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Example: 'Holdout',0.1

Data Types: double | single

KFold — Number of folds
10 (default) | positive integer value greater than 1

Number of folds to use in a cross-validated model, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1. If you specify 'KFold',k, then the software completes these steps:

1 Randomly partition the data into k sets.
2 For each set, reserve the set as validation data, and train the model using the other k – 1 sets.
3 Store the k compact, trained models in the cells of a k-by-1 cell vector in the Trained property of the cross-validated model.
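The KFold steps above can be sketched as follows; the fold count and the loss computation are illustrative:

```matlab
% Sketch: 5-fold cross-validated naive Bayes model. Illustrative settings.
load fisheriris
CVMdl = fitcnb(meas,species,'KFold',5);  % ClassificationPartitionedModel
L = kfoldLoss(CVMdl);                    % average misclassification rate over folds
```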

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Example: 'KFold',5


Data Types: single | double

Leaveout — Leave-one-out cross-validation flag
'off' (default) | 'on'

Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of 'Leaveout' and 'on' or 'off'. If you specify 'Leaveout','on', then, for each of the n observations (where n is the number of observations excluding missing observations, specified in the NumObservations property of the model), the software completes these steps:

1 Reserve the observation as validation data, and train the model using the other n – 1 observations.
2 Store the n compact, trained models in the cells of an n-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout.

Example: 'Leaveout','on'

Other Classification Options

CategoricalPredictors — Categorical predictors list
vector of positive integers | logical vector | character matrix | string array | cell array of character vectors | 'all'

Categorical predictors list, specified as the comma-separated pair consisting of 'CategoricalPredictors' and one of the values in this table.

Value                                             Description
Vector of positive integers                       Each entry in the vector is an index value corresponding to the column of the predictor data (X or Tbl) that contains a categorical variable.
Logical vector                                    A true entry means that the corresponding column of predictor data (X or Tbl) is a categorical variable.
Character matrix                                  Each row of the matrix is the name of a predictor variable. The names must match the entries in PredictorNames. Pad the names with extra blanks so each row of the character matrix has the same length.
String array or cell array of character vectors   Each element in the array is the name of a predictor variable. The names must match the entries in PredictorNames.
'all'                                             All predictors are categorical.

By default, if the predictor data is in a table (Tbl), fitcnb assumes that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix (X), fitcnb assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the 'CategoricalPredictors' name-value pair argument.

For the identified categorical predictors, fitcnb uses multivariate multinomial distributions. For details, see DistributionNames and "Algorithms" on page 32-1657.

Example: 'CategoricalPredictors','all'


Data Types: single | double | logical | char | string | cell

ClassNames — Names of classes to use for training
categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors

Names of classes to use for training, specified as the comma-separated pair consisting of 'ClassNames' and a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors. ClassNames must have the same data type as Y. If ClassNames is a character array, then each element must correspond to one row of the array.

Use 'ClassNames' to:

• Order the classes during training.
• Specify the order of any input or output argument dimension that corresponds to the class order. For example, use 'ClassNames' to specify the order of the dimensions of Cost or the column order of classification scores returned by predict.
• Select a subset of classes for training. For example, suppose that the set of all distinct class names in Y is {'a','b','c'}. To train the model using observations from classes 'a' and 'c' only, specify 'ClassNames',{'a','c'}.

The default value for ClassNames is the set of all distinct class names in Y.

Example: 'ClassNames',{'b','g'}

Data Types: categorical | char | string | logical | single | double | cell

Cost — Cost of misclassification
square matrix | structure

Cost of misclassification of a point, specified as the comma-separated pair consisting of 'Cost' and one of the following:

• Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (i.e., the rows correspond to the true class and the columns correspond to the predicted class). To specify the class order for the corresponding rows and columns of Cost, additionally specify the ClassNames name-value pair argument.
• Structure S having two fields: S.ClassNames containing the group names as a variable of the same type as Y, and S.ClassificationCosts containing the cost matrix.

The default is Cost(i,j)=1 if i~=j, and Cost(i,j)=0 if i=j.

Example: 'Cost',struct('ClassNames',{{'b','g'}},'ClassificationCosts',[0 0.5; 1 0])

Data Types: single | double | struct

PredictorNames — Predictor variable names
string array of unique names | cell array of unique character vectors

Predictor variable names, specified as the comma-separated pair consisting of 'PredictorNames' and a string array of unique names or cell array of unique character vectors. The functionality of 'PredictorNames' depends on the way you supply the training data.


• If you supply X and Y, then you can use 'PredictorNames' to give the predictor variables in X names.
  • The order of the names in PredictorNames must correspond to the column order of X. That is, PredictorNames{1} is the name of X(:,1), PredictorNames{2} is the name of X(:,2), and so on. Also, size(X,2) and numel(PredictorNames) must be equal.
  • By default, PredictorNames is {'x1','x2',...}.
• If you supply Tbl, then you can use 'PredictorNames' to choose which predictor variables to use in training. That is, fitcnb uses only the predictor variables in PredictorNames and the response variable in training.
  • PredictorNames must be a subset of Tbl.Properties.VariableNames and cannot include the name of the response variable.
  • By default, PredictorNames contains the names of all predictor variables.
• A good practice is to specify the predictors for training using either 'PredictorNames' or formula only.

Example: 'PredictorNames',{'SepalLength','SepalWidth','PetalLength','PetalWidth'}

Data Types: string | cell

Prior — Prior probabilities
'empirical' (default) | 'uniform' | vector of scalar values | structure

Prior probabilities for each class, specified as the comma-separated pair consisting of 'Prior' and a value in this table.

Value            Description
'empirical'      The class prior probabilities are the class relative frequencies in Y.
'uniform'        All class prior probabilities are equal to 1/K, where K is the number of classes.
numeric vector   Each element is a class prior probability. Order the elements according to Mdl.ClassNames or specify the order using the ClassNames name-value pair argument. The software normalizes the elements such that they sum to 1.
structure        A structure S with two fields: S.ClassNames contains the class names as a variable of the same type as Y; S.ClassProbs contains a vector of corresponding prior probabilities. The software normalizes the elements such that they sum to 1.

If you set values for both Weights and Prior, the weights are renormalized to add up to the value of the prior probability in the respective class.

Example: 'Prior','uniform'


Data Types: char | string | single | double | struct

ResponseName — Response variable name
'Y' (default) | character vector | string scalar

Response variable name, specified as the comma-separated pair consisting of 'ResponseName' and a character vector or string scalar.

• If you supply Y, then you can use 'ResponseName' to specify a name for the response variable.
• If you supply ResponseVarName or formula, then you cannot use 'ResponseName'.

Example: 'ResponseName','response'

Data Types: char | string

ScoreTransform — Score transformation
'none' (default) | 'doublelogit' | 'invlogit' | 'ismax' | 'logit' | function handle | ...

Score transformation, specified as the comma-separated pair consisting of 'ScoreTransform' and a character vector, string scalar, or function handle. This table summarizes the available character vectors and string scalars.

Value                  Description
'doublelogit'          1/(1 + e^(−2x))
'invlogit'             log(x / (1 − x))
'ismax'                Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
'logit'                1/(1 + e^(−x))
'none' or 'identity'   x (no transformation)
'sign'                 −1 for x < 0, 0 for x = 0, 1 for x > 0
'symmetric'            2x − 1
'symmetricismax'       Sets the score for the class with the largest score to 1, and sets the scores for all other classes to −1
'symmetriclogit'       2/(1 + e^(−x)) − 1

For a MATLAB function or a function you define, use its function handle for the score transform. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).

Example: 'ScoreTransform','logit'

Data Types: char | string | function_handle

Weights — Observation weights
numeric vector of positive values | name of variable in Tbl

Observation weights, specified as the comma-separated pair consisting of 'Weights' and a numeric vector of positive values or the name of a variable in Tbl. The software weighs the observations in each row of X or Tbl with the corresponding value in Weights. The size of Weights must equal the number of rows of X or Tbl.

If you specify the input data as a table Tbl, then Weights can be the name of a variable in Tbl that contains a numeric vector. In this case, you must specify Weights as a character vector or string scalar. For example, if the weights vector W is stored as Tbl.W, then specify it as 'W'. Otherwise, the software treats all columns of Tbl, including W, as predictors or the response when training the model.

The software normalizes Weights to sum up to the value of the prior probability in the respective class.

By default, Weights is ones(n,1), where n is the number of observations in X or Tbl.

Data Types: double | single | char | string

Hyperparameter Optimization

OptimizeHyperparameters — Parameters to optimize
'none' (default) | 'auto' | 'all' | string array or cell array of eligible parameter names | vector of optimizableVariable objects

Parameters to optimize, specified as the comma-separated pair consisting of 'OptimizeHyperparameters' and one of the following:

• 'none' — Do not optimize.
• 'auto' — Use {'DistributionNames','Width'}.
• 'all' — Optimize all eligible parameters.
• String array or cell array of eligible parameter names.
• Vector of optimizableVariable objects, typically the output of hyperparameters.

The optimization attempts to minimize the cross-validation loss (error) for fitcnb by varying the parameters. For information about cross-validation loss (albeit in a different context), see "Classification Loss" on page 32-2756. To control the cross-validation type and other aspects of the optimization, use the HyperparameterOptimizationOptions name-value pair.

Note: 'OptimizeHyperparameters' values override any values you set using other name-value pair arguments. For example, setting 'OptimizeHyperparameters' to 'auto' causes the 'auto' values to apply.

The eligible parameters for fitcnb are:

• DistributionNames — fitcnb searches among 'normal' and 'kernel'.
• Width — fitcnb searches among real values, by default log-scaled in the range [MinPredictorDiff/4,max(MaxPredictorRange,MinPredictorDiff)].
• Kernel — fitcnb searches among 'normal', 'box', 'epanechnikov', and 'triangle'.

Set nondefault parameters by passing a vector of optimizableVariable objects that have nondefault values. For example,


load fisheriris
params = hyperparameters('fitcnb',meas,species);
params(2).Range = [1e-2,1e2];

Pass params as the value of OptimizeHyperparameters.

By default, iterative display appears at the command line, and plots appear according to the number of hyperparameters in the optimization. For the optimization and plots, the objective function is log(1 + cross-validation loss) for regression and the misclassification rate for classification. To control the iterative display, set the Verbose field of the 'HyperparameterOptimizationOptions' name-value pair argument. To control the plots, set the ShowPlots field of the 'HyperparameterOptimizationOptions' name-value pair argument.

For an example, see "Optimize Naive Bayes Classifier" on page 32-1638.

Example: 'auto'

HyperparameterOptimizationOptions — Options for optimization
structure

Options for optimization, specified as the comma-separated pair consisting of 'HyperparameterOptimizationOptions' and a structure. This argument modifies the effect of the OptimizeHyperparameters name-value pair argument. All fields in the structure are optional.

Optimizer (default 'bayesopt')
• 'bayesopt' — Use Bayesian optimization. Internally, this setting calls bayesopt.
• 'gridsearch' — Use grid search with NumGridDivisions values per dimension.
• 'randomsearch' — Search at random among MaxObjectiveEvaluations points.
'gridsearch' searches in a random order, using uniform sampling without replacement from the grid. After optimization, you can get a table in grid order by using the command sortrows(Mdl.HyperparameterOptimizationResults).

AcquisitionFunctionName (default 'expected-improvement-per-second-plus')
• 'expected-improvement-per-second-plus'
• 'expected-improvement'
• 'expected-improvement-plus'
• 'expected-improvement-per-second'
• 'lower-confidence-bound'
• 'probability-of-improvement'
Acquisition functions whose names include per-second do not yield reproducible results because the optimization depends on the runtime of the objective function. Acquisition functions whose names include plus modify their behavior when they are overexploiting an area. For more details, see "Acquisition Function Types" on page 10-3.

MaxObjectiveEvaluations (default: 30 for 'bayesopt' or 'randomsearch', and the entire grid for 'gridsearch')
Maximum number of objective function evaluations.

MaxTime (default Inf)
Time limit, specified as a positive real. The time limit is in seconds, as measured by tic and toc. Run time can exceed MaxTime because MaxTime does not interrupt function evaluations.

NumGridDivisions (default 10)
For 'gridsearch', the number of values in each dimension. The value can be a vector of positive integers giving the number of values for each dimension, or a scalar that applies to all dimensions. This field is ignored for categorical variables.

ShowPlots (default true)
Logical value indicating whether to show plots. If true, this field plots the best objective function value against the iteration number. If there are one or two optimization parameters, and if Optimizer is 'bayesopt', then ShowPlots also plots a model of the objective function against the parameters.

SaveIntermediateResults (default false)
Logical value indicating whether to save results when Optimizer is 'bayesopt'. If true, this field overwrites a workspace variable named 'BayesoptResults' at each iteration. The variable is a BayesianOptimization object.

Verbose (default 1)
Display to the command line.
• 0 — No iterative display
• 1 — Iterative display
• 2 — Iterative display with extra information
For details, see the bayesopt Verbose name-value pair argument.

UseParallel (default false)
Logical value indicating whether to run Bayesian optimization in parallel, which requires Parallel Computing Toolbox. Due to the nonreproducibility of parallel timing, parallel Bayesian optimization does not necessarily yield reproducible results. For details, see "Parallel Bayesian Optimization" on page 10-7.

Repartition (default false)
Logical value indicating whether to repartition the cross-validation at every iteration. If false, the optimizer uses a single partition for the optimization. true usually gives the most robust results because this setting takes partitioning noise into account. However, for good results, true requires at least twice as many function evaluations.

Use no more than one of the following three field names. If you do not specify any cross-validation field, the default is 'Kfold',5.

CVPartition
A cvpartition object, as created by cvpartition.

Holdout
A scalar in the range (0,1) representing the holdout fraction.

Kfold
An integer greater than 1.

Example: 'HyperparameterOptimizationOptions',struct('MaxObjectiveEvaluations',60) Data Types: struct
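Combining OptimizeHyperparameters with these options, a hedged sketch of a run with a reduced evaluation budget (the specific field values are illustrative):

```matlab
% Sketch: optimize DistributionNames and Width with a smaller budget
% and no plots. Uses fisheriris; the field values are illustrative.
load fisheriris
rng default   % for reproducibility of the optimization
Mdl = fitcnb(meas,species, ...
    'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions', ...
    struct('MaxObjectiveEvaluations',20,'ShowPlots',false,'Verbose',0));
```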

Output Arguments

Mdl — Trained naive Bayes classification model
ClassificationNaiveBayes model object | ClassificationPartitionedModel cross-validated model object

Trained naive Bayes classification model, returned as a ClassificationNaiveBayes model object or a ClassificationPartitionedModel cross-validated model object.

If you set any of the name-value pair arguments KFold, Holdout, CrossVal, or CVPartition, then Mdl is a ClassificationPartitionedModel cross-validated model object. Otherwise, Mdl is a ClassificationNaiveBayes model object.

32

Functions

To reference properties of Mdl, use dot notation. For example, to access the estimated distribution parameters, enter Mdl.DistributionParameters.

More About

Bag-of-Tokens Model

In the bag-of-tokens model, the value of predictor j is the nonnegative number of occurrences of token j in the observation. The number of categories (bins) in this multinomial model is the number of distinct tokens, that is, the number of predictors.

Naive Bayes

Naive Bayes is a classification algorithm that applies density estimation to the data. The algorithm leverages Bayes theorem, and (naively) assumes that the predictors are conditionally independent, given the class. Although the assumption is usually violated in practice, naive Bayes classifiers tend to yield posterior distributions that are robust to biased class density estimates, particularly where the posterior is 0.5 (the decision boundary) [1]. Naive Bayes classifiers assign observations to the most probable class (in other words, the maximum a posteriori decision rule). Explicitly, the algorithm:

1 Estimates the densities of the predictors within each class.

2 Models posterior probabilities according to Bayes rule. That is, for all k = 1,...,K,

P(Y = k | X1,...,XP) = [ π(Y = k) ∏_{j=1}^{P} P(Xj | Y = k) ] / [ ∑_{k=1}^{K} π(Y = k) ∏_{j=1}^{P} P(Xj | Y = k) ],

where:
• Y is the random variable corresponding to the class index of an observation.
• X1,...,XP are the random predictors of an observation.
• π(Y = k) is the prior probability that a class index is k.

3 Classifies an observation by estimating the posterior probability for each class, and then assigns the observation to the class yielding the maximum posterior probability.

If the predictors compose a multinomial distribution, then the posterior probability P(Y = k | X1,...,XP) ∝ π(Y = k) Pmn(X1,...,XP | Y = k), where Pmn(X1,...,XP | Y = k) is the probability mass function of a multinomial distribution.

Tips

• For classifying count-based data, such as the bag-of-tokens model on page 32-1656, use the multinomial distribution (for example, set 'DistributionNames','mn').
• After training a model, you can generate C/C++ code that predicts labels for new data. Generating C/C++ code requires MATLAB Coder. For details, see “Introduction to Code Generation” on page 31-2.

32-1656
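The first tip can be sketched as follows; the token count matrix and class labels here are illustrative, not from the original page:

```matlab
% Sketch of the bag-of-tokens workflow: rows are documents, columns are
% per-token counts, and the multinomial distribution models the counts.
X = [2 0 1; 0 3 1; 1 1 4; 0 0 5];  % token counts (illustrative)
Y = {'spam';'spam';'ham';'ham'};   % class labels (illustrative)
Mdl = fitcnb(X,Y,'DistributionNames','mn');
% The fitted per-class token probabilities are stored in
% Mdl.DistributionParameters.
```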


Algorithms

• If you specify 'DistributionNames','mn' when training Mdl using fitcnb, then the software fits a multinomial distribution using the bag-of-tokens model on page 32-1656. The software stores the probability that token j appears in class k in the property DistributionParameters{k,j}. Using additive smoothing [2], the estimated probability is

P(token j | class k) = (1 + c_{j|k}) / (P + c_k),

where:

• c_{j|k} = n_k · ( ∑_{i: y_i ∈ class k} x_{ij} w_i ) / ( ∑_{i: y_i ∈ class k} w_i ), which is the weighted number of occurrences of token j in class k.
• n_k is the number of observations in class k.
• w_i is the weight for observation i. The software normalizes weights within a class such that they sum to the prior probability for that class.
• c_k = ∑_{j=1}^{P} c_{j|k}, which is the total weighted number of occurrences of all tokens in class k.
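The smoothed estimate can be computed by hand for a toy class; the counts below are illustrative and assume unit observation weights, not values from the original page:

```matlab
% Hand computation of the additive-smoothing estimate for 'mn' above,
% assuming unit weights (illustrative values).
Xk = [2 0 1; 1 1 0];       % token counts for the observations in class k
nk = size(Xk,1);           % number of observations in class k
P  = size(Xk,2);           % number of tokens (predictors)
cjk = nk * sum(Xk,1) / nk; % weighted occurrences of each token; with unit
                           % weights this reduces to sum(Xk,1)
ck  = sum(cjk);                  % total weighted occurrences in class k
Ptoken = (1 + cjk) ./ (P + ck);  % smoothed P(token j | class k)
```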

• If you specify 'DistributionNames','mvmn' when training Mdl using fitcnb, then:

1 For each predictor, the software collects a list of the unique levels, stores the sorted list in CategoricalLevels, and considers each level a bin. Each predictor/class combination is a separate, independent multinomial random variable.

2 For predictor j in class k, the software counts instances of each categorical level using the list stored in CategoricalLevels{j}.

3 The software stores the probability that predictor j, in class k, has level L in the property DistributionParameters{k,j}, for all levels in CategoricalLevels{j}. Using additive smoothing [2], the estimated probability is

P(predictor j = L | class k) = (1 + m_{j|k}(L)) / (m_j + m_k),

where:

• m_{j|k}(L) = n_k · ( ∑_{i: y_i ∈ class k} I{x_{ij} = L} w_i ) / ( ∑_{i: y_i ∈ class k} w_i ), which is the weighted number of observations for which predictor j equals L in class k.
• n_k is the number of observations in class k.
• I{x_{ij} = L} = 1 if x_{ij} = L, and 0 otherwise.
• w_i is the weight for observation i. The software normalizes weights within a class such that they sum to the prior probability for that class.
• m_j is the number of distinct levels in predictor j.
• m_k is the weighted number of observations in class k.

32-1657
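The 'mvmn' smoothed estimate can likewise be checked by hand for one predictor; the levels below are illustrative and assume unit weights, not values from the original page:

```matlab
% Hand computation of the additive-smoothing estimate for 'mvmn' above,
% assuming unit weights (illustrative values).
xk = ['a';'b';'a';'a'];           % levels of predictor j in class k
levels = unique(xk);              % sorted list, as in CategoricalLevels{j}
nk = numel(xk);                   % observations in class k
mj = numel(levels);               % distinct levels of predictor j
mk = nk;                          % weighted observation count; equals nk
                                  % with unit weights
mjkL = nk * sum(xk == 'a') / nk;  % weighted count of level 'a' in class k
Plevel = (1 + mjkL) / (mj + mk);  % smoothed P(predictor j = 'a' | class k)
```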


References

[1] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, Second Edition. NY: Springer, 2008.

[2] Manning, C. D., P. Raghavan, and M. Schütze. Introduction to Information Retrieval. NY: Cambridge University Press, 2008.

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

This function supports tall arrays with the following limitations:

• Supported syntaxes are:
  • Mdl = fitcnb(Tbl,Y)
  • Mdl = fitcnb(X,Y)
  • Mdl = fitcnb(___,Name,Value)
• Options related to kernel densities, cross-validation, and hyperparameter optimization are not supported. The supported name-value pair arguments are:
  • 'DistributionNames' — 'kernel' value is not supported.
  • 'CategoricalPredictors'
  • 'Cost'
  • 'PredictorNames'
  • 'Prior'
  • 'ResponseName'
  • 'ScoreTransform'
  • 'Weights' — Value must be a tall array.

For more information, see “Tall Arrays for Out-of-Memory Data” (MATLAB).

Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, set the 'UseParallel' option to true. To perform parallel hyperparameter optimization, use the 'HyperparameterOptimizationOptions',struct('UseParallel',true) name-value pair argument in the call to this function. For more information on parallel hyperparameter optimization, see “Parallel Bayesian Optimization” on page 10-7. For more general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).
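The parallel-optimization call described above can be sketched as follows; this assumes Parallel Computing Toolbox is installed and a parallel pool is available, and uses the fisheriris data set purely for illustration:

```matlab
% Sketch: run Bayesian hyperparameter optimization in parallel by setting
% the UseParallel field of HyperparameterOptimizationOptions.
load fisheriris
Mdl = fitcnb(meas,species,'OptimizeHyperparameters','auto',...
    'HyperparameterOptimizationOptions',struct('UseParallel',true));
% Because parallel timings are nonreproducible, repeated runs can select
% different hyperparameter values.
```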

32-1658


See Also
ClassificationNaiveBayes | ClassificationPartitionedModel | predict | templateNaiveBayes

Topics
“Naive Bayes Classification” on page 21-2
“Grouping Variables” on page 2-45

Introduced in R2014b

32-1659


fitcsvm

Train support vector machine (SVM) classifier for one-class and binary classification

Syntax

Mdl = fitcsvm(Tbl,ResponseVarName)
Mdl = fitcsvm(Tbl,formula)
Mdl = fitcsvm(Tbl,Y)
Mdl = fitcsvm(X,Y)
Mdl = fitcsvm(___,Name,Value)

Description

fitcsvm trains or cross-validates a support vector machine (SVM) model for one-class and two-class (binary) classification on a low-dimensional or moderate-dimensional predictor data set. fitcsvm supports mapping the predictor data using kernel functions, and supports sequential minimal optimization (SMO), iterative single data algorithm (ISDA), or L1 soft-margin minimization via quadratic programming for objective-function minimization.

To train a linear SVM model for binary classification on a high-dimensional data set, that is, a data set that includes many predictor variables, use fitclinear instead.

For multiclass learning with combined binary SVM models, use error-correcting output codes (ECOC). For more details, see fitcecoc.

To train an SVM regression model, see fitrsvm for low-dimensional and moderate-dimensional predictor data sets, or fitrlinear for high-dimensional data sets.

Mdl = fitcsvm(Tbl,ResponseVarName) returns a support vector machine (SVM) classifier on page 32-1690 Mdl trained using the sample data contained in the table Tbl. ResponseVarName is the name of the variable in Tbl that contains the class labels for one-class or two-class classification.

Mdl = fitcsvm(Tbl,formula) returns an SVM classifier trained using the sample data contained in the table Tbl. formula is an explanatory model of the response and a subset of the predictor variables in Tbl used to fit Mdl.

Mdl = fitcsvm(Tbl,Y) returns an SVM classifier trained using the predictor variables in the table Tbl and the class labels in vector Y.

Mdl = fitcsvm(X,Y) returns an SVM classifier trained using the predictors in the matrix X and the class labels in vector Y for one-class or two-class classification.

Mdl = fitcsvm(___,Name,Value) specifies options using one or more name-value pair arguments in addition to the input arguments in previous syntaxes. For example, you can specify the type of cross-validation, the cost for misclassification, and the type of score transformation function.

Examples

32-1660


Train SVM Classifier

Load Fisher's iris data set. Remove the sepal lengths and widths and all observed setosa irises.

load fisheriris
inds = ~strcmp(species,'setosa');
X = meas(inds,3:4);
y = species(inds);

Train an SVM classifier using the processed data set.

SVMModel = fitcsvm(X,y)

SVMModel = 
  ClassificationSVM
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'versicolor'  'virginica'}
           ScoreTransform: 'none'
          NumObservations: 100
                    Alpha: [24x1 double]
                     Bias: -14.4149
         KernelParameters: [1x1 struct]
           BoxConstraints: [100x1 double]
          ConvergenceInfo: [1x1 struct]
          IsSupportVector: [100x1 logical]
                   Solver: 'SMO'

  Properties, Methods

SVMModel is a trained ClassificationSVM classifier. Display the properties of SVMModel. For example, to determine the class order, use dot notation.

classOrder = SVMModel.ClassNames

classOrder = 2x1 cell
    {'versicolor'}
    {'virginica' }

The first class ('versicolor') is the negative class, and the second ('virginica') is the positive class. You can change the class order during training by using the 'ClassNames' name-value pair argument.

Plot a scatter diagram of the data and circle the support vectors.

sv = SVMModel.SupportVectors;
figure
gscatter(X(:,1),X(:,2),y)
hold on
plot(sv(:,1),sv(:,2),'ko','MarkerSize',10)
legend('versicolor','virginica','Support Vector')
hold off

32-1661


The support vectors are observations that occur on or beyond their estimated class boundaries. You can adjust the boundaries (and, therefore, the number of support vectors) by setting a box constraint during training using the 'BoxConstraint' name-value pair argument.
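The effect of the box constraint can be sketched as follows; this refits the classifier from the example above with a larger value, and assumes X, y, and SVMModel from that example are in the workspace:

```matlab
% Sketch: compare support-vector counts for the default box constraint (1)
% and a larger value. Illustrative, not from the original page.
SVMModelTight = fitcsvm(X,y,'BoxConstraint',10);
nSVdefault = sum(SVMModel.IsSupportVector);
nSVtight   = sum(SVMModelTight.IsSupportVector);
% A larger box constraint penalizes margin violations more heavily, which
% typically yields fewer support vectors.
```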

Train and Cross-Validate SVM Classifier

Load the ionosphere data set.

load ionosphere
rng(1); % For reproducibility

Train an SVM classifier using the radial basis kernel. Let the software find a scale value for the kernel function. Standardize the predictors.

SVMModel = fitcsvm(X,Y,'Standardize',true,'KernelFunction','RBF',...
    'KernelScale','auto');

SVMModel is a trained ClassificationSVM classifier.

Cross-validate the SVM classifier. By default, the software uses 10-fold cross-validation.

CVSVMModel = crossval(SVMModel);

CVSVMModel is a ClassificationPartitionedModel cross-validated classifier.

32-1662

Estimate the out-of-sample misclassification rate.

classLoss = kfoldLoss(CVSVMModel)

classLoss = 0.0484

The generalization rate is approximately 5%.

Detect Outliers Using SVM and One-Class Learning

Modify Fisher's iris data set by assigning all the irises to the same class. Detect outliers in the modified data set, and confirm the expected proportion of the observations that are outliers.

Load Fisher's iris data set. Remove the petal lengths and widths. Treat all irises as coming from the same class.

load fisheriris
X = meas(:,1:2);
y = ones(size(X,1),1);

Train an SVM classifier using the modified data set. Assume that 5% of the observations are outliers. Standardize the predictors.

rng(1);
SVMModel = fitcsvm(X,y,'KernelScale','auto','Standardize',true,...
    'OutlierFraction',0.05);

SVMModel is a trained ClassificationSVM classifier. By default, the software uses the Gaussian kernel for one-class learning.

Plot the observations and the decision boundary. Flag the support vectors and potential outliers.

svInd = SVMModel.IsSupportVector;
h = 0.02; % Mesh grid step size
[X1,X2] = meshgrid(min(X(:,1)):h:max(X(:,1)),...
    min(X(:,2)):h:max(X(:,2)));
[~,score] = predict(SVMModel,[X1(:),X2(:)]);
scoreGrid = reshape(score,size(X1,1),size(X2,2));
figure
plot(X(:,1),X(:,2),'k.')
hold on
plot(X(svInd,1),X(svInd,2),'ro','MarkerSize',10)
contour(X1,X2,scoreGrid)
colorbar;
title('{\bf Iris Outlier Detection via One-Class SVM}')
xlabel('Sepal Length (cm)')
ylabel('Sepal Width (cm)')
legend('Observation','Support Vector')
hold off

32-1663


The boundary separating the outliers from the rest of the data occurs where the contour value is 0.

Verify that the fraction of observations with negative scores in the cross-validated data is close to 5%.

CVSVMModel = crossval(SVMModel);
[~,scorePred] = kfoldPredict(CVSVMModel);
outlierRate = mean(scorePred<0)

Create a tall array for the class labels.

Y = tt.DepDelay > 10 % Class labels

Y = M×1 tall logical array
   1
   0
   1
   1
   0
   1
   0
   0
   :
   :

Create a tall array for the predictor data.

X = tt{:,1:end-1} % Predictor data

X = M×6 tall double matrix
   10    21     3    642     8    308
   10    26     1   1021     8    296
   10    23     5   2055    21    480
   10    23     5   1332    13    296
   10    22     4    629     4    373
   10    28     3   1446    59    308
   10     8     4    928     3    447
   10    10     6    859    11    954
    :     :     :      :     :     :
    :     :     :      :     :     :

Remove rows in X and Y that contain missing data.

32-1709


R = rmmissing([X Y]); % Data with missing entries removed
X = R(:,1:end-1);
Y = R(:,end);

Standardize the predictor variables.

Z = zscore(X);

Optimize hyperparameters automatically using the 'OptimizeHyperparameters' name-value pair argument. Find the optimal 'MinLeafSize' value that minimizes holdout cross-validation loss. (Specifying 'auto' uses 'MinLeafSize'.) For reproducibility, use the 'expected-improvement-plus' acquisition function and set the seeds of the random number generators using rng and tallrng. The results can vary depending on the number of workers and the execution environment for the tall arrays. For details, see “Control Where Your Code Runs” (MATLAB).

rng('default')
tallrng('default')
[Mdl,FitInfo,HyperparameterOptimizationResults] = fitctree(Z,Y,...
    'OptimizeHyperparameters','auto',...
    'HyperparameterOptimizationOptions',struct('Holdout',0.3,...
    'AcquisitionFunctionName','expected-improvement-plus'))

Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 3: Completed in 0.66 sec
- Pass 2 of 3: Completed in 1.2 sec
- Pass 3 of 3: Completed in 0.45 sec
Evaluation completed in 2.9 sec
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 0.34 sec
Evaluation completed in 0.5 sec
[...] (similar tall-expression evaluation messages repeat before each objective evaluation; timings vary by run)
|======================================================================================|
| Iter | Eval   | Objective   | Objective   | BestSoFar   | BestSoFar   | MinLeafSize |
|      | result |             | runtime     | (observed)  | (estim.)    |             |
|======================================================================================|
|    1 | Best   |     0.11539 |      191.71 |     0.11539 |     0.11539 |          10 |
|    2 | Accept |     0.19635 |      9.4175 |     0.11539 |     0.11977 |       48298 |
|    3 | Best   |      0.1048 |      50.985 |      0.1048 |     0.11431 |        3166 |
|    4 | Best   |       0.101 |      87.695 |       0.101 |     0.10585 |         180 |
|    5 | Best   |     0.10087 |      82.435 |     0.10087 |     0.10085 |         219 |
Completed in 0.5 sec - Pass 2 of 4: Completed in 0.73 sec - Pass 3 of 4: Completed in 0.51 sec

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

32-1719

32

Functions

- Pass 4 of 4: Completed in 0.78 sec Evaluation completed in 3.1 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 4: Completed in 0.51 sec - Pass 2 of 4: Completed in 0.73 sec - Pass 3 of 4: Completed in 0.5 sec - Pass 4 of 4: Completed in 0.74 sec Evaluation completed in 3 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 0.74 sec Evaluation completed in 0.86 sec | 6 | Accept | 0.10155 | 59.189 | 0.10087 | Evaluating tall expression using the - Pass 1 of 1: Completed in 0.31 sec Evaluation completed in 0.44 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.85 sec Evaluation completed in 1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.49 sec - Pass 2 of 4: Completed in 0.68 sec - Pass 3 of 4: Completed in 0.51 sec - Pass 4 of 4: Completed in 0.75 sec Evaluation completed in 2.9 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.48 sec - Pass 2 of 4: Completed in 0.71 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 0.76 sec Evaluation completed in 3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.49 sec - Pass 2 of 4: Completed in 0.71 sec - Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 0.8 sec Evaluation completed in 3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.5 sec - Pass 2 of 4: Completed in 0.71 sec - Pass 3 of 4: Completed in 0.51 sec - Pass 4 of 4: Completed in 0.83 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.5 sec - Pass 2 of 4: Completed in 0.7 sec - Pass 3 of 4: Completed in 0.5 sec - Pass 4 of 4: Completed in 0.88 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.5 sec - Pass 2 of 4: Completed in 0.69 sec - Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 1 sec Evaluation completed in 3.3 sec Evaluating tall 
expression using the - Pass 1 of 4: Completed in 0.5 sec - Pass 2 of 4: Completed in 0.76 sec

32-1720

Parallel Pool 'local': Parallel Pool 'local': Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

0.10089 |

1086 |

fitctree

- Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 1.2 sec Evaluation completed in 3.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.52 sec - Pass 2 of 4: Completed in 0.75 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 1.6 sec Evaluation completed in 3.9 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.55 sec - Pass 2 of 4: Completed in 0.77 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 2.2 sec Evaluation completed in 4.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.6 sec - Pass 2 of 4: Completed in 0.81 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 2.9 sec Evaluation completed in 5.5 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.68 sec - Pass 2 of 4: Completed in 0.9 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 3.8 sec Evaluation completed in 6.5 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.76 sec - Pass 2 of 4: Completed in 0.94 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 4.6 sec Evaluation completed in 7.5 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.85 sec - Pass 2 of 4: Completed in 1.1 sec - Pass 3 of 4: Completed in 0.58 sec - Pass 4 of 4: Completed in 5.1 sec Evaluation completed in 8.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.95 sec - Pass 2 of 4: Completed in 1.2 sec - Pass 3 of 4: Completed in 0.61 sec - Pass 4 of 4: Completed in 5.5 sec Evaluation completed in 8.9 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.1 sec - Pass 2 of 4: Completed in 1.3 sec - Pass 3 of 4: Completed in 0.65 sec - Pass 4 of 4: Completed in 5.7 sec Evaluation completed in 9.5 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.2 sec - Pass 2 of 4: Completed in 1.4 sec - Pass 3 of 4: Completed in 0.65 sec - Pass 4 of 4: Completed 
in 5.2 sec Evaluation completed in 9.1 sec Evaluating tall expression using the

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

32-1721

32

Functions

- Pass 1 of 4: Completed in 1.2 sec - Pass 2 of 4: Completed in 1.5 sec - Pass 3 of 4: Completed in 0.66 sec - Pass 4 of 4: Completed in 5.1 sec Evaluation completed in 9.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.4 sec - Pass 2 of 4: Completed in 1.7 sec - Pass 3 of 4: Completed in 0.76 sec - Pass 4 of 4: Completed in 4.3 sec Evaluation completed in 8.8 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.4 sec - Pass 2 of 4: Completed in 1.6 sec - Pass 3 of 4: Completed in 0.74 sec - Pass 4 of 4: Completed in 3.8 sec Evaluation completed in 8.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.4 sec - Pass 2 of 4: Completed in 1.7 sec - Pass 3 of 4: Completed in 0.73 sec - Pass 4 of 4: Completed in 3.5 sec Evaluation completed in 7.9 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.6 sec - Pass 2 of 4: Completed in 2.3 sec - Pass 3 of 4: Completed in 0.78 sec - Pass 4 of 4: Completed in 2.9 sec Evaluation completed in 8.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.5 sec - Pass 2 of 4: Completed in 1.8 sec - Pass 3 of 4: Completed in 0.72 sec - Pass 4 of 4: Completed in 2.5 sec Evaluation completed in 7.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.5 sec - Pass 2 of 4: Completed in 1.7 sec - Pass 3 of 4: Completed in 0.74 sec - Pass 4 of 4: Completed in 2.3 sec Evaluation completed in 6.8 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.5 sec - Pass 2 of 4: Completed in 1.8 sec - Pass 3 of 4: Completed in 0.71 sec - Pass 4 of 4: Completed in 2.1 sec Evaluation completed in 6.7 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.6 sec - Pass 2 of 4: Completed in 1.8 sec - Pass 3 of 4: Completed in 0.74 sec - Pass 4 of 4: Completed in 2.5 sec Evaluation completed in 7.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.6 sec - Pass 2 of 4: Completed in 2.3 sec - 
Pass 3 of 4: Completed in 0.72 sec - Pass 4 of 4: Completed in 2.4 sec

32-1722

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

fitctree

Evaluation completed in 7.4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.6 sec - Pass 2 of 4: Completed in 1.8 sec - Pass 3 of 4: Completed in 0.74 sec - Pass 4 of 4: Completed in 2.4 sec Evaluation completed in 7 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.6 sec - Pass 2 of 4: Completed in 1.8 sec - Pass 3 of 4: Completed in 0.75 sec - Pass 4 of 4: Completed in 1.8 sec Evaluation completed in 6.4 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.8 sec Evaluation completed in 0.91 sec | 7 | Accept | 0.13387 | Evaluating tall expression using the - Pass 1 of 1: Completed in 0.3 sec Evaluation completed in 0.43 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.84 sec Evaluation completed in 0.98 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.5 sec - Pass 2 of 4: Completed in 0.69 sec - Pass 3 of 4: Completed in 0.51 sec - Pass 4 of 4: Completed in 0.77 sec Evaluation completed in 3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.47 sec - Pass 2 of 4: Completed in 0.7 sec - Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 0.81 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.48 sec - Pass 2 of 4: Completed in 0.71 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 0.8 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.49 sec - Pass 2 of 4: Completed in 0.71 sec - Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 0.86 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.5 sec - Pass 2 of 4: Completed in 0.79 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 0.9 sec Evaluation completed in 3.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.51 sec - Pass 2 of 4: Completed in 0.74 sec - Pass 3 
of 4: Completed in 0.52 sec

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local': 225.25 |

0.10087 |

0.10089 |

1 |

Parallel Pool 'local': Parallel Pool 'local': Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

32-1723

32

Functions

- Pass 4 of 4: Completed in 1 sec Evaluation completed in 3.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.52 sec - Pass 2 of 4: Completed in 0.77 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 1.3 sec Evaluation completed in 3.7 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.61 sec - Pass 2 of 4: Completed in 0.98 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 1.6 sec Evaluation completed in 4.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.62 sec - Pass 2 of 4: Completed in 0.95 sec - Pass 3 of 4: Completed in 0.61 sec - Pass 4 of 4: Completed in 2 sec Evaluation completed in 5 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.77 sec - Pass 2 of 4: Completed in 1.2 sec - Pass 3 of 4: Completed in 0.83 sec - Pass 4 of 4: Completed in 3.1 sec Evaluation completed in 6.8 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.71 sec - Pass 2 of 4: Completed in 0.97 sec - Pass 3 of 4: Completed in 0.73 sec - Pass 4 of 4: Completed in 2.5 sec Evaluation completed in 5.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.73 sec - Pass 2 of 4: Completed in 0.87 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 2.4 sec Evaluation completed in 5.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.65 sec - Pass 2 of 4: Completed in 0.85 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 2.2 sec Evaluation completed in 4.8 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.67 sec - Pass 2 of 4: Completed in 0.88 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 2 sec Evaluation completed in 4.7 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.67 sec - Pass 2 of 4: Completed in 0.9 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 1.5 sec Evaluation completed in 4.3 
sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.75 sec

32-1724

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

fitctree

- Pass 2 of 4: Completed in 0.89 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 1.3 sec Evaluation completed in 4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.7 sec - Pass 2 of 4: Completed in 0.88 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 1 sec Evaluation completed in 3.7 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.71 sec - Pass 2 of 4: Completed in 0.89 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 0.94 sec Evaluation completed in 3.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.72 sec - Pass 2 of 4: Completed in 0.9 sec - Pass 3 of 4: Completed in 0.57 sec - Pass 4 of 4: Completed in 0.92 sec Evaluation completed in 3.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.68 sec - Pass 2 of 4: Completed in 0.91 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 0.91 sec Evaluation completed in 3.6 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.75 sec Evaluation completed in 0.87 sec | 8 | Accept | 0.10213 | Evaluating tall expression using - Pass 1 of 1: Completed in 0.34 Evaluation completed in 0.47 sec Evaluating tall expression using - Pass 1 of 1: Completed in 0.84 Evaluation completed in 0.98 sec Evaluating tall expression using - Pass 1 of 4: Completed in 0.51 - Pass 2 of 4: Completed in 0.71 - Pass 3 of 4: Completed in 0.54 - Pass 4 of 4: Completed in 0.77 Evaluation completed in 3.1 sec Evaluating tall expression using - Pass 1 of 4: Completed in 0.52 - Pass 2 of 4: Completed in 0.71 - Pass 3 of 4: Completed in 0.55 - Pass 4 of 4: Completed in 0.81 Evaluation completed in 3.1 sec Evaluating tall expression using - Pass 1 of 4: Completed in 0.52 - Pass 2 of 4: Completed in 0.73 - Pass 3 of 4: Completed in 0.55 - Pass 4 of 4: Completed in 0.83 Evaluation completed in 3.2 sec Evaluating tall expression using

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local': 117.51 |

0.10087 |

0.10092 |

56 |

the Parallel Pool 'local': sec the Parallel Pool 'local': sec the Parallel Pool 'local': sec sec sec sec the Parallel Pool 'local': sec sec sec sec the Parallel Pool 'local': sec sec sec sec the Parallel Pool 'local':

32-1725

32

Functions

- Pass 1 of 4: Completed in 0.53 sec - Pass 2 of 4: Completed in 0.71 sec - Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 0.82 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.52 sec - Pass 2 of 4: Completed in 0.74 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 0.9 sec Evaluation completed in 3.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.5 sec - Pass 2 of 4: Completed in 0.73 sec - Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 0.96 sec Evaluation completed in 3.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.5 sec - Pass 2 of 4: Completed in 0.72 sec - Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 1.1 sec Evaluation completed in 3.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.53 sec - Pass 2 of 4: Completed in 0.72 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 1.1 sec Evaluation completed in 3.4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.51 sec - Pass 2 of 4: Completed in 0.71 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 1.1 sec Evaluation completed in 3.4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.55 sec - Pass 2 of 4: Completed in 0.75 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 1 sec Evaluation completed in 3.4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.54 sec - Pass 2 of 4: Completed in 0.74 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 0.92 sec Evaluation completed in 3.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.52 sec - Pass 2 of 4: Completed in 0.75 sec - Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 0.87 sec Evaluation completed in 3.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.6 sec - Pass 2 of 4: 
Completed in 0.77 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 0.85 sec

32-1726

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

fitctree

Evaluation completed in 3.3 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 4: Completed in 0.54 sec - Pass 2 of 4: Completed in 0.78 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 0.8 sec Evaluation completed in 3.2 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 0.73 sec Evaluation completed in 0.85 sec | 9 | Accept | 0.1017 | 71.087 | 0.10087 | Evaluating tall expression using the - Pass 1 of 1: Completed in 0.3 sec Evaluation completed in 0.43 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.84 sec Evaluation completed in 0.99 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.47 sec - Pass 2 of 4: Completed in 0.7 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 0.74 sec Evaluation completed in 2.9 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.47 sec - Pass 2 of 4: Completed in 0.71 sec - Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 0.77 sec Evaluation completed in 3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.48 sec - Pass 2 of 4: Completed in 0.71 sec - Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 0.8 sec Evaluation completed in 3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.49 sec - Pass 2 of 4: Completed in 0.73 sec - Pass 3 of 4: Completed in 0.5 sec - Pass 4 of 4: Completed in 0.83 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.48 sec - Pass 2 of 4: Completed in 0.72 sec - Pass 3 of 4: Completed in 0.58 sec - Pass 4 of 4: Completed in 0.9 sec Evaluation completed in 3.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.5 sec - Pass 2 of 4: Completed in 0.71 sec - Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 1 sec Evaluation completed in 3.3 sec Evaluating tall expression using the - Pass 1 of 4: 
Completed in 0.51 sec - Pass 2 of 4: Completed in 0.73 sec - Pass 3 of 4: Completed in 0.53 sec

0.10089 |

418 |

Parallel Pool 'local': Parallel Pool 'local': Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

32-1727

32

Functions

- Pass 4 of 4: Completed in 1.2 sec Evaluation completed in 3.5 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.53 sec - Pass 2 of 4: Completed in 0.76 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 1.4 sec Evaluation completed in 3.8 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.53 sec - Pass 2 of 4: Completed in 0.79 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 1.6 sec Evaluation completed in 4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.56 sec - Pass 2 of 4: Completed in 0.78 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 1.8 sec Evaluation completed in 4.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.6 sec - Pass 2 of 4: Completed in 0.79 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 1.8 sec Evaluation completed in 4.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.6 sec - Pass 2 of 4: Completed in 0.8 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 1.6 sec Evaluation completed in 4.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.59 sec - Pass 2 of 4: Completed in 0.83 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 1.4 sec Evaluation completed in 3.9 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.62 sec - Pass 2 of 4: Completed in 0.8 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 1.2 sec Evaluation completed in 3.7 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.61 sec - Pass 2 of 4: Completed in 0.81 sec - Pass 3 of 4: Completed in 0.58 sec - Pass 4 of 4: Completed in 1 sec Evaluation completed in 3.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.64 sec - Pass 2 of 4: Completed in 0.82 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 0.92 sec Evaluation completed in 
3.4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.62 sec

32-1728

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

fitctree

- Pass 2 of 4: Completed in 0.82 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 0.86 sec Evaluation completed in 3.4 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 4: Completed in 0.6 sec - Pass 2 of 4: Completed in 0.84 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 0.83 sec Evaluation completed in 3.4 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 0.74 sec Evaluation completed in 0.86 sec | 10 | Best | 0.10042 | 94.812 | 0.10042 | Evaluating tall expression using the - Pass 1 of 1: Completed in 0.3 sec Evaluation completed in 0.43 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.85 sec Evaluation completed in 0.99 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.49 sec - Pass 2 of 4: Completed in 0.73 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 0.76 sec Evaluation completed in 3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.48 sec - Pass 2 of 4: Completed in 0.71 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 0.77 sec Evaluation completed in 3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.49 sec - Pass 2 of 4: Completed in 0.7 sec - Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 0.8 sec Evaluation completed in 3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.5 sec - Pass 2 of 4: Completed in 0.69 sec - Pass 3 of 4: Completed in 0.51 sec - Pass 4 of 4: Completed in 0.82 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.5 sec - Pass 2 of 4: Completed in 0.73 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 0.9 sec Evaluation completed in 3.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.52 sec - Pass 2 of 4: Completed in 0.74 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 
of 4: Completed in 1 sec Evaluation completed in 3.3 sec Evaluating tall expression using the

0.10051 |

113 |

Parallel Pool 'local': Parallel Pool 'local': Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

32-1729

32

Functions

- Pass 1 of 4: Completed in 0.52 sec - Pass 2 of 4: Completed in 0.75 sec - Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 1.2 sec Evaluation completed in 3.5 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.54 sec - Pass 2 of 4: Completed in 0.74 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 1.4 sec Evaluation completed in 3.8 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.54 sec - Pass 2 of 4: Completed in 0.74 sec - Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 1.7 sec Evaluation completed in 4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.58 sec - Pass 2 of 4: Completed in 0.77 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 1.8 sec Evaluation completed in 4.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.57 sec - Pass 2 of 4: Completed in 0.81 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 1.8 sec Evaluation completed in 4.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.59 sec - Pass 2 of 4: Completed in 0.8 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 1.7 sec Evaluation completed in 4.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.63 sec - Pass 2 of 4: Completed in 0.83 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 1.4 sec Evaluation completed in 3.9 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.61 sec - Pass 2 of 4: Completed in 0.81 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 1.2 sec Evaluation completed in 3.7 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.61 sec - Pass 2 of 4: Completed in 0.82 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 1 sec Evaluation completed in 3.5 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.62 sec - Pass 2 of 4: 
Completed in 0.83 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 0.89 sec

32-1730

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

fitctree

Evaluation completed in 3.4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.6 sec - Pass 2 of 4: Completed in 0.82 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 0.91 sec Evaluation completed in 3.4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.62 sec - Pass 2 of 4: Completed in 0.84 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 0.83 sec Evaluation completed in 3.4 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.75 sec Evaluation completed in 0.87 sec | 11 | Accept | 0.10078 | Evaluating tall expression using the - Pass 1 of 1: Completed in 0.31 sec Evaluation completed in 0.44 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.84 sec Evaluation completed in 0.98 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.47 sec - Pass 2 of 4: Completed in 0.69 sec - Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 0.77 sec Evaluation completed in 3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.48 sec - Pass 2 of 4: Completed in 0.7 sec - Pass 3 of 4: Completed in 0.51 sec - Pass 4 of 4: Completed in 0.76 sec Evaluation completed in 3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.51 sec - Pass 2 of 4: Completed in 0.7 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 0.79 sec Evaluation completed in 3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.49 sec - Pass 2 of 4: Completed in 0.7 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 0.88 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.49 sec - Pass 2 of 4: Completed in 0.7 sec - Pass 3 of 4: Completed in 0.51 sec - Pass 4 of 4: Completed in 0.88 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.52 sec - Pass 2 of 4: Completed in 0.77 sec - 
Pass 3 of 4: Completed in 0.55 sec

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local': 95.244 |

0.10042 |

0.10063 |

113 |

Parallel Pool 'local': Parallel Pool 'local': Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

32-1731

32

Functions

fitctree

Because the predictor data is tall, each objective evaluation triggers deferred tall-array computation. The iterative display is therefore interleaved with many progress messages of the form "Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 4: Completed in 0.5 sec … Evaluation completed in 3.4 sec". Only the recovered rows of the optimization display are shown below.

|======================================================================================|
| Iter | Eval   | Objective   | Objective   | BestSoFar   | BestSoFar   | MinLeafSize |
|      | result |             | runtime     | (observed)  | (estim.)    |             |
|======================================================================================|
|   12 | Accept |     0.10114 |      94.301 |     0.10042 |     0.10077 |         120 |
|   13 | Accept |     0.10141 |      92.514 |     0.10042 |     0.10091 |         127 |
|   14 | Accept |     0.10053 |      95.793 |     0.10042 |     0.10083 |         105 |
|   15 | Accept |     0.10067 |      97.948 |     0.10042 |      0.1008 |          99 |
|   16 | Accept |     0.10097 |      101.84 |     0.10042 |     0.10082 |          95 |
|   17 | Accept |     0.11126 |      32.546 |     0.10042 |     0.10075 |       11100 |
|   18 | Accept |     0.11126 |      36.874 |     0.10042 |     0.10063 |        5960 |
|   19 | Accept |     0.10164 |      55.691 |     0.10042 |     0.10068 |        1774 |
|   20 | Accept |     0.10613 |      138.45 |     0.10042 |
0.82 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 0.95 sec

0.1007 |

27 |

Parallel Pool 'local': Parallel Pool 'local': Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

32-1747

32

Functions

Evaluation completed in 3.5 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 4: Completed in 0.58 sec - Pass 2 of 4: Completed in 0.74 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 0.97 sec Evaluation completed in 3.4 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 4: Completed in 0.55 sec - Pass 2 of 4: Completed in 0.75 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 0.91 sec Evaluation completed in 3.3 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 4: Completed in 0.54 sec - Pass 2 of 4: Completed in 0.79 sec - Pass 3 of 4: Completed in 0.57 sec - Pass 4 of 4: Completed in 0.9 sec Evaluation completed in 3.4 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 4: Completed in 0.6 sec - Pass 2 of 4: Completed in 0.81 sec - Pass 3 of 4: Completed in 0.58 sec - Pass 4 of 4: Completed in 0.87 sec Evaluation completed in 3.4 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 4: Completed in 0.52 sec - Pass 2 of 4: Completed in 0.75 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 0.77 sec Evaluation completed in 3.2 sec Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 0.77 sec Evaluation completed in 0.9 sec |======================================================================================| | Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | MinLeafSize | | | result | | runtime | (observed) | (estim.) 
| | |======================================================================================| | 21 | Accept | 0.10154 | 66.581 | 0.10042 | 0.10072 | 687 | Evaluating tall expression using the - Pass 1 of 1: Completed in 0.32 sec Evaluation completed in 0.47 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.89 sec Evaluation completed in 1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.51 sec - Pass 2 of 4: Completed in 0.75 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 0.76 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.5 sec - Pass 2 of 4: Completed in 0.74 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 0.8 sec Evaluation completed in 3.1 sec

32-1748

Parallel Pool 'local': Parallel Pool 'local': Parallel Pool 'local':

Parallel Pool 'local':

fitctree

Evaluating tall expression using the - Pass 1 of 4: Completed in 0.49 sec - Pass 2 of 4: Completed in 0.76 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 0.8 sec Evaluation completed in 3.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.51 sec - Pass 2 of 4: Completed in 0.79 sec - Pass 3 of 4: Completed in 0.58 sec - Pass 4 of 4: Completed in 0.86 sec Evaluation completed in 3.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.52 sec - Pass 2 of 4: Completed in 0.74 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 0.99 sec Evaluation completed in 3.4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.57 sec - Pass 2 of 4: Completed in 0.76 sec - Pass 3 of 4: Completed in 1.1 sec - Pass 4 of 4: Completed in 0.99 sec Evaluation completed in 4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.52 sec - Pass 2 of 4: Completed in 0.77 sec - Pass 3 of 4: Completed in 0.6 sec - Pass 4 of 4: Completed in 1.1 sec Evaluation completed in 3.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.52 sec - Pass 2 of 4: Completed in 0.75 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 1.3 sec Evaluation completed in 3.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.54 sec - Pass 2 of 4: Completed in 0.84 sec - Pass 3 of 4: Completed in 0.57 sec - Pass 4 of 4: Completed in 1.3 sec Evaluation completed in 3.9 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.54 sec - Pass 2 of 4: Completed in 0.78 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 1.3 sec Evaluation completed in 3.7 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.55 sec - Pass 2 of 4: Completed in 0.78 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 1.1 sec Evaluation completed in 3.5 sec Evaluating tall expression using the - Pass 1 of 4: 
Completed in 0.55 sec - Pass 2 of 4: Completed in 0.8 sec - Pass 3 of 4: Completed in 0.6 sec

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

32-1749

32

Functions

- Pass 4 of 4: Completed in 1 sec Evaluation completed in 3.5 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.53 sec - Pass 2 of 4: Completed in 0.77 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 0.91 sec Evaluation completed in 3.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.54 sec - Pass 2 of 4: Completed in 0.76 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 0.83 sec Evaluation completed in 3.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.55 sec - Pass 2 of 4: Completed in 0.77 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 0.81 sec Evaluation completed in 3.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.56 sec - Pass 2 of 4: Completed in 0.78 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 0.78 sec Evaluation completed in 3.2 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.75 sec Evaluation completed in 0.86 sec | 22 | Accept | 0.10099 | Evaluating tall expression using - Pass 1 of 1: Completed in 0.31 Evaluation completed in 0.44 sec Evaluating tall expression using - Pass 1 of 1: Completed in 0.85 Evaluation completed in 0.99 sec Evaluating tall expression using - Pass 1 of 4: Completed in 0.48 - Pass 2 of 4: Completed in 0.69 - Pass 3 of 4: Completed in 0.53 - Pass 4 of 4: Completed in 0.75 Evaluation completed in 3 sec Evaluating tall expression using - Pass 1 of 4: Completed in 0.48 - Pass 2 of 4: Completed in 0.71 - Pass 3 of 4: Completed in 0.53 - Pass 4 of 4: Completed in 0.77 Evaluation completed in 3 sec Evaluating tall expression using - Pass 1 of 4: Completed in 0.48 - Pass 2 of 4: Completed in 0.71 - Pass 3 of 4: Completed in 0.54 - Pass 4 of 4: Completed in 0.84 Evaluation completed in 3.1 sec Evaluating tall expression using - Pass 1 of 4: Completed in 0.52 - Pass 2 of 4: Completed in 0.74

32-1750

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local': 83.689 |

0.10042 |

the Parallel Pool 'local': sec the Parallel Pool 'local': sec the Parallel Pool 'local': sec sec sec sec the Parallel Pool 'local': sec sec sec sec the Parallel Pool 'local': sec sec sec sec the Parallel Pool 'local': sec sec

0.10072 |

271 |

fitctree

- Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 0.84 sec Evaluation completed in 3.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.54 sec - Pass 2 of 4: Completed in 0.78 sec - Pass 3 of 4: Completed in 0.58 sec - Pass 4 of 4: Completed in 0.98 sec Evaluation completed in 3.5 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.54 sec - Pass 2 of 4: Completed in 0.73 sec - Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 1 sec Evaluation completed in 3.4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.53 sec - Pass 2 of 4: Completed in 0.75 sec - Pass 3 of 4: Completed in 0.61 sec - Pass 4 of 4: Completed in 1.3 sec Evaluation completed in 3.7 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.53 sec - Pass 2 of 4: Completed in 0.75 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 1.5 sec Evaluation completed in 3.9 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.55 sec - Pass 2 of 4: Completed in 0.79 sec - Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 1.7 sec Evaluation completed in 4.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.61 sec - Pass 2 of 4: Completed in 0.83 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 1.9 sec Evaluation completed in 4.5 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.63 sec - Pass 2 of 4: Completed in 0.81 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 1.9 sec Evaluation completed in 4.5 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.62 sec - Pass 2 of 4: Completed in 0.85 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 1.8 sec Evaluation completed in 4.4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.65 sec - Pass 2 of 4: Completed in 0.84 sec - Pass 3 of 4: Completed in 0.59 sec - Pass 4 of 4: 
Completed in 1.6 sec Evaluation completed in 4.3 sec Evaluating tall expression using the

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

32-1751

32

Functions

- Pass 1 of 4: Completed in 0.62 sec - Pass 2 of 4: Completed in 0.85 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 1.4 sec Evaluation completed in 3.9 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.63 sec - Pass 2 of 4: Completed in 0.85 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 1.1 sec Evaluation completed in 3.7 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.63 sec - Pass 2 of 4: Completed in 0.92 sec - Pass 3 of 4: Completed in 0.61 sec - Pass 4 of 4: Completed in 1 sec Evaluation completed in 3.7 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.67 sec - Pass 2 of 4: Completed in 0.86 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 0.91 sec Evaluation completed in 3.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.64 sec - Pass 2 of 4: Completed in 0.86 sec - Pass 3 of 4: Completed in 0.57 sec - Pass 4 of 4: Completed in 0.85 sec Evaluation completed in 3.4 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.74 sec Evaluation completed in 0.85 sec | 23 | Accept | 0.10099 | Evaluating tall expression using the - Pass 1 of 1: Completed in 0.31 sec Evaluation completed in 0.45 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.86 sec Evaluation completed in 1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.49 sec - Pass 2 of 4: Completed in 0.7 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 0.83 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.48 sec - Pass 2 of 4: Completed in 0.68 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 0.82 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.49 sec - Pass 2 of 4: Completed in 0.76 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 
0.79 sec Evaluation completed in 3.1 sec

32-1752

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local': 98.639 |

0.10042 |

Parallel Pool 'local': Parallel Pool 'local': Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

0.10073 |

92 |

fitctree

Evaluating tall expression using the - Pass 1 of 4: Completed in 0.5 sec - Pass 2 of 4: Completed in 0.74 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 0.86 sec Evaluation completed in 3.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.52 sec - Pass 2 of 4: Completed in 0.8 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 0.89 sec Evaluation completed in 3.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.52 sec - Pass 2 of 4: Completed in 0.75 sec - Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 1.1 sec Evaluation completed in 3.4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.55 sec - Pass 2 of 4: Completed in 0.79 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 1.2 sec Evaluation completed in 3.7 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.54 sec - Pass 2 of 4: Completed in 0.76 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 1.6 sec Evaluation completed in 4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.56 sec - Pass 2 of 4: Completed in 0.83 sec - Pass 3 of 4: Completed in 0.6 sec - Pass 4 of 4: Completed in 2.3 sec Evaluation completed in 4.9 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.61 sec - Pass 2 of 4: Completed in 0.83 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 2.9 sec Evaluation completed in 5.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.71 sec - Pass 2 of 4: Completed in 0.86 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 3.8 sec Evaluation completed in 6.7 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.75 sec - Pass 2 of 4: Completed in 0.98 sec - Pass 3 of 4: Completed in 0.62 sec - Pass 4 of 4: Completed in 4.6 sec Evaluation completed in 7.7 sec Evaluating tall expression using the - Pass 1 of 4: 
Completed in 0.88 sec - Pass 2 of 4: Completed in 1.1 sec - Pass 3 of 4: Completed in 0.59 sec

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

32-1753

32

Functions

- Pass 4 of 4: Completed in 5.1 sec Evaluation completed in 8.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.98 sec - Pass 2 of 4: Completed in 1.2 sec - Pass 3 of 4: Completed in 0.67 sec - Pass 4 of 4: Completed in 5.4 sec Evaluation completed in 9 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1 sec - Pass 2 of 4: Completed in 1.3 sec - Pass 3 of 4: Completed in 0.65 sec - Pass 4 of 4: Completed in 6.2 sec Evaluation completed in 9.9 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.2 sec - Pass 2 of 4: Completed in 1.4 sec - Pass 3 of 4: Completed in 0.66 sec - Pass 4 of 4: Completed in 5.7 sec Evaluation completed in 9.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.2 sec - Pass 2 of 4: Completed in 1.5 sec - Pass 3 of 4: Completed in 0.68 sec - Pass 4 of 4: Completed in 5.1 sec Evaluation completed in 9.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.3 sec - Pass 2 of 4: Completed in 1.7 sec - Pass 3 of 4: Completed in 0.69 sec - Pass 4 of 4: Completed in 4.1 sec Evaluation completed in 8.5 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.9 sec - Pass 2 of 4: Completed in 1.6 sec - Pass 3 of 4: Completed in 0.71 sec - Pass 4 of 4: Completed in 3.5 sec Evaluation completed in 8.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.5 sec - Pass 2 of 4: Completed in 1.9 sec - Pass 3 of 4: Completed in 0.95 sec - Pass 4 of 4: Completed in 3.1 sec Evaluation completed in 8.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.5 sec - Pass 2 of 4: Completed in 2.3 sec - Pass 3 of 4: Completed in 0.94 sec - Pass 4 of 4: Completed in 2.7 sec Evaluation completed in 8.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.5 sec - Pass 2 of 4: Completed in 1.7 sec - Pass 3 of 4: Completed in 0.88 sec - Pass 4 of 4: Completed in 2.3 sec Evaluation completed in 6.9 sec 
Evaluating tall expression using the - Pass 1 of 4: Completed in 1.5 sec

32-1754

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

fitctree

- Pass 2 of 4: Completed in 2.2 sec - Pass 3 of 4: Completed in 0.77 sec - Pass 4 of 4: Completed in 2.6 sec Evaluation completed in 7.9 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.5 sec - Pass 2 of 4: Completed in 1.8 sec - Pass 3 of 4: Completed in 0.74 sec - Pass 4 of 4: Completed in 1.9 sec Evaluation completed in 6.4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.5 sec - Pass 2 of 4: Completed in 1.7 sec - Pass 3 of 4: Completed in 0.71 sec - Pass 4 of 4: Completed in 1.8 sec Evaluation completed in 6.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.5 sec - Pass 2 of 4: Completed in 1.7 sec - Pass 3 of 4: Completed in 0.72 sec - Pass 4 of 4: Completed in 1.8 sec Evaluation completed in 6.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.5 sec - Pass 2 of 4: Completed in 1.7 sec - Pass 3 of 4: Completed in 0.71 sec - Pass 4 of 4: Completed in 2.2 sec Evaluation completed in 6.6 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.77 sec Evaluation completed in 0.89 sec | 24 | Accept | 0.13208 | Evaluating tall expression using the - Pass 1 of 1: Completed in 0.32 sec Evaluation completed in 0.45 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.83 sec Evaluation completed in 0.97 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.48 sec - Pass 2 of 4: Completed in 0.73 sec - Pass 3 of 4: Completed in 0.56 sec - Pass 4 of 4: Completed in 0.79 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.54 sec - Pass 2 of 4: Completed in 0.7 sec - Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 0.79 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.48 sec - Pass 2 of 4: Completed in 0.72 sec - Pass 3 of 4: Completed in 0.51 sec - Pass 4 of 4: Completed in 0.74 sec Evaluation completed in 3 sec 
Evaluating tall expression using the

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local': 218.49 |

0.10042 |

0.10074 |

3 |

Parallel Pool 'local': Parallel Pool 'local': Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

32-1755

32

Functions

- Pass 1 of 1: Completed in 0.76 sec Evaluation completed in 0.88 sec | 25 | Accept | 0.13632 | 18.334 | Evaluating tall expression using the - Pass 1 of 1: Completed in 0.32 sec Evaluation completed in 0.46 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.88 sec Evaluation completed in 1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.5 sec - Pass 2 of 4: Completed in 0.69 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 0.75 sec Evaluation completed in 3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.47 sec - Pass 2 of 4: Completed in 0.7 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 0.78 sec Evaluation completed in 3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.49 sec - Pass 2 of 4: Completed in 0.72 sec - Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 0.79 sec Evaluation completed in 3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.51 sec - Pass 2 of 4: Completed in 0.69 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 0.86 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.56 sec - Pass 2 of 4: Completed in 0.77 sec - Pass 3 of 4: Completed in 0.52 sec - Pass 4 of 4: Completed in 0.96 sec Evaluation completed in 3.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.52 sec - Pass 2 of 4: Completed in 0.77 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 1.1 sec Evaluation completed in 3.4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.53 sec - Pass 2 of 4: Completed in 0.73 sec - Pass 3 of 4: Completed in 0.57 sec - Pass 4 of 4: Completed in 1.3 sec Evaluation completed in 3.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.52 sec - Pass 2 of 4: Completed in 0.74 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed 
in 1.6 sec Evaluation completed in 3.9 sec

32-1756

0.10042 |

Parallel Pool 'local': Parallel Pool 'local': Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

0.10076 |

22706 |

fitctree

Evaluating tall expression using the - Pass 1 of 4: Completed in 0.56 sec - Pass 2 of 4: Completed in 0.77 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 2.2 sec Evaluation completed in 4.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.61 sec - Pass 2 of 4: Completed in 0.85 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 3 sec Evaluation completed in 5.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.7 sec - Pass 2 of 4: Completed in 0.92 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 3.9 sec Evaluation completed in 6.7 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.74 sec - Pass 2 of 4: Completed in 0.94 sec - Pass 3 of 4: Completed in 0.57 sec - Pass 4 of 4: Completed in 4.7 sec Evaluation completed in 7.6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.86 sec - Pass 2 of 4: Completed in 1.1 sec - Pass 3 of 4: Completed in 0.58 sec - Pass 4 of 4: Completed in 5.1 sec Evaluation completed in 8.3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.9 sec - Pass 2 of 4: Completed in 1.1 sec - Pass 3 of 4: Completed in 0.64 sec - Pass 4 of 4: Completed in 5.3 sec Evaluation completed in 8.7 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1 sec - Pass 2 of 4: Completed in 1.2 sec - Pass 3 of 4: Completed in 0.63 sec - Pass 4 of 4: Completed in 5.4 sec Evaluation completed in 9.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.1 sec - Pass 2 of 4: Completed in 1.4 sec - Pass 3 of 4: Completed in 0.68 sec - Pass 4 of 4: Completed in 5.1 sec Evaluation completed in 9 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.2 sec - Pass 2 of 4: Completed in 1.4 sec - Pass 3 of 4: Completed in 0.68 sec - Pass 4 of 4: Completed in 4.5 sec Evaluation completed in 8.5 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.3 
sec - Pass 2 of 4: Completed in 1.5 sec - Pass 3 of 4: Completed in 0.7 sec

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

32-1757

32

Functions

- Pass 4 of 4: Completed in 3.6 sec Evaluation completed in 7.8 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.3 sec - Pass 2 of 4: Completed in 1.5 sec - Pass 3 of 4: Completed in 0.68 sec - Pass 4 of 4: Completed in 3 sec Evaluation completed in 7.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.3 sec - Pass 2 of 4: Completed in 1.5 sec - Pass 3 of 4: Completed in 0.7 sec - Pass 4 of 4: Completed in 2.8 sec Evaluation completed in 6.9 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.3 sec - Pass 2 of 4: Completed in 1.5 sec - Pass 3 of 4: Completed in 0.68 sec - Pass 4 of 4: Completed in 2.3 sec Evaluation completed in 6.4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.3 sec - Pass 2 of 4: Completed in 1.6 sec - Pass 3 of 4: Completed in 0.7 sec - Pass 4 of 4: Completed in 2 sec Evaluation completed in 6.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.4 sec - Pass 2 of 4: Completed in 1.6 sec - Pass 3 of 4: Completed in 0.68 sec - Pass 4 of 4: Completed in 1.8 sec Evaluation completed in 6 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.4 sec - Pass 2 of 4: Completed in 1.6 sec - Pass 3 of 4: Completed in 0.67 sec - Pass 4 of 4: Completed in 1.7 sec Evaluation completed in 5.8 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.4 sec - Pass 2 of 4: Completed in 1.6 sec - Pass 3 of 4: Completed in 0.71 sec - Pass 4 of 4: Completed in 1.6 sec Evaluation completed in 5.8 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 1.4 sec - Pass 2 of 4: Completed in 1.6 sec - Pass 3 of 4: Completed in 0.69 sec - Pass 4 of 4: Completed in 1.6 sec Evaluation completed in 5.8 sec Evaluating tall expression using the - Pass 1 of 1: Completed in 0.77 sec Evaluation completed in 0.89 sec | 26 | Accept | 0.12471 |

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local': 198.02 |

0.10042 |

Evaluating tall expression using the Parallel Pool 'local': - Pass 1 of 1: Completed in 0.3 sec Evaluation completed in 0.42 sec

32-1758

0.10077 |

6 |

fitctree

Evaluating tall expression using the - Pass 1 of 1: Completed in 0.84 sec Evaluation completed in 0.98 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.5 sec - Pass 2 of 4: Completed in 0.73 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 0.75 sec Evaluation completed in 3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.49 sec - Pass 2 of 4: Completed in 0.7 sec - Pass 3 of 4: Completed in 0.54 sec - Pass 4 of 4: Completed in 0.78 sec Evaluation completed in 3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.49 sec - Pass 2 of 4: Completed in 0.7 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 0.79 sec Evaluation completed in 3 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.5 sec - Pass 2 of 4: Completed in 0.74 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 0.83 sec Evaluation completed in 3.1 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.49 sec - Pass 2 of 4: Completed in 0.71 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 0.88 sec Evaluation completed in 3.2 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.51 sec - Pass 2 of 4: Completed in 0.76 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed in 1 sec Evaluation completed in 3.4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.51 sec - Pass 2 of 4: Completed in 0.72 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 1.2 sec Evaluation completed in 3.5 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.52 sec - Pass 2 of 4: Completed in 0.74 sec - Pass 3 of 4: Completed in 0.53 sec - Pass 4 of 4: Completed in 1.6 sec Evaluation completed in 4 sec Evaluating tall expression using the - Pass 1 of 4: Completed in 0.55 sec - Pass 2 of 4: Completed in 0.76 sec - Pass 3 of 4: Completed in 0.55 sec - Pass 4 of 4: Completed 
in 2.2 sec Evaluation completed in 4.7 sec Evaluating tall expression using the

Parallel Pool 'local': Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

Parallel Pool 'local':

32-1759

32

Functions

fitctree

(Verbose tall-array evaluation progress messages, repeated for each pass of each objective evaluation, are omitted here. The columns of the optimization display are Iter, Eval result, Objective, Objective runtime, BestSoFar (observed), BestSoFar (estim.), and MinLeafSize.) The display continues:

|   27 | Accept  |   0.10686 |    154.29 |   0.10042 |   0.10076 |          17 |
|   28 | Accept  |   0.12954 |    229.56 |   0.10042 |   0.10074 |           2 |
|   29 | Accept  |   0.10356 |    116.88 |   0.10042 |   0.10072 |          39 |
|   30 | Accept  |   0.10404 |    49.982 |   0.10042 |   0.10068 |        2359 |

__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 3083.3701 seconds.
Total objective function evaluation time: 3065.7523

Best observed feasible point:
    MinLeafSize
    ___________

        113

Observed objective function value = 0.10042
Estimated objective function value = 0.10078
Function evaluation time = 94.8122

Best estimated feasible point (according to models):
    MinLeafSize
    ___________

        105

Estimated objective function value = 0.10068
Estimated function evaluation time = 97.1063

(Tall-array evaluation progress messages printed while training the final model are omitted.)
Mdl = 
  classreg.learning.classif.CompactClassificationTree
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: [0 1]
           ScoreTransform: 'none'

  Properties, Methods

FitInfo = 
  struct with no fields.

HyperparameterOptimizationResults = 
  BayesianOptimization with properties:

                      ObjectiveFcn: @createObjFcn/tallObjFcn
              VariableDescriptions: [4×1 optimizableVariable]
                           Options: [1×1 struct]
                      MinObjective: 0.1004
                   XAtMinObjective: [1×1 table]
             MinEstimatedObjective: 0.1007
          XAtMinEstimatedObjective: [1×1 table]
           NumObjectiveEvaluations: 30
                  TotalElapsedTime: 3.0834e+03
                         NextPoint: [1×1 table]
                            XTrace: [30×1 table]
                    ObjectiveTrace: [30×1 double]
                  ConstraintsTrace: []
                     UserDataTrace: {30×1 cell}
      ObjectiveEvaluationTimeTrace: [30×1 double]
                IterationTimeTrace: [30×1 double]
                        ErrorTrace: [30×1 double]
                  FeasibilityTrace: [30×1 logical]
       FeasibilityProbabilityTrace: [30×1 double]
               IndexOfMinimumTrace: [30×1 double]
             ObjectiveMinimumTrace: [30×1 double]
    EstimatedObjectiveMinimumTrace: [30×1 double]

Input Arguments

Tbl — Sample data
table

Sample data used to train the model, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one predictor variable. Optionally, Tbl can contain one additional column for the response variable. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.

• If Tbl contains the response variable, and you want to use all remaining variables in Tbl as predictors, then specify the response variable by using ResponseVarName.
• If Tbl contains the response variable, and you want to use only a subset of the remaining variables in Tbl as predictors, then specify a formula by using formula.
• If Tbl does not contain the response variable, then specify a response variable by using Y. The length of the response variable and the number of rows in Tbl must be equal.

Data Types: table

ResponseVarName — Response variable name
name of variable in Tbl

Response variable name, specified as the name of a variable in Tbl. You must specify ResponseVarName as a character vector or string scalar. For example, if the response variable Y is stored as Tbl.Y, then specify it as 'Y'. Otherwise, the software treats all columns of Tbl, including Y, as predictors when training the model.

The response variable must be a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors. If Y is a character array, then each element of the response variable must correspond to one row of the array.

A good practice is to specify the order of the classes by using the ClassNames name-value pair argument.

Data Types: char | string

formula — Explanatory model of response variable and subset of predictor variables
character vector | string scalar

Explanatory model of the response variable and a subset of the predictor variables, specified as a character vector or string scalar in the form 'Y~X1+X2+X3'. In this form, Y represents the response variable, and X1, X2, and X3 represent the predictor variables.

To specify a subset of variables in Tbl as predictors for training the model, use a formula. If you specify a formula, then the software does not use any variables in Tbl that do not appear in formula.
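The three ways of identifying the response variable can be sketched with a small example. This is an illustrative sketch, not part of the reference text; it assumes the fisheriris sample data set that ships with the toolbox.

```matlab
% Build a table of predictors plus a response column
load fisheriris
Tbl = array2table(meas, ...
    'VariableNames',{'SepalLength','SepalWidth','PetalLength','PetalWidth'});
Tbl.Species = species;        % response variable stored in the table

% ResponseVarName: use all other columns as predictors
Mdl1 = fitctree(Tbl,'Species');

% formula: use only a subset of the columns as predictors
Mdl2 = fitctree(Tbl,'Species~PetalLength+PetalWidth');

% Y: response supplied separately from the predictor table
Mdl3 = fitctree(Tbl(:,1:4),species);
```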


The variable names in the formula must be both variable names in Tbl (Tbl.Properties.VariableNames) and valid MATLAB identifiers. You can verify the variable names in Tbl by using the isvarname function. The following code returns logical 1 (true) for each variable that has a valid variable name.

cellfun(@isvarname,Tbl.Properties.VariableNames)

If the variable names in Tbl are not valid, then convert them by using the matlab.lang.makeValidName function.

Tbl.Properties.VariableNames = matlab.lang.makeValidName(Tbl.Properties.VariableNames);

Data Types: char | string

Y — Class labels
numeric vector | categorical vector | logical vector | character array | string array | cell array of character vectors

Class labels, specified as a numeric vector, categorical vector, logical vector, character array, string array, or cell array of character vectors. Each row of Y represents the classification of the corresponding row of X.

When fitting the tree, fitctree considers NaN, '' (empty character vector), "" (empty string), <missing>, and <undefined> values in Y to be missing values. fitctree does not use observations with missing values for Y in the fit.

For numeric Y, consider fitting a regression tree using fitrtree instead.

Data Types: single | double | categorical | logical | char | string | cell

X — Predictor data
numeric matrix

Predictor data, specified as a numeric matrix. Each row of X corresponds to one observation, and each column corresponds to one predictor variable.

fitctree considers NaN values in X as missing values. fitctree does not use observations with all missing values for X in the fit. fitctree uses observations with some missing values for X to find splits on variables for which these observations have valid values.

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Note You cannot use any cross-validation name-value pair argument along with the 'OptimizeHyperparameters' name-value pair argument. You can modify the cross-validation for 'OptimizeHyperparameters' only by using the 'HyperparameterOptimizationOptions' name-value pair argument.
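The handling of missing predictor values can be illustrated with a short sketch (not part of the reference text; it assumes the fisheriris sample data set):

```matlab
% fitctree accepts a numeric predictor matrix X and a label vector Y.
% A row of X with some (but not all) NaN entries is still used to find
% splits on the predictors where that row has valid values.
load fisheriris
X = meas;
X(3,1) = NaN;                 % introduce one missing predictor value
Mdl = fitctree(X,species);    % species is a cell array of class labels
```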


Example: 'CrossVal','on','MinLeafSize',40 specifies a cross-validated classification tree with a minimum of 40 observations per leaf.

Model Parameters

AlgorithmForCategorical — Algorithm for best categorical predictor split
'Exact' | 'PullLeft' | 'PCA' | 'OVAbyClass'

Algorithm to find the best split on a categorical predictor with C categories for data and K ≥ 3 classes, specified as the comma-separated pair consisting of 'AlgorithmForCategorical' and one of the following values.

Value          Description
'Exact'        Consider all 2^(C–1) – 1 combinations.
'PullLeft'     Start with all C categories on the right branch. Consider moving each category to the left branch as it achieves the minimum impurity for the K classes among the remaining categories. From this sequence, choose the split that has the lowest impurity.
'PCA'          Compute a score for each category using the inner product between the first principal component of a weighted covariance matrix (of the centered class probability matrix) and the vector of class probabilities for that category. Sort the scores in ascending order, and consider all C – 1 splits.
'OVAbyClass'   Start with all C categories on the right branch. For each class, order the categories based on their probability for that class. For the first class, consider moving each category to the left branch in order, recording the impurity criterion at each move. Repeat for the remaining classes. From this sequence, choose the split that has the minimum impurity.

fitctree automatically selects the optimal subset of algorithms for each split using the known number of classes and levels of a categorical predictor. For K = 2 classes, fitctree always performs the exact search. To specify a particular algorithm, use the 'AlgorithmForCategorical' name-value pair argument. For more details, see "Splitting Categorical Predictors in Classification Trees" on page 19-25.

Example: 'AlgorithmForCategorical','PCA'

CategoricalPredictors — Categorical predictors list
vector of positive integers | logical vector | character matrix | string array | cell array of character vectors | 'all'

Categorical predictors list, specified as the comma-separated pair consisting of 'CategoricalPredictors' and one of the values in this table.
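A sketch of requesting one of the inexact algorithms explicitly (illustrative, not part of the reference text; it assumes the carsmall sample data set, whose Cylinders variable has the three classes 4, 6, and 8):

```matlab
% Force the 'PullLeft' heuristic for splitting the categorical predictor
% Origin in a three-class problem.
load carsmall
Tbl = table(MPG,Weight,categorical(cellstr(Origin)),categorical(Cylinders), ...
    'VariableNames',{'MPG','Weight','Origin','Cylinders'});
Mdl = fitctree(Tbl,'Cylinders','AlgorithmForCategorical','PullLeft');
```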


Value                            Description
Vector of positive integers      Each entry in the vector is an index value corresponding to the column of the predictor data (X or Tbl) that contains a categorical variable.
Logical vector                   A true entry means that the corresponding column of predictor data (X or Tbl) is a categorical variable.
Character matrix                 Each row of the matrix is the name of a predictor variable. The names must match the entries in PredictorNames. Pad the names with extra blanks so each row of the character matrix has the same length.
String array or cell array of    Each element in the array is the name of a predictor variable. The names must match the entries in PredictorNames.
character vectors
'all'                            All predictors are categorical.

By default, if the predictor data is in a table (Tbl), fitctree assumes that a variable is categorical if it is a logical vector, unordered categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix (X), fitctree assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the 'CategoricalPredictors' name-value pair argument.

Example: 'CategoricalPredictors','all'
Data Types: single | double | logical | char | string | cell

ClassNames — Names of classes to use for training
categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors

Names of classes to use for training, specified as the comma-separated pair consisting of 'ClassNames' and a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors. ClassNames must have the same data type as Y.

If ClassNames is a character array, then each element must correspond to one row of the array.

Use 'ClassNames' to:

• Order the classes during training.
• Specify the order of any input or output argument dimension that corresponds to the class order. For example, use 'ClassNames' to specify the order of the dimensions of Cost or the column order of classification scores returned by predict.
• Select a subset of classes for training. For example, suppose that the set of all distinct class names in Y is {'a','b','c'}. To train the model using observations from classes 'a' and 'c' only, specify 'ClassNames',{'a','c'}.

The default value for ClassNames is the set of all distinct class names in Y.

Example: 'ClassNames',{'b','g'}
Data Types: categorical | char | string | logical | single | double | cell

Cost — Cost of misclassification
square matrix | structure
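The class-subset use of 'ClassNames' can be sketched as follows (illustrative, not part of the reference text; it assumes the fisheriris sample data set, whose labels are 'setosa', 'versicolor', and 'virginica'):

```matlab
% Train using only two of the three classes; 'versicolor' observations
% are not used in the fit.
load fisheriris
Mdl = fitctree(meas,species,'ClassNames',{'setosa','virginica'});

% For a numeric matrix X, predictors are continuous by default. To flag
% particular columns as categorical (hypothetical for this data), pass
% their indices, for example:
%   Mdl = fitctree(meas,species,'CategoricalPredictors',[3 4]);
```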


Cost of misclassification of a point, specified as the comma-separated pair consisting of 'Cost' and one of the following:

• Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (that is, the rows correspond to the true class and the columns correspond to the predicted class). To specify the class order for the corresponding rows and columns of Cost, also specify the ClassNames name-value pair argument.
• Structure S having two fields: S.ClassNames containing the group names as a variable of the same data type as Y, and S.ClassificationCosts containing the cost matrix.

The default is Cost(i,j)=1 if i~=j, and Cost(i,j)=0 if i=j.

Data Types: single | double | struct

MaxDepth — Maximum tree depth
positive integer

Maximum tree depth, specified as the comma-separated pair consisting of 'MaxDepth' and a positive integer. Specify a value for this argument to return a tree that has fewer levels and requires fewer passes through the tall array to compute. Generally, the algorithm of fitctree takes one pass through the data and an additional pass for each tree level. The function does not set a maximum tree depth, by default.

Note: This option applies only when you use fitctree on tall arrays. See Tall Arrays on page 32-1794 for more information.

MaxNumCategories — Maximum category levels
10 (default) | nonnegative scalar value

Maximum category levels, specified as the comma-separated pair consisting of 'MaxNumCategories' and a nonnegative scalar value. fitctree splits a categorical predictor using the exact search algorithm if the predictor has at most MaxNumCategories levels in the split node. Otherwise, fitctree finds the best categorical split using one of the inexact algorithms. Passing a small value can lead to loss of accuracy, and passing a large value can increase computation time and memory overload.
Example: 'MaxNumCategories',8

MaxNumSplits — Maximal number of decision splits
size(X,1) - 1 (default) | positive integer

Maximal number of decision splits (or branch nodes), specified as the comma-separated pair consisting of 'MaxNumSplits' and a positive integer. fitctree splits MaxNumSplits or fewer branch nodes. For more details on splitting behavior, see Algorithms on page 32-1791.

Example: 'MaxNumSplits',5

Data Types: single | double

MergeLeaves — Leaf merge flag
'on' (default) | 'off'

Leaf merge flag, specified as the comma-separated pair consisting of 'MergeLeaves' and 'on' or 'off'.


If MergeLeaves is 'on', then fitctree:

• Merges leaves that originate from the same parent node and that yield a sum of risk values greater than or equal to the risk associated with the parent node
• Estimates the optimal sequence of pruned subtrees, but does not prune the classification tree

Otherwise, fitctree does not merge leaves.

Example: 'MergeLeaves','off'

MinLeafSize — Minimum number of leaf node observations
1 (default) | positive integer value

Minimum number of leaf node observations, specified as the comma-separated pair consisting of 'MinLeafSize' and a positive integer value. Each leaf has at least MinLeafSize observations per tree leaf. If you supply both MinParentSize and MinLeafSize, fitctree uses the setting that gives larger leaves: MinParentSize = max(MinParentSize,2*MinLeafSize).

Example: 'MinLeafSize',3

Data Types: single | double

MinParentSize — Minimum number of branch node observations
10 (default) | positive integer value

Minimum number of branch node observations, specified as the comma-separated pair consisting of 'MinParentSize' and a positive integer value. Each branch node in the tree has at least MinParentSize observations. If you supply both MinParentSize and MinLeafSize, fitctree uses the setting that gives larger leaves: MinParentSize = max(MinParentSize,2*MinLeafSize).

Example: 'MinParentSize',8

Data Types: single | double

NumBins — Number of bins for numeric predictors
[] (empty) (default) | positive integer scalar

Number of bins for numeric predictors, specified as the comma-separated pair consisting of 'NumBins' and a positive integer scalar.

• If the 'NumBins' value is empty (default), then the software does not bin any predictors.
• If you specify the 'NumBins' value as a positive integer scalar, then the software bins every numeric predictor into the specified number of equiprobable bins, and then grows trees on the bin indices instead of the original data.
• If the 'NumBins' value exceeds the number (u) of unique values for a predictor, then fitctree bins the predictor into u bins.
• fitctree does not bin categorical predictors.

When you use a large training data set, this binning option speeds up training but causes a potential decrease in accuracy. You can try 'NumBins',50 first, and then change the 'NumBins' value depending on the accuracy and training speed. A trained model stores the bin edges in the BinEdges property.

Example: 'NumBins',50
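The binning workflow above can be sketched as follows. The synthetic data is an illustrative assumption; binning pays off mainly on much larger data sets.

```matlab
% Compare training with and without predictor binning. This small example
% only illustrates the syntax; the speed benefit appears on big data.
rng(0)                                    % for reproducibility
X = randn(1e4,5);                         % numeric predictors
Y = X(:,1) + X(:,2) > 0;                  % two-class response

MdlExact  = fitctree(X,Y);                % no binning (default)
MdlBinned = fitctree(X,Y,'NumBins',50);   % bin each predictor into 50 equiprobable bins

% A trained model stores the bin edges in the BinEdges property.
edges = MdlBinned.BinEdges{1};            % bin edges for the first predictor
```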


Data Types: single | double

NumVariablesToSample — Number of predictors to select at random for each split
'all' (default) | positive integer value

Number of predictors to select at random for each split, specified as the comma-separated pair consisting of 'NumVariablesToSample' and a positive integer value. Alternatively, you can specify 'all' to use all available predictors.

If the training data includes many predictors and you want to analyze predictor importance, then specify 'NumVariablesToSample' as 'all'. Otherwise, the software might not select some predictors, underestimating their importance. To reproduce the random selections, you must set the seed of the random number generator by using rng and specify 'Reproducible',true.

Example: 'NumVariablesToSample',3

Data Types: char | string | single | double

PredictorNames — Predictor variable names
string array of unique names | cell array of unique character vectors

Predictor variable names, specified as the comma-separated pair consisting of 'PredictorNames' and a string array of unique names or cell array of unique character vectors. The functionality of 'PredictorNames' depends on the way you supply the training data.

• If you supply X and Y, then you can use 'PredictorNames' to give the predictor variables in X names.
  • The order of the names in PredictorNames must correspond to the column order of X. That is, PredictorNames{1} is the name of X(:,1), PredictorNames{2} is the name of X(:,2), and so on. Also, size(X,2) and numel(PredictorNames) must be equal.
  • By default, PredictorNames is {'x1','x2',...}.
• If you supply Tbl, then you can use 'PredictorNames' to choose which predictor variables to use in training. That is, fitctree uses only the predictor variables in PredictorNames and the response variable in training.
  • PredictorNames must be a subset of Tbl.Properties.VariableNames and cannot include the name of the response variable.
  • By default, PredictorNames contains the names of all predictor variables.
  • A good practice is to specify the predictors for training using either 'PredictorNames' or formula only.

Example: 'PredictorNames',{'SepalLength','SepalWidth','PetalLength','PetalWidth'}

Data Types: string | cell

PredictorSelection — Algorithm used to select the best split predictor
'allsplits' (default) | 'curvature' | 'interaction-curvature'

Algorithm used to select the best split predictor at each node, specified as the comma-separated pair consisting of 'PredictorSelection' and a value in this table.


'allsplits' — Standard CART. Selects the split predictor that maximizes the split-criterion gain over all possible splits of all predictors [1].

'curvature' — Curvature test on page 32-1788. Selects the split predictor that minimizes the p-value of chi-square tests of independence between each predictor and the response [4]. Training speed is similar to standard CART.

'interaction-curvature' — Interaction test on page 32-1790. Chooses the split predictor that minimizes the p-value of chi-square tests of independence between each predictor and the response, and that minimizes the p-value of a chi-square test of independence between each pair of predictors and response [3]. Training speed can be slower than standard CART.

For 'curvature' and 'interaction-curvature', if all tests yield p-values greater than 0.05, then fitctree stops splitting nodes.

Tip

• Standard CART tends to select split predictors containing many distinct values, for example, continuous variables, over those containing few distinct values, for example, categorical variables [4]. Consider specifying the curvature or interaction test if any of the following are true:
  • There are predictors that have relatively fewer distinct values than other predictors, for example, if the predictor data set is heterogeneous.
  • An analysis of predictor importance is your goal. For more on predictor importance estimation, see predictorImportance and “Introduction to Feature Selection” on page 15-49.
• Trees grown using standard CART are not sensitive to predictor variable interactions. Also, such trees are less likely to identify important variables in the presence of many irrelevant predictors than the application of the interaction test. Therefore, to account for predictor interactions and identify important variables in the presence of many irrelevant variables, specify the interaction test [3].
• Prediction speed is unaffected by the value of 'PredictorSelection'.
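As a sketch of the curvature-test workflow suggested in the tip, using the fisheriris data that appears elsewhere on this page (the shortened predictor names are illustrative):

```matlab
% Grow a tree using the curvature test to select split predictors, then
% estimate predictor importance. With 'curvature', importance estimates
% are not biased toward predictors with many distinct values.
load fisheriris
Mdl = fitctree(meas,species, ...
    'PredictorSelection','curvature', ...
    'PredictorNames',{'SL','SW','PL','PW'});
imp = predictorImportance(Mdl);   % one importance value per predictor
```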

For details on how fitctree selects split predictors, see “Node Splitting Rules” on page 32-1791 and “Choose Split Predictor Selection Technique” on page 19-14.

Example: 'PredictorSelection','curvature'

Prior — Prior probabilities
'empirical' (default) | 'uniform' | vector of scalar values | structure

Prior probabilities for each class, specified as the comma-separated pair consisting of 'Prior' and one of the following:

• Character vector or string scalar.
  • 'empirical' determines class probabilities from class frequencies in the response variable in Y or Tbl. If you pass observation weights, fitctree uses the weights to compute the class probabilities.


  • 'uniform' sets all class probabilities to be equal.
• Vector (one scalar value for each class). To specify the class order for the corresponding elements of 'Prior', also specify the ClassNames name-value pair argument.
• Structure S with two fields:
  • S.ClassNames contains the class names as a variable of the same type as the response variable in Y or Tbl.
  • S.ClassProbs contains a vector of corresponding probabilities.

If you set values for both 'Weights' and 'Prior', fitctree normalizes the weights in each class to add up to the value of the prior probability of the respective class.

Example: 'Prior','uniform'

Data Types: char | string | single | double | struct

Prune — Flag to estimate optimal sequence of pruned subtrees
'on' (default) | 'off'

Flag to estimate the optimal sequence of pruned subtrees, specified as the comma-separated pair consisting of 'Prune' and 'on' or 'off'. If Prune is 'on', then fitctree grows the classification tree without pruning it, but estimates the optimal sequence of pruned subtrees. Otherwise, fitctree grows the classification tree without estimating the optimal sequence of pruned subtrees.

To prune a trained ClassificationTree model, pass it to prune.

Example: 'Prune','off'

PruneCriterion — Pruning criterion
'error' (default) | 'impurity'

Pruning criterion, specified as the comma-separated pair consisting of 'PruneCriterion' and 'error' or 'impurity'. If you specify 'impurity', then fitctree uses the impurity measure specified by the 'SplitCriterion' name-value pair argument. For details, see “Impurity and Node Error” on page 32-1789.

Example: 'PruneCriterion','impurity'

Reproducible — Flag to enforce reproducibility
false (logical 0) (default) | true (logical 1)

Flag to enforce reproducibility over repeated runs of training a model, specified as the comma-separated pair consisting of 'Reproducible' and either false or true. If 'NumVariablesToSample' is not 'all', then the software selects predictors at random for each split.
To reproduce the random selections, you must specify 'Reproducible',true and set the seed of the random number generator by using rng. Note that setting 'Reproducible' to true can slow down training.

Example: 'Reproducible',true

Data Types: logical
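A minimal sketch of the reproducibility workflow described above:

```matlab
% Reproduce a tree that selects predictors at random at each split.
% Resetting the rng seed and setting 'Reproducible',true makes repeated
% training runs select the same predictors.
load fisheriris
rng('default')
Mdl1 = fitctree(meas,species,'NumVariablesToSample',2,'Reproducible',true);
rng('default')
Mdl2 = fitctree(meas,species,'NumVariablesToSample',2,'Reproducible',true);
% Mdl1 and Mdl2 use the same randomly selected predictors at each split.
```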


ResponseName — Response variable name
'Y' (default) | character vector | string scalar

Response variable name, specified as the comma-separated pair consisting of 'ResponseName' and a character vector or string scalar representing the name of the response variable. This name-value pair is not valid when using the ResponseVarName or formula input arguments.

Example: 'ResponseName','IrisType'

Data Types: char | string

ScoreTransform — Score transformation
'none' (default) | 'doublelogit' | 'invlogit' | 'ismax' | 'logit' | function handle | ...

Score transformation, specified as the comma-separated pair consisting of 'ScoreTransform' and a character vector, string scalar, or function handle. This table summarizes the available character vectors and string scalars.

'doublelogit'          1/(1 + e^(–2x))
'invlogit'             log(x / (1 – x))
'ismax'                Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
'logit'                1/(1 + e^(–x))
'none' or 'identity'   x (no transformation)
'sign'                 –1 for x < 0, 0 for x = 0, 1 for x > 0
'symmetric'            2x – 1
'symmetricismax'       Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1
'symmetriclogit'       2/(1 + e^(–x)) – 1

For a MATLAB function or a function you define, use its function handle for the score transform. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).

Example: 'ScoreTransform','logit'

Data Types: char | string | function_handle

SplitCriterion — Split criterion
'gdi' (default) | 'twoing' | 'deviance'

Split criterion, specified as the comma-separated pair consisting of 'SplitCriterion' and 'gdi' (Gini's diversity index), 'twoing' for the twoing rule, or 'deviance' for maximum deviance reduction (also known as cross entropy). For details, see “Impurity and Node Error” on page 32-1789.

Example: 'SplitCriterion','deviance'
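The function-handle form of 'ScoreTransform' can be sketched as follows. The row-wise softmax transform here is purely illustrative, not a recommended transformation for tree scores:

```matlab
% Use a custom score transformation. The function handle must accept a
% matrix of scores and return a matrix of the same size.
load fisheriris
softmaxT = @(S) exp(S)./sum(exp(S),2);     % illustrative row-wise transform
Mdl = fitctree(meas,species,'ScoreTransform',softmaxT);
[label,score] = predict(Mdl,meas(1,:));    % score is transformed row-wise
```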


Surrogate — Surrogate decision splits flag
'off' (default) | 'on' | 'all' | positive integer value

Surrogate decision splits on page 32-1791 flag, specified as the comma-separated pair consisting of 'Surrogate' and 'on', 'off', 'all', or a positive integer value.

• When set to 'on', fitctree finds at most 10 surrogate splits at each branch node.
• When set to 'all', fitctree finds all surrogate splits at each branch node. The 'all' setting can use considerable time and memory.
• When set to a positive integer value, fitctree finds at most the specified number of surrogate splits at each branch node.

Use surrogate splits to improve the accuracy of predictions for data with missing values. The setting also lets you compute measures of predictive association between predictors. For more details, see “Node Splitting Rules” on page 32-1791.

Example: 'Surrogate','on'

Data Types: single | double | char | string

Weights — Observation weights
ones(size(X,1),1) (default) | vector of scalar values | name of variable in Tbl

Observation weights, specified as the comma-separated pair consisting of 'Weights' and a vector of scalar values or the name of a variable in Tbl. The software weights the observations in each row of X or Tbl with the corresponding value in Weights. The size of Weights must equal the number of rows in X or Tbl.

If you specify the input data as a table Tbl, then Weights can be the name of a variable in Tbl that contains a numeric vector. In this case, you must specify Weights as a character vector or string scalar. For example, if the weights vector W is stored as Tbl.W, then specify it as 'W'. Otherwise, the software treats all columns of Tbl, including W, as predictors when training the model.

fitctree normalizes the weights in each class to add up to the value of the prior probability of the respective class.

Data Types: single | double | char | string

Cross-Validation Options

CrossVal — Flag to grow cross-validated decision tree
'off' (default) | 'on'

Flag to grow a cross-validated decision tree, specified as the comma-separated pair consisting of 'CrossVal' and 'on' or 'off'.

If 'on', fitctree grows a cross-validated decision tree with 10 folds. You can override this cross-validation setting using one of the 'KFold', 'Holdout', 'Leaveout', or 'CVPartition' name-value pair arguments. You can only use one of these four arguments at a time when creating a cross-validated tree. Alternatively, cross-validate tree later using the crossval method.

Example: 'CrossVal','on'

CVPartition — Partition for cross-validated tree
cvpartition object


Partition to use in a cross-validated tree, specified as the comma-separated pair consisting of 'CVPartition' and an object created using cvpartition.

If you use 'CVPartition', you cannot use any of the 'KFold', 'Holdout', or 'Leaveout' name-value pair arguments.

Holdout — Fraction of data for holdout validation
0 (default) | scalar value in the range [0,1]

Fraction of data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range [0,1]. Holdout validation tests the specified fraction of the data, and uses the rest of the data for training.

If you use 'Holdout', you cannot use any of the 'CVPartition', 'KFold', or 'Leaveout' name-value pair arguments.

Example: 'Holdout',0.1

Data Types: single | double

KFold — Number of folds
10 (default) | positive integer value greater than 1

Number of folds to use in a cross-validated classifier, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1. If you specify, for example, 'KFold',k, then the software:

1 Randomly partitions the data into k sets.
2 For each set, reserves the set as validation data, and trains the model using the other k – 1 sets.
3 Stores the k compact, trained models in the cells of a k-by-1 cell vector in the Trained property of the cross-validated model.
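The k-fold procedure above can be sketched as follows (kfoldLoss and kfoldPredict operate on the resulting cross-validated model):

```matlab
% Grow a cross-validated classification tree and estimate its
% generalization error from the held-out folds.
load fisheriris
CVMdl = fitctree(meas,species,'KFold',8);   % 8-fold cross-validation
err = kfoldLoss(CVMdl);                     % cross-validated misclassification rate
labels = kfoldPredict(CVMdl);               % out-of-fold predictions
```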

To create a cross-validated model, you can use one of these four options only: CVPartition, Holdout, KFold, or Leaveout.

Example: 'KFold',8

Data Types: single | double

Leaveout — Leave-one-out cross-validation flag
'off' (default) | 'on'

Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of 'Leaveout' and 'on' or 'off'. Specify 'on' to use leave-one-out cross-validation.

If you use 'Leaveout', you cannot use any of the 'CVPartition', 'Holdout', or 'KFold' name-value pair arguments.

Example: 'Leaveout','on'

Hyperparameter Optimization Options

OptimizeHyperparameters — Parameters to optimize
'none' (default) | 'auto' | 'all' | string array or cell array of eligible parameter names | vector of optimizableVariable objects


Parameters to optimize, specified as the comma-separated pair consisting of 'OptimizeHyperparameters' and one of the following:

• 'none' — Do not optimize.
• 'auto' — Use {'MinLeafSize'}.
• 'all' — Optimize all eligible parameters.
• String array or cell array of eligible parameter names.
• Vector of optimizableVariable objects, typically the output of hyperparameters.

The optimization attempts to minimize the cross-validation loss (error) for fitctree by varying the parameters. For information about cross-validation loss (albeit in a different context), see “Classification Loss” on page 32-2756. To control the cross-validation type and other aspects of the optimization, use the HyperparameterOptimizationOptions name-value pair.

Note: 'OptimizeHyperparameters' values override any values you set using other name-value pair arguments. For example, setting 'OptimizeHyperparameters' to 'auto' causes the 'auto' values to apply.

The eligible parameters for fitctree are:

• MaxNumSplits — fitctree searches among integers, by default log-scaled in the range [1,max(2,NumObservations-1)].
• MinLeafSize — fitctree searches among integers, by default log-scaled in the range [1,max(2,floor(NumObservations/2))].
• SplitCriterion — For two classes, fitctree searches among 'gdi' and 'deviance'. For three or more classes, fitctree also searches among 'twoing'.
• NumVariablesToSample — fitctree does not optimize over this hyperparameter. If you pass NumVariablesToSample as a parameter name, fitctree simply uses the full number of predictors. However, fitcensemble does optimize over this hyperparameter.

Set nondefault parameters by passing a vector of optimizableVariable objects that have nondefault values. For example:

load fisheriris
params = hyperparameters('fitctree',meas,species);
params(1).Range = [1,30];

Pass params as the value of OptimizeHyperparameters.

By default, iterative display appears at the command line, and plots appear according to the number of hyperparameters in the optimization. For the optimization and plots, the objective function is log(1 + cross-validation loss) for regression and the misclassification rate for classification. To control the iterative display, set the Verbose field of the 'HyperparameterOptimizationOptions' name-value pair argument. To control the plots, set the ShowPlots field of the 'HyperparameterOptimizationOptions' name-value pair argument.

For an example, see “Optimize Classification Tree” on page 32-1701.

Example: 'auto'

HyperparameterOptimizationOptions — Options for optimization
structure
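Putting the steps above together, a sketch of a constrained optimization run (the Verbose and ShowPlots fields are fields of the options structure described under 'HyperparameterOptimizationOptions'):

```matlab
% Optimize MinLeafSize by Bayesian optimization, restricting its search
% range via an optimizableVariable vector.
load fisheriris
params = hyperparameters('fitctree',meas,species);
params(1).Range = [1,30];                  % narrow the MinLeafSize range
Mdl = fitctree(meas,species, ...
    'OptimizeHyperparameters',params, ...
    'HyperparameterOptimizationOptions', ...
    struct('Verbose',0,'ShowPlots',false)); % suppress display and plots
```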


Options for optimization, specified as the comma-separated pair consisting of 'HyperparameterOptimizationOptions' and a structure. This argument modifies the effect of the OptimizeHyperparameters name-value pair argument. All fields in the structure are optional.

Optimizer — Default: 'bayesopt'
• 'bayesopt' — Use Bayesian optimization. Internally, this setting calls bayesopt.
• 'gridsearch' — Use grid search with NumGridDivisions values per dimension.
• 'randomsearch' — Search at random among MaxObjectiveEvaluations points.
'gridsearch' searches in a random order, using uniform sampling without replacement from the grid. After optimization, you can get a table in grid order by using the command sortrows(Mdl.HyperparameterOptimizationResults).

AcquisitionFunctionName — Default: 'expected-improvement-per-second-plus'
• 'expected-improvement-per-second-plus'
• 'expected-improvement'
• 'expected-improvement-plus'
• 'expected-improvement-per-second'
• 'lower-confidence-bound'
• 'probability-of-improvement'
Acquisition functions whose names include per-second do not yield reproducible results because the optimization depends on the runtime of the objective function. Acquisition functions whose names include plus modify their behavior when they are overexploiting an area. For more details, see “Acquisition Function Types” on page 10-3.

MaxObjectiveEvaluations — Maximum number of objective function evaluations. Default: 30 for 'bayesopt' or 'randomsearch', and the entire grid for 'gridsearch'.

MaxTime — Time limit, specified as a positive real. The time limit is in seconds, as measured by tic and toc. Run time can exceed MaxTime because MaxTime does not interrupt function evaluations. Default: Inf.

NumGridDivisions — For 'gridsearch', the number of values in each dimension. The value can be a vector of positive integers giving the number of values for each dimension, or a scalar that applies to all dimensions. This field is ignored for categorical variables. Default: 10.

ShowPlots — Logical value indicating whether to show plots. If true, this field plots the best objective function value against the iteration number. If there are one or two optimization parameters, and if Optimizer is 'bayesopt', then ShowPlots also plots a model of the objective function against the parameters. Default: true.

SaveIntermediateResults — Logical value indicating whether to save results when Optimizer is 'bayesopt'. If true, this field overwrites a workspace variable named 'BayesoptResults' at each iteration. The variable is a BayesianOptimization object. Default: false.

Verbose — Display to the command line. Default: 1.
• 0 — No iterative display
• 1 — Iterative display
• 2 — Iterative display with extra information
For details, see the bayesopt Verbose name-value pair argument.

UseParallel — Logical value indicating whether to run Bayesian optimization in parallel, which requires Parallel Computing Toolbox. Due to the nonreproducibility of parallel timing, parallel Bayesian optimization does not necessarily yield reproducible results. For details, see “Parallel Bayesian Optimization” on page 10-7. Default: false.

Repartition — Logical value indicating whether to repartition the cross-validation at every iteration. If false, the optimizer uses a single partition for the optimization. true usually gives the most robust results because this setting takes partitioning noise into account. However, for good results, true requires at least twice as many function evaluations. Default: false.

Use no more than one of the following three field names. If you do not specify any cross-validation field, the default is 'Kfold',5.

CVPartition — A cvpartition object, as created by cvpartition.

Holdout — A scalar in the range (0,1) representing the holdout fraction.

Kfold — An integer greater than 1.
Example: 'HyperparameterOptimizationOptions',struct('MaxObjectiveEvaluations',60)

Data Types: struct


Output Arguments

tree — Classification tree
classification tree object

Classification tree, returned as a classification tree object.

Using the 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition' options results in a tree of class ClassificationPartitionedModel. You cannot use a partitioned tree for prediction, so this kind of tree does not have a predict method. Instead, use kfoldPredict to predict responses for observations not used for training.

Otherwise, tree is of class ClassificationTree, and you can use the predict method to make predictions.
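A short sketch of the two cases described above:

```matlab
% The class of the returned object depends on the cross-validation options.
load fisheriris
Mdl = fitctree(meas,species);               % ClassificationTree
label = predict(Mdl,mean(meas));            % predict on a new observation

CVMdl = fitctree(meas,species,'CrossVal','on');  % ClassificationPartitionedModel
oofLabel = kfoldPredict(CVMdl);             % out-of-fold predictions; no predict method
```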

More About

Curvature Test

The curvature test is a statistical test assessing the null hypothesis that two variables are unassociated. The curvature test between predictor variable x and y is conducted using this process.

1 If x is continuous, then partition it into its quartiles. Create a nominal variable that bins observations according to which section of the partition they occupy. If there are missing values, then create an extra bin for them.

2 For each level in the partitioned predictor j = 1,...,J and class in the response k = 1,...,K, compute the weighted proportion of observations in class k

   π̂_jk = Σ_{i=1}^{n} I{y_i = k} w_i.

  w_i is the weight of observation i, Σ w_i = 1, I is the indicator function, and n is the sample size. If all observations have the same weight, then π̂_jk = n_jk / n, where n_jk is the number of observations in level j of the predictor that are in class k.

3 Compute the test statistic

   t = n Σ_{k=1}^{K} Σ_{j=1}^{J} (π̂_jk − π̂_{j+} π̂_{+k})² / (π̂_{j+} π̂_{+k})

  π̂_{j+} = Σ_k π̂_jk, that is, the marginal probability of observing the predictor at level j. π̂_{+k} = Σ_j π̂_jk, that is, the marginal probability of observing class k. If n is large enough, then t is distributed as a χ² with (K – 1)(J – 1) degrees of freedom.

4 If the p-value for the test is less than 0.05, then reject the null hypothesis that there is no association between x and y.

When determining the best split predictor at each node, the standard CART algorithm prefers to select continuous predictors that have many levels. Sometimes, such a selection can be spurious and can also mask more important predictors that have fewer levels, such as categorical predictors.


The curvature test can be applied instead of standard CART to determine the best split predictor at each node. In that case, the best split predictor variable is the one that minimizes the significant p-values (those less than 0.05) of curvature tests between each predictor and the response variable. Such a selection is robust to the number of levels in individual predictors.

Note: If levels of a predictor are pure for a particular class, then fitctree merges those levels. Therefore, in step 3 of the algorithm, J can be less than the actual number of levels in the predictor. For example, if x has 4 levels, and all observations in bins 1 and 2 belong to class 1, then those levels are pure for class 1. Consequently, fitctree merges the observations in bins 1 and 2, and J reduces to 3.

For more details on how the curvature test applies to growing classification trees, see “Node Splitting Rules” on page 32-1791 and [4].

Impurity and Node Error

ClassificationTree splits nodes based on either impurity or node error. Impurity means one of several things, depending on your choice of the SplitCriterion name-value pair argument:

• Gini's Diversity Index (gdi) — The Gini index of a node is

   1 − Σ_i p²(i),

  where the sum is over the classes i at the node, and p(i) is the observed fraction of classes with class i that reach the node. A node with just one class (a pure node) has Gini index 0; otherwise the Gini index is positive. So the Gini index is a measure of node impurity.

• Deviance ('deviance') — With p(i) defined the same as for the Gini index, the deviance of a node is

   − Σ_i p(i) log₂ p(i).

  A pure node has deviance 0; otherwise, the deviance is positive.

• Twoing rule ('twoing') — Twoing is not a purity measure of a node, but is a different measure for deciding how to split a node. Let L(i) denote the fraction of members of class i in the left child node after a split, and R(i) denote the fraction of members of class i in the right child node after a split. Choose the split criterion to maximize

   P(L) P(R) (Σ_i |L(i) − R(i)|)²,

  where P(L) and P(R) are the fractions of observations that split to the left and right, respectively. If the expression is large, the split made each child node purer. Similarly, if the expression is small, the split made each child node similar to each other, and therefore similar to the parent node. The split did not increase node purity.

• Node error — The node error is the fraction of misclassified classes at a node. If j is the class with the largest number of training samples at a node, the node error is

   1 − p(j).
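The impurity measures above can be sketched numerically; the class proportions here are an illustrative assumption:

```matlab
% Compute the impurity measures for a node with class proportions p.
p = [0.5 0.3 0.2];                 % observed class fractions at a node
gdi      = 1 - sum(p.^2);          % Gini's diversity index
deviance = -sum(p .* log2(p));     % deviance (cross entropy)
nodeErr  = 1 - max(p);             % node error
```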


Interaction Test The interaction test is a statistical test that assesses the null hypothesis that there is no interaction between a pair of predictor variables and the response variable. The interaction test assessing the association between predictor variables x1 and x2 with respect to y is conducted using this process. 1

If x1 or x2 is continuous, then partition that variable into its quartiles. Create a nominal variable that bins observations according to which section of the partition they occupy. If there are missing values, then create an extra bin for them.

2

Create the nominal variable z with J = J1J2 levels that assigns an index to observation i according to which levels of x1 and x2 it belongs. Remove any levels of z that do not correspond to any observations.

3

Conduct a curvature test on page 32-1788 between z and y.

When growing decision trees, if there are important interactions between pairs of predictors, but there are also many other less important predictors in the data, then standard CART tends to miss the important interactions. However, conducting curvature and interaction tests for predictor selection instead can improve detection of important interactions, which can yield more accurate decision trees. For more details on how the interaction test applies to growing decision trees, see “Curvature Test” on page 32-1788, “Node Splitting Rules” on page 32-1791, and [3].

Predictive Measure of Association

The predictive measure of association is a value that indicates the similarity between decision rules that split observations. Among all possible decision splits that are compared to the optimal split (found by growing the tree), the best surrogate decision split on page 32-1791 yields the maximum predictive measure of association. The second-best surrogate split has the second-largest predictive measure of association.

Suppose xj and xk are predictor variables j and k, respectively, and j ≠ k. At node t, the predictive measure of association between the optimal split xj < u and a surrogate split xk < v is

    λjk = [min(PL, PR) − (1 − PLjLk − PRjRk)] / min(PL, PR)

• PL is the proportion of observations in node t such that xj < u. The subscript L stands for the left child of node t.
• PR is the proportion of observations in node t such that xj ≥ u. The subscript R stands for the right child of node t.
• PLjLk is the proportion of observations at node t such that xj < u and xk < v.
• PRjRk is the proportion of observations at node t such that xj ≥ u and xk ≥ v.
• Observations with missing values for xj or xk do not contribute to the proportion calculations.

λjk is a value in (–∞,1]. If λjk > 0, then xk < v is a worthwhile surrogate split for xj < u.
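A short sketch of inspecting these measures on a fitted tree, assuming the built-in fisheriris data set and the surrogateAssociation method of ClassificationTree:

```matlab
% Fit a tree that finds surrogate splits, then examine the predictive
% measures of association between predictors.
load fisheriris
tree = fitctree(meas,species,'Surrogate','on');
assoc = surrogateAssociation(tree);   % p-by-p matrix of mean measures
```

Element (i,j) of assoc is the predictive measure of association averaged over surrogate splits on predictor j for which predictor i is the optimal split predictor.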


Surrogate Decision Splits

A surrogate decision split is an alternative to the optimal decision split at a given node in a decision tree. The optimal split is found by growing the tree; the surrogate split uses a similar or correlated predictor variable and split criterion. When the value of the optimal split predictor for an observation is missing, the observation is sent to the left or right child node using the best surrogate predictor. When the value of the best surrogate split predictor for the observation is also missing, the observation is sent to the left or right child node using the second-best surrogate predictor, and so on. Candidate splits are sorted in descending order by their predictive measure of association on page 32-2179.

Tip

• By default, Prune is 'on'. However, this specification does not prune the classification tree. To prune a trained classification tree, pass the classification tree to prune.
• After training a model, you can generate C/C++ code that predicts labels for new data. Generating C/C++ code requires MATLAB Coder. For details, see “Introduction to Code Generation” on page 31-2.
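For the first tip, a minimal sketch of pruning after training, assuming the built-in ionosphere sample data set:

```matlab
% Train a classification tree, then prune it explicitly.
load ionosphere                       % X: predictors, Y: class labels
tree = fitctree(X,Y);
prunedTree = prune(tree,'Level',1);   % remove the bottom pruning level
```

Increasing the 'Level' value removes more of the tree; calling prune(tree) with no name-value arguments returns a copy of the tree with its optimal pruning sequence filled in.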

Algorithms

Node Splitting Rules

fitctree uses these processes to determine how to split node t.

• For standard CART (that is, if PredictorSelection is 'allsplits') and for all predictors xi, i = 1,...,p:

  1. fitctree computes the weighted impurity of node t, it. For supported impurity measures, see SplitCriterion.

  2. fitctree estimates the probability that an observation is in node t using

         P(T) = Σ_{j ∈ T} wj

     wj is the weight of observation j, and T is the set of all observation indices in node t. If you do not specify Prior or Weights, then wj = 1/n, where n is the sample size.

  3. fitctree sorts xi in ascending order. Each element of the sorted predictor is a splitting candidate or cut point. fitctree stores any indices corresponding to missing values in the set TU, which is the unsplit set.

  4. fitctree determines the best way to split node t using xi by maximizing the impurity gain (ΔI) over all splitting candidates. That is, for all splitting candidates in xi:

     a. fitctree splits the observations in node t into left and right child nodes (tL and tR, respectively).

     b. fitctree computes ΔI. Suppose that for a particular splitting candidate, tL and tR contain observation indices in the sets TL and TR, respectively.

        • If xi does not contain any missing values, then the impurity gain for the current splitting candidate is


              ΔI = P(T)it − P(TL)itL − P(TR)itR

        • If xi contains missing values then, assuming that the observations are missing at random, the impurity gain is

              ΔIU = P(T − TU)it − P(TL)itL − P(TR)itR

          T − TU is the set of all observation indices in node t that are not missing.

        • If you use surrogate decision splits on page 32-1791, then:

          i. fitctree computes the predictive measures of association on page 32-1790 between the decision split xj < u and all possible decision splits xk < v, j ≠ k.

          ii. fitctree sorts the possible alternative decision splits in descending order by their predictive measure of association with the optimal split. The surrogate split is the decision split yielding the largest measure.

          iii. fitctree decides the child node assignments for observations with a missing value for xi using the surrogate split. If the surrogate predictor also contains a missing value, then fitctree uses the decision split with the second-largest measure, and so on, until there are no other surrogates. It is possible for fitctree to split two different observations at node t using two different surrogate splits. For example, suppose the predictors x1 and x2 are the best and second-best surrogates, respectively, for the predictor xi, i ∉ {1,2}, at node t. If observation m of predictor xi is missing (that is, xmi is missing), but xm1 is not missing, then x1 is the surrogate predictor for observation xmi. If observations x(m+1),i and x(m+1),1 are missing, but x(m+1),2 is not missing, then x2 is the surrogate predictor for observation m + 1.

          iv. fitctree uses the appropriate impurity gain formula. That is, if fitctree fails to assign all missing observations in node t to child nodes using surrogate splits, then the impurity gain is ΔIU. Otherwise, fitctree uses ΔI for the impurity gain.

     c. fitctree chooses the candidate that yields the largest impurity gain.

fitctree splits the predictor variable at the cut point that maximizes the impurity gain.

• For the curvature test (that is, if PredictorSelection is 'curvature'):

  1. fitctree conducts curvature tests on page 32-1788 between each predictor and the response for observations in node t.

     • If all p-values are at least 0.05, then fitctree does not split node t.
     • If there is a minimal p-value, then fitctree chooses the corresponding predictor to split node t.
     • If more than one p-value is zero due to underflow, then fitctree applies standard CART to the corresponding predictors to choose the split predictor.

  2. If fitctree chooses a split predictor, then it uses standard CART to choose the cut point (see step 4 in the standard CART process).

• For the interaction test (that is, if PredictorSelection is 'interaction-curvature'):

  1. For observations in node t, fitctree conducts curvature tests on page 32-1788 between each predictor and the response and interaction tests on page 32-1790 between each pair of predictors and the response.

     • If all p-values are at least 0.05, then fitctree does not split node t.
     • If there is a minimal p-value and it is the result of a curvature test, then fitctree chooses the corresponding predictor to split node t.
     • If there is a minimal p-value and it is the result of an interaction test, then fitctree chooses the split predictor using standard CART on the corresponding pair of predictors.
     • If more than one p-value is zero due to underflow, then fitctree applies standard CART to the corresponding predictors to choose the split predictor.

  2. If fitctree chooses a split predictor, then it uses standard CART to choose the cut point (see step 4 in the standard CART process).

Tree Depth Control

• If MergeLeaves is 'on' and PruneCriterion is 'error' (which are the default values for these name-value pair arguments), then the software applies pruning only to the leaves and by using classification error. This specification amounts to merging leaves that share the most popular class per leaf.

• To accommodate MaxNumSplits, fitctree splits all nodes in the current layer, and then counts the number of branch nodes. A layer is the set of nodes that are equidistant from the root node. If the number of branch nodes exceeds MaxNumSplits, fitctree follows this procedure:

  1. Determine how many branch nodes in the current layer must be unsplit so that there are at most MaxNumSplits branch nodes.
  2. Sort the branch nodes by their impurity gains.
  3. Unsplit that number of least successful branches.
  4. Return the decision tree grown so far.

  This procedure produces maximally balanced trees.

• The software splits branch nodes layer by layer until at least one of these events occurs:

  • There are MaxNumSplits branch nodes.
  • A proposed split causes the number of observations in at least one branch node to be fewer than MinParentSize.
  • A proposed split causes the number of observations in at least one leaf node to be fewer than MinLeafSize.
  • The algorithm cannot find a good split within a layer (that is, the pruning criterion (see PruneCriterion) does not improve for all proposed splits in a layer). A special case is when all nodes are pure (that is, all observations in the node have the same class).
  • For values 'curvature' or 'interaction-curvature' of PredictorSelection, all tests yield p-values greater than 0.05.

  MaxNumSplits and MinLeafSize do not affect splitting at their default values. Therefore, if you set 'MaxNumSplits', splitting might stop due to the value of MinParentSize before MaxNumSplits splits occur.

Parallelization

For dual-core systems and above, fitctree parallelizes training decision trees using Intel Threading Building Blocks (TBB). For details on Intel TBB, see https://software.intel.com/en-us/intel-tbb.
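The depth-control arguments above can be sketched together, assuming the built-in fisheriris data set:

```matlab
% Grow a shallow tree: at most 3 branch nodes, and every leaf must
% contain at least 5 observations.
load fisheriris
tree = fitctree(meas,species,'MaxNumSplits',3,'MinLeafSize',5);
view(tree,'Mode','text')   % display the resulting decision rules
```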


References

[1] Breiman, L., J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Boca Raton, FL: CRC Press, 1984.

[2] Coppersmith, D., S. J. Hong, and J. R. M. Hosking. “Partitioning Nominal Attributes in Decision Trees.” Data Mining and Knowledge Discovery, Vol. 3, 1999, pp. 197–217.

[3] Loh, W. Y. “Regression Trees with Unbiased Variable Selection and Interaction Detection.” Statistica Sinica, Vol. 12, 2002, pp. 361–386.

[4] Loh, W. Y., and Y. S. Shih. “Split Selection Methods for Classification Trees.” Statistica Sinica, Vol. 7, 1997, pp. 815–840.

Extended Capabilities

Tall Arrays

Calculate with arrays that have more rows than fit in memory.

Usage notes and limitations:

• Supported syntaxes are:

  • tree = fitctree(Tbl,Y)
  • tree = fitctree(X,Y)
  • tree = fitctree(___,Name,Value)
  • [tree,FitInfo,HyperparameterOptimizationResults] = fitctree(___,Name,Value) — fitctree returns the additional output arguments FitInfo and HyperparameterOptimizationResults when you specify the 'OptimizeHyperparameters' name-value pair argument.

• The FitInfo output argument is an empty structure array currently reserved for possible future use.

• The HyperparameterOptimizationResults output argument is a BayesianOptimization object or a table of hyperparameters with associated values that describe the cross-validation optimization of hyperparameters. HyperparameterOptimizationResults is nonempty when the 'OptimizeHyperparameters' name-value pair argument is nonempty at the time you create the model. The values in HyperparameterOptimizationResults depend on the value you specify for the 'HyperparameterOptimizationOptions' name-value pair argument when you create the model.

  • If you specify 'bayesopt' (default), then HyperparameterOptimizationResults is an object of class BayesianOptimization.
  • If you specify 'gridsearch' or 'randomsearch', then HyperparameterOptimizationResults is a table of the hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst).

• Supported name-value pair arguments, and any differences, are:

  • 'AlgorithmForCategorical'


  • 'CategoricalPredictors'
  • 'ClassNames'
  • 'Cost'
  • 'HyperparameterOptimizationOptions' — For cross-validation, tall optimization supports only 'Holdout' validation. For example, you can specify fitctree(X,Y,'OptimizeHyperparameters','auto','HyperparameterOptimizationOptions',struct('Holdout',0.2)).
  • 'MaxNumCategories'
  • 'MaxNumSplits' — For tall optimization, fitctree searches among integers, by default log-scaled in the range [1,max(2,min(10000,NumObservations-1))].
  • 'MergeLeaves'
  • 'MinLeafSize'
  • 'MinParentSize'
  • 'NumVariablesToSample'
  • 'OptimizeHyperparameters'
  • 'PredictorNames'
  • 'Prior'
  • 'ResponseName'
  • 'ScoreTransform'
  • 'SplitCriterion'
  • 'Weights'

• This additional name-value pair argument is specific to tall arrays:

  • 'MaxDepth' — A positive integer specifying the maximum depth of the output tree. Specify a value for this argument to return a tree that has fewer levels and requires fewer passes through the tall array to compute. Generally, the algorithm of fitctree takes one pass through the data and an additional pass for each tree level. The function does not set a maximum tree depth, by default.

For more information, see “Tall Arrays” (MATLAB).

Automatic Parallel Support

Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, set the 'UseParallel' option to true. To perform parallel hyperparameter optimization, use the 'HyperparameterOptimizationOptions',struct('UseParallel',true) name-value pair argument in the call to this function. For more information on parallel hyperparameter optimization, see “Parallel Bayesian Optimization” on page 10-7. For more general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).
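A sketch of the parallel hyperparameter optimization call described above, assuming Parallel Computing Toolbox and the built-in ionosphere data set:

```matlab
% Optimize hyperparameters in parallel with Bayesian optimization.
load ionosphere
rng(0)   % for reproducibility of the data; the parallel search itself
         % is not deterministic
mdl = fitctree(X,Y,'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions',struct('UseParallel',true));
```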

See Also

ClassificationPartitionedModel | ClassificationTree | kfoldPredict | predict | prune


Topics

“Splitting Categorical Predictors in Classification Trees” on page 19-25

Introduced in R2014a


fitglm

Create generalized linear regression model

Syntax

mdl = fitglm(tbl)
mdl = fitglm(X,y)
mdl = fitglm(___,modelspec)
mdl = fitglm(___,Name,Value)

Description

mdl = fitglm(tbl) returns a generalized linear model fit to variables in the table or dataset array tbl. By default, fitglm takes the last variable as the response variable.

mdl = fitglm(X,y) returns a generalized linear model of the responses y, fit to the data matrix X.

mdl = fitglm(___,modelspec) returns a generalized linear model of the type you specify in modelspec.

mdl = fitglm(___,Name,Value) returns a generalized linear model with additional options specified by one or more Name,Value pair arguments. For example, you can specify which variables are categorical, the distribution of the response variable, and the link function to use.

Examples

Fit a Logistic Regression Model

Make a logistic binomial model of the probability of smoking as a function of age, weight, and sex, using a two-way interactions model.

Load the hospital dataset array.

load hospital
dsa = hospital;

Specify the model using a formula that allows up to two-way interactions between the variables age, weight, and sex. Smoker is the response variable.

modelspec = 'Smoker ~ Age*Weight*Sex - Age:Weight:Sex';

Fit a logistic binomial model.

mdl = fitglm(dsa,modelspec,'Distribution','binomial')

mdl = 
Generalized linear regression model:
    logit(Smoker) ~ 1 + Sex*Age + Sex*Weight + Age*Weight

    Distribution = Binomial

Estimated Coefficients:
                        Estimate         SE        tStat     pValue
    (Intercept)          -6.0492     19.749      -0.3063    0.75938
    Sex_Male             -2.2859     12.424     -0.18399    0.85402
    Age                  0.11691    0.50977      0.22934    0.81861
    Weight              0.031109    0.15208      0.20455    0.83792
    Sex_Male:Age        0.020734    0.20681      0.10025    0.92014
    Sex_Male:Weight      0.01216   0.053168      0.22871     0.8191
    Age:Weight       -0.00071959  0.0038964     -0.18468    0.85348

100 observations, 93 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 5.07, p-value = 0.535

All of the p-values (under pValue) are large. This means none of the coefficients are significant. The large p-value for the test of the model, 0.535, indicates that this model might not differ statistically from a constant model.

GLM for Poisson Response

Create sample data with seven predictors, and a Poisson response that uses just three of the predictors, plus a constant.

rng('default') % for reproducibility
X = randn(100,7);
mu = exp(X(:,[1 3 6])*[.4;.2;.3] + 1);
y = poissrnd(mu);

Fit a generalized linear model using the Poisson distribution.

mdl = fitglm(X,y,'linear','Distribution','poisson')

mdl = 
Generalized linear regression model:
    log(y) ~ 1 + x1 + x2 + x3 + x4 + x5 + x6 + x7
    Distribution = Poisson

Estimated Coefficients:
                    Estimate         SE       tStat      pValue
    (Intercept)      0.88723   0.070969      12.502  7.3149e-36
    x1               0.44413   0.052337      8.4858  2.1416e-17
    x2             0.0083388   0.056527     0.14752     0.88272
    x3               0.21518   0.063416      3.3932  0.00069087
    x4             -0.058386   0.065503    -0.89135     0.37274
    x5             -0.060824   0.073441     -0.8282     0.40756
    x6               0.34267   0.056778      6.0352  1.5878e-09
    x7               0.04316    0.06146     0.70225     0.48252

100 observations, 92 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 119, p-value = 1.55e-22

The p-values of 2.14e-17, 0.00069, and 1.58e-09 indicate that the coefficients of the variables x1, x3, and x6 are statistically significant.

Input Arguments

tbl — Input data
table | dataset array

Input data including predictor and response variables, specified as a table or dataset array. The predictor variables and response variable can be numeric, logical, categorical, character, or string. The response variable can have a data type other than numeric only if 'Distribution' is 'binomial'.

• By default, fitglm takes the last variable as the response variable and the others as the predictor variables.
• To set a different column as the response variable, use the ResponseVar name-value pair argument.
• To use a subset of the columns as predictors, use the PredictorVars name-value pair argument.
• To define a model specification, set the modelspec argument using a formula or terms matrix. The formula or terms matrix specifies which columns to use as the predictor or response variables.

The variable names in a table do not have to be valid MATLAB identifiers. However, if the names are not valid, you cannot use a formula when you fit or adjust a model; for example:

• You cannot specify modelspec using a formula.
• You cannot use a formula to specify the terms to add or remove when you use the addTerms function or the removeTerms function, respectively.
• You cannot use a formula to specify the lower and upper bounds of the model when you use the step or stepwiseglm function with the name-value pair arguments 'Lower' and 'Upper', respectively.

You can verify the variable names in tbl by using the isvarname function. The following code returns logical 1 (true) for each variable that has a valid variable name.

cellfun(@isvarname,tbl.Properties.VariableNames)

If the variable names in tbl are not valid, then convert them by using the matlab.lang.makeValidName function.

tbl.Properties.VariableNames = matlab.lang.makeValidName(tbl.Properties.VariableNames);

X — Predictor variables
matrix

Predictor variables, specified as an n-by-p matrix, where n is the number of observations and p is the number of predictor variables. Each column of X represents one variable, and each row represents one observation.

By default, there is a constant term in the model, unless you explicitly remove it, so do not include a column of 1s in X.

Data Types: single | double

y — Response variable
vector | matrix

Response variable, specified as a vector or matrix.

• If 'Distribution' is not 'binomial', then y must be an n-by-1 vector, where n is the number of observations. Each entry in y is the response for the corresponding row of X. The data type must be single or double.
• If 'Distribution' is 'binomial', then y can be an n-by-1 vector or n-by-2 matrix with counts in column 1 and BinomialSize in column 2.

Data Types: single | double | logical | categorical

modelspec — Model specification
'linear' (default) | character vector or string scalar naming the model | t-by-(p + 1) terms matrix | character vector or string scalar formula in the form 'Y ~ terms'

Model specification, specified as one of these values.

• A character vector or string scalar naming the model.

  'constant'      — Model contains only a constant (intercept) term.
  'linear'        — Model contains an intercept and linear term for each predictor.
  'interactions'  — Model contains an intercept, linear term for each predictor, and all products of pairs of distinct predictors (no squared terms).
  'purequadratic' — Model contains an intercept term and linear and squared terms for each predictor.
  'quadratic'     — Model contains an intercept term, linear and squared terms for each predictor, and all products of pairs of distinct predictors.
  'polyijk'       — Model is a polynomial with all terms up to degree i in the first predictor, degree j in the second predictor, and so on. Specify the maximum degree for each predictor by using numerals 0 through 9. The model contains interaction terms, but the degree of each interaction term does not exceed the maximum value of the specified degrees. For example, 'poly13' has an intercept and x1, x2, x2^2, x2^3, x1*x2, and x1*(x2^2) terms, where x1 and x2 are the first and second predictors, respectively.

• A t-by-(p + 1) matrix, or a “Terms Matrix” on page 32-1805, specifying terms in the model, where t is the number of terms and p is the number of predictor variables, and +1 accounts for the response variable. A terms matrix is convenient when the number of predictors is large and you want to generate the terms programmatically.

• A character vector or string scalar representing a “Formula” on page 32-1806 in the form 'Y ~ terms', where the terms are in “Wilkinson Notation” on page 32-1806. The variable names in the formula must be valid MATLAB identifiers.

The software determines the order of terms in a fitted model by using the order of terms in tbl or X. Therefore, the order of terms in the model can be different from the order of terms in the specified formula.

Example: 'quadratic'

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Distribution','normal','link','probit','Exclude',[23,59] specifies that the distribution of the response is normal, and instructs fitglm to use the probit link function and exclude the 23rd and 59th observations from the fit.

BinomialSize — Number of trials for binomial distribution
1 (default) | numeric scalar | numeric vector | character vector | string scalar

Number of trials for the binomial distribution, that is, the sample size, specified as the comma-separated pair consisting of 'BinomialSize' and the variable name in tbl, a numeric scalar, or a numeric vector of the same length as the response. This is the parameter n for the fitted binomial distribution. BinomialSize applies only when the Distribution parameter is 'binomial'.

If BinomialSize is a scalar value, all observations have the same number of trials.

As an alternative to BinomialSize, you can specify the response as a two-column matrix with counts in column 1 and BinomialSize in column 2.
Data Types: single | double | char | string

CategoricalVars — Categorical variable list
string array | cell array of character vectors | logical or numeric index vector

Categorical variable list, specified as the comma-separated pair consisting of 'CategoricalVars' and either a string array or cell array of character vectors containing categorical variable names in the table or dataset array tbl, or a logical or numeric index vector indicating which columns are categorical.

• If data is in a table or dataset array tbl, then, by default, fitglm treats all categorical values, logical values, character arrays, string arrays, and cell arrays of character vectors as categorical variables.
• If data is in matrix X, then the default value of 'CategoricalVars' is an empty matrix []. That is, no variable is categorical unless you specify it as categorical.

For example, you can specify the variables 2 and 3 out of 6 as categorical using either of the following examples.

Example: 'CategoricalVars',[2,3]
Example: 'CategoricalVars',logical([0 1 1 0 0 0])

Data Types: single | double | logical | string | cell

DispersionFlag — Indicator to compute dispersion parameter
false for 'binomial' and 'poisson' distributions (default) | true

Indicator to compute the dispersion parameter for 'binomial' and 'poisson' distributions, specified as the comma-separated pair consisting of 'DispersionFlag' and one of the following.

  true  — Estimate a dispersion parameter when computing standard errors. The estimated dispersion parameter value is the sum of squared Pearson residuals divided by the degrees of freedom for error (DFE).
  false — Use the theoretical value when computing standard errors (default).

The fitting function always estimates the dispersion for other distributions.

Example: 'DispersionFlag',true

Distribution — Distribution of the response variable
'normal' (default) | 'binomial' | 'poisson' | 'gamma' | 'inverse gaussian'

Distribution of the response variable, specified as the comma-separated pair consisting of 'Distribution' and one of the following.

  'normal'           — Normal distribution
  'binomial'         — Binomial distribution
  'poisson'          — Poisson distribution
  'gamma'            — Gamma distribution
  'inverse gaussian' — Inverse Gaussian distribution

Example: 'Distribution','gamma'

Exclude — Observations to exclude
logical or numeric index vector

Observations to exclude from the fit, specified as the comma-separated pair consisting of 'Exclude' and a logical or numeric index vector indicating which observations to exclude from the fit.

For example, you can exclude observations 2 and 3 out of 6 using either of the following examples.

Example: 'Exclude',[2,3]
Example: 'Exclude',logical([0 1 1 0 0 0])

Data Types: single | double | logical

Intercept — Indicator for constant term
true (default) | false

Indicator for the constant term (intercept) in the fit, specified as the comma-separated pair consisting of 'Intercept' and either true to include or false to remove the constant term from the model. Use 'Intercept' only when specifying the model using a character vector or string scalar, not a formula or matrix.

Example: 'Intercept',false


Link — Link function
canonical link function (default) | scalar value | structure

Link function to use in place of the canonical link function, specified as the comma-separated pair consisting of 'Link' and one of the following.

  Link Function Name   Link Function              Mean (Inverse) Function
  'identity'           f(μ) = μ                   μ = Xb
  'log'                f(μ) = log(μ)              μ = exp(Xb)
  'logit'              f(μ) = log(μ/(1–μ))        μ = exp(Xb)/(1 + exp(Xb))
  'probit'             f(μ) = Φ^–1(μ)             μ = Φ(Xb)
  'comploglog'         f(μ) = log(–log(1 – μ))    μ = 1 – exp(–exp(Xb))
  'reciprocal'         f(μ) = 1/μ                 μ = 1/(Xb)
  p (a number)         f(μ) = μ^p                 μ = (Xb)^(1/p)
  S (a structure)      f(μ) = S.Link(μ)           μ = S.Inverse(Xb)

S is a structure with three fields. Each field holds a function handle that accepts a vector of inputs and returns a vector of the same size:

• S.Link — The link function
• S.Inverse — The inverse link function
• S.Derivative — The derivative of the link function

The link function defines the relationship f(μ) = X*b between the mean response μ and the linear combination of predictors X*b.

For more information on the canonical link functions, see “Canonical Link Function” on page 32-1807.

Example: 'Link','probit'

Data Types: char | string | single | double | struct

Offset — Offset variable
[ ] (default) | numeric vector | character vector | string scalar

Offset variable in the fit, specified as the comma-separated pair consisting of 'Offset' and the variable name in tbl or a numeric vector with the same length as the response.

fitglm uses Offset as an additional predictor with a coefficient value fixed at 1. In other words, the formula for fitting is f(μ) = Offset + X*b, where f is the link function, μ is the mean response, and X*b is the linear combination of predictors X. The Offset predictor has coefficient 1.
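A minimal sketch of an offset in a Poisson model with a log link, using simulated data (the exposure variable A and the coefficients are hypothetical):

```matlab
% Counts y are proportional to an exposure A, so log(A) enters the
% log-link model as an offset with its coefficient fixed at 1.
rng(0)                              % for reproducibility
A  = 1 + 9*rand(100,1);             % hypothetical exposure values
X  = randn(100,2);                  % two predictors
mu = A .* exp(X*[0.5; -0.3]);       % mean proportional to A
y  = poissrnd(mu);

mdl = fitglm(X,y,'Distribution','poisson','Link','log', ...
    'Offset',log(A));
```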


For example, consider a Poisson regression model. Suppose the number of counts is known for theoretical reasons to be proportional to a predictor A. By using the log link function and by specifying log(A) as an offset, you can force the model to satisfy this theoretical constraint.

Data Types: single | double | char | string

PredictorVars — Predictor variables
string array | cell array of character vectors | logical or numeric index vector

Predictor variables to use in the fit, specified as the comma-separated pair consisting of 'PredictorVars' and either a string array or cell array of character vectors of the variable names in the table or dataset array tbl, or a logical or numeric index vector indicating which columns are predictor variables.

The string values or character vectors should be among the names in tbl, or the names you specify using the 'VarNames' name-value pair argument. The default is all variables in X, or all variables in tbl except for ResponseVar.

For example, you can specify the second and third variables as the predictor variables using either of the following examples.

Example: 'PredictorVars',[2,3]
Example: 'PredictorVars',logical([0 1 1 0 0 0])

Data Types: single | double | logical | string | cell

ResponseVar — Response variable
last column in tbl (default) | character vector or string scalar containing variable name | logical or numeric index vector

Response variable to use in the fit, specified as the comma-separated pair consisting of 'ResponseVar' and either a character vector or string scalar containing the variable name in the table or dataset array tbl, or a logical or numeric index vector indicating which column is the response variable. You typically need to use 'ResponseVar' when fitting a table or dataset array tbl.

For example, you can specify the fourth variable, say yield, as the response out of six variables, in one of the following ways.

Example: 'ResponseVar','yield'
Example: 'ResponseVar',[4]
Example: 'ResponseVar',logical([0 0 0 1 0 0])

Data Types: single | double | logical | char | string

VarNames — Names of variables
{'x1','x2',...,'xn','y'} (default) | string array | cell array of character vectors

Names of variables, specified as the comma-separated pair consisting of 'VarNames' and a string array or cell array of character vectors including the names for the columns of X first, and the name for the response variable y last.

'VarNames' is not applicable to variables in a table or dataset array, because those variables already have names.


The variable names do not have to be valid MATLAB identifiers. However, if the names are not valid, you cannot use a formula when you fit or adjust a model; for example:

• You cannot use a formula to specify the terms to add or remove when you use the addTerms function or the removeTerms function, respectively.
• You cannot use a formula to specify the lower and upper bounds of the model when you use the step or stepwiseglm function with the name-value pair arguments 'Lower' and 'Upper', respectively.

Before specifying 'VarNames',varNames, you can verify the variable names in varNames by using the isvarname function. The following code returns logical 1 (true) for each variable that has a valid variable name.

cellfun(@isvarname,varNames)

If the variable names in varNames are not valid, then convert them by using the matlab.lang.makeValidName function.

varNames = matlab.lang.makeValidName(varNames);

Example: 'VarNames',{'Horsepower','Acceleration','Model_Year','MPG'}

Data Types: string | cell

Weights — Observation weights
ones(n,1) (default) | n-by-1 vector of nonnegative scalar values

Observation weights, specified as the comma-separated pair consisting of 'Weights' and an n-by-1 vector of nonnegative scalar values, where n is the number of observations.

Data Types: single | double

Output Arguments

mdl — Generalized linear regression model
GeneralizedLinearModel object

Generalized linear regression model, returned as a GeneralizedLinearModel object created using fitglm or stepwiseglm.

More About Terms Matrix A terms matrix T is a t-by-(p + 1) matrix specifying terms in a model, where t is the number of terms, p is the number of predictor variables, and +1 accounts for the response variable. The value of T(i,j) is the exponent of variable j in term i. For example, suppose that an input includes three predictor variables A, B, and C and the response variable Y in the order A, B, C, and Y. Each row of T represents one term: • [0 0 0 0] — Constant term or intercept • [0 1 0 0] — B; equivalently, A^0 * B^1 * C^0 • [1 0 1 0] — A*C 32-1805

32

Functions

• [2 0 0 0] — A^2
• [0 1 2 0] — B*(C^2)

The 0 at the end of each term represents the response variable. In general, a column vector of zeros in a terms matrix represents the position of the response variable. If you have the predictor and response variables in a matrix and column vector, then you must include 0 for the response variable in the last column of each row.

Formula

A formula for model specification is a character vector or string scalar of the form 'Y ~ terms'.

• Y is the response name.
• terms represents the predictor terms in a model using Wilkinson notation.

For example:

• 'Y ~ A + B + C' specifies a three-variable linear model with intercept.
• 'Y ~ A + B + C – 1' specifies a three-variable linear model without intercept. Note that formulas include a constant (intercept) term by default. To exclude a constant term from the model, you must include –1 in the formula.

Wilkinson Notation

Wilkinson notation describes the terms present in a model. The notation relates to the terms present in a model, not to the multipliers (coefficients) of those terms.

Wilkinson notation uses these symbols:

• + means include the next variable.
• – means do not include the next variable.
• : defines an interaction, which is a product of terms.
• * defines an interaction and all lower-order terms.
• ^ raises the predictor to a power, exactly as in * repeated, so ^ includes lower-order terms as well.
• () groups terms.

This table shows typical examples of Wilkinson notation.


Wilkinson Notation                     Term in Standard Notation
1                                      Constant (intercept) term
A^k, where k is a positive integer     A, A^2, ..., A^k
A + B                                  A, B
A*B                                    A, B, A*B
A:B                                    A*B only
–B                                     Do not include B
A*B + C                                A, B, C, A*B
A + B + C + A:B                        A, B, C, A*B
A*B*C – A:B:C                          A, B, C, A*B, A*C, B*C
A*(B + C)                              A, B, C, A*B, A*C
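A few of the table's rows can be written out as fitting calls; the table tbl and the variables Y, A, B, C here are hypothetical. The same model can also be specified with a terms matrix, as described under Terms Matrix above:

```matlab
% Wilkinson formula: these two calls specify the same model,
% Y ~ 1 + A + B + C + A:B.
% mdl1 = fitglm(tbl,'Y ~ A*B + C');
% mdl2 = fitglm(tbl,'Y ~ A + B + C + A:B');

% Equivalent terms matrix (columns A, B, C, Y; the response is last):
T = [0 0 0 0;   % constant (intercept) term
     1 0 0 0;   % A
     0 1 0 0;   % B
     0 0 1 0;   % C
     1 1 0 0];  % A:B
% mdl3 = fitglm(tbl,T);
```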

Statistics and Machine Learning Toolbox notation always includes a constant term unless you explicitly remove the term using –1. For more details, see “Wilkinson Notation” on page 11-90.

Canonical Link Function

The default link function for a generalized linear model is the canonical link function.

Distribution          Canonical Link Function Name    Link Function            Mean (Inverse) Function
'normal'              'identity'                      f(μ) = μ                 μ = Xb
'binomial'            'logit'                         f(μ) = log(μ/(1 – μ))    μ = exp(Xb) / (1 + exp(Xb))
'poisson'             'log'                           f(μ) = log(μ)            μ = exp(Xb)
'gamma'               -1                              f(μ) = 1/μ               μ = 1/(Xb)
'inverse gaussian'    -2                              f(μ) = 1/μ²              μ = (Xb)^(–1/2)
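As a sketch of how the canonical link is applied when 'Link' is omitted (the generated data and coefficients below are hypothetical):

```matlab
rng(1)                               % reproducible hypothetical data
x = randn(100,1);
p = 1./(1 + exp(-(0.5 + 2*x)));      % true logistic probabilities
y = binornd(1,p);                    % binary responses
% Omitting 'Link' uses the canonical link for the distribution
% ('logit' for the binomial distribution):
mdl = fitglm(x,y,'Distribution','binomial');
```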

Tips

• The generalized linear model mdl is a standard linear model unless you specify otherwise with the Distribution name-value pair.
• For methods such as plotResiduals or devianceTest, or properties of the GeneralizedLinearModel object, see GeneralizedLinearModel.
• After training a model, you can generate C/C++ code that predicts responses for new data. Generating C/C++ code requires MATLAB Coder. For details, see “Introduction to Code Generation” on page 31-2.

Algorithms

• fitglm treats a categorical predictor as follows:
  • A model with a categorical predictor that has L levels (categories) includes L – 1 indicator variables. The model uses the first category as a reference level, so it does not include the indicator variable for the reference level. If the data type of the categorical predictor is categorical, then you can check the order of categories by using categories and reorder the categories by using reordercats to customize the reference level.
  • fitglm treats the group of L – 1 indicator variables as a single variable. If you want to treat the indicator variables as distinct predictor variables, create indicator variables manually by using dummyvar. Then use the indicator variables, except the one corresponding to the reference level of the categorical variable, when you fit a model. For the categorical predictor X, if you specify all columns of dummyvar(X) and an intercept term as predictors, then the design matrix becomes rank deficient.
  • Interaction terms between a continuous predictor and a categorical predictor with L levels consist of the element-wise product of the L – 1 indicator variables with the continuous predictor.


• Interaction terms between two categorical predictors with L and M levels consist of the (L – 1)*(M – 1) indicator variables to include all possible combinations of the two categorical predictor levels.
• You cannot specify higher-order terms for a categorical predictor because the square of an indicator is equal to itself.
• fitglm considers NaN, '' (empty character vector), "" (empty string), <missing>, and <undefined> values in tbl, X, and Y to be missing values. fitglm does not use observations with missing values in the fit. The ObservationInfo property of a fitted model indicates whether or not fitglm uses each observation in the fit.
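The manual indicator-variable approach mentioned above can be sketched with dummyvar; the category names here are hypothetical:

```matlab
% Manual indicator variables for a categorical predictor.
g = categorical({'low';'med';'high';'med';'low';'high'});
D = dummyvar(g);      % one indicator column per category
D = D(:,2:end);       % drop the reference level's column to avoid
                      % rank deficiency when an intercept is included
% The remaining columns of D can now be passed to fitglm as
% separate predictor variables.
```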

Alternative Functionality • Use stepwiseglm to select a model specification automatically. Use step, addTerms, or removeTerms to adjust a fitted model.

References

[1] Collett, D. Modeling Binary Data. New York: Chapman & Hall, 2002.

[2] Dobson, A. J. An Introduction to Generalized Linear Models. New York: Chapman & Hall, 1990.

[3] McCullagh, P., and J. A. Nelder. Generalized Linear Models. New York: Chapman & Hall, 1990.

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

This function supports tall arrays for out-of-memory data with some limitations.

• If any input argument to fitglm is a tall array, then all of the other inputs must be tall arrays as well. This includes nonempty variables supplied with the 'Weights', 'Exclude', 'Offset', and 'BinomialSize' name-value pairs.
• The default number of iterations is 5. You can change the number of iterations using the 'Options' name-value pair to pass in an options structure. Create an options structure using statset to specify a different value for MaxIter.
• For tall data, fitglm returns a CompactGeneralizedLinearModel object that contains most of the same properties as a GeneralizedLinearModel object. The main difference is that the compact object is sensitive to memory requirements. The compact object does not include properties that include the data, or that include an array of the same size as the data. The compact object does not contain these GeneralizedLinearModel properties:
  • Diagnostics
  • Fitted
  • Offset
  • ObservationInfo
  • ObservationNames
  • Residuals


  • Steps
  • Variables

You can compute the residuals directly from the compact object returned by GLM = fitglm(X,Y) using:

RES = Y - predict(GLM,X);
S = sqrt(GLM.SSE/GLM.DFE);
histogram(RES,linspace(-3*S,3*S,51))

For more information, see “Tall Arrays for Out-of-Memory Data” (MATLAB).

See Also
GeneralizedLinearModel | predict | stepwiseglm

Topics
“Generalized Linear Model Workflow” on page 12-28
“Generalized Linear Models” on page 12-9

Introduced in R2013b


fitglme
Fit generalized linear mixed-effects model

Syntax
glme = fitglme(tbl,formula)
glme = fitglme(tbl,formula,Name,Value)

Description

glme = fitglme(tbl,formula) returns a generalized linear mixed-effects model, glme. The model is specified by formula and fitted to the predictor variables in the table or dataset array, tbl.

glme = fitglme(tbl,formula,Name,Value) returns a generalized linear mixed-effects model using additional options specified by one or more Name,Value pair arguments. For example, you can specify the distribution of the response, the link function, or the covariance pattern of the random-effects terms.

Examples

Fit a Generalized Linear Mixed-Effects Model

Load the sample data.

load mfr

This simulated data is from a manufacturing company that operates 50 factories across the world, with each factory running a batch process to create a finished product. The company wants to decrease the number of defects in each batch, so it developed a new manufacturing process. To test the effectiveness of the new process, the company selected 20 of its factories at random to participate in an experiment: Ten factories implemented the new process, while the other ten continued to run the old process. In each of the 20 factories, the company ran five batches (for a total of 100 batches) and recorded the following data:

• Flag to indicate whether the batch used the new process (newprocess)
• Processing time for each batch, in hours (time)
• Temperature of the batch, in degrees Celsius (temp)
• Categorical variable indicating the supplier of the chemical used in the batch (supplier)
• Number of defects in the batch (defects)

The data also includes time_dev and temp_dev, which represent the absolute deviation of time and temperature, respectively, from the process standard of 3 hours at 20 degrees Celsius.

Fit a generalized linear mixed-effects model using newprocess, time_dev, temp_dev, and supplier as fixed-effects predictors. Include a random-effects term for intercept grouped by factory, to account for quality differences that might exist due to factory-specific variations. The response variable defects has a Poisson distribution, and the appropriate link function for this


model is log. Use the Laplace fit method to estimate the coefficients. Specify the dummy variable encoding as 'effects', so the dummy variable coefficients sum to 0.

The number of defects can be modeled using a Poisson distribution:

defects_ij ~ Poisson(μ_ij)

This corresponds to the generalized linear mixed-effects model

log(μ_ij) = β_0 + β_1·newprocess_ij + β_2·time_dev_ij + β_3·temp_dev_ij + β_4·supplier_C_ij + β_5·supplier_B_ij + b_i,

where

• defects_ij is the number of defects observed in the batch produced by factory i during batch j.
• μ_ij is the mean number of defects corresponding to factory i (where i = 1, 2, ..., 20) during batch j (where j = 1, 2, ..., 5).
• newprocess_ij, time_dev_ij, and temp_dev_ij are the measurements for each variable that correspond to factory i during batch j. For example, newprocess_ij indicates whether the batch produced by factory i during batch j used the new process.
• supplier_C_ij and supplier_B_ij are dummy variables that use effects (sum-to-zero) coding to indicate whether company C or B, respectively, supplied the process chemicals for the batch produced by factory i during batch j.
• b_i ~ N(0, σ_b²) is a random-effects intercept for each factory i that accounts for factory-specific variation in quality.

glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ... 'Distribution','Poisson','Link','log','FitMethod','Laplace', ... 'DummyVarCoding','effects');

Display the model.

disp(glme)

Generalized linear mixed-effects model fit by ML

Model information:
    Number of observations             100
    Fixed effects coefficients           6
    Random effects coefficients         20
    Covariance parameters                1
    Distribution                   Poisson
    Link                               Log
    FitMethod                      Laplace

Formula:
    defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1 | factory)

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    416.35    434.58    -201.17          402.35

Fixed effects coefficients (95% CIs):
    Name                   Estimate     SE          tStat       DF    pValue        Lower        Upper
    {'(Intercept)'}           1.4689     0.15988      9.1875    94    9.8194e-15      1.1515       1.7864
    {'newprocess' }         -0.36766     0.17755     -2.0708    94      0.041122    -0.72019    -0.015134
    {'time_dev'   }        -0.094521     0.82849    -0.11409    94       0.90941     -1.7395       1.5505
    {'temp_dev'   }         -0.28317      0.9617    -0.29444    94       0.76907     -2.1926       1.6263
    {'supplier_C' }        -0.071868    0.078024     -0.9211    94       0.35936    -0.22679     0.083051
    {'supplier_B' }         0.071072     0.07739     0.91836    94       0.36078    -0.082588     0.22473

Random effects covariance parameters:
Group: factory (20 Levels)
    Name1                  Name2                  Type         Estimate
    {'(Intercept)'}        {'(Intercept)'}        {'std'}      0.31381

Group: Error
    Name                      Estimate
    {'sqrt(Dispersion)'}      1

The Model information table displays the total number of observations in the sample data (100), the number of fixed- and random-effects coefficients (6 and 20, respectively), and the number of covariance parameters (1). It also indicates that the response variable has a Poisson distribution, the link function is Log, and the fit method is Laplace. Formula indicates the model specification using Wilkinson’s notation. The Model fit statistics table displays statistics used to assess the goodness of fit of the model. This includes the Akaike information criterion (AIC), Bayesian information criterion (BIC) values, log likelihood (LogLikelihood), and deviance (Deviance) values. The Fixed effects coefficients table indicates that fitglme returned 95% confidence intervals. It contains one row for each fixed-effects predictor, and each column contains statistics corresponding to that predictor. Column 1 (Name) contains the name of each fixed-effects coefficient, column 2 (Estimate) contains its estimated value, and column 3 (SE) contains the standard error of the coefficient. Column 4 (tStat) contains the t-statistic for a hypothesis test that the coefficient is equal to 0. Column 5 (DF) and column 6 (pValue) contain the degrees of freedom and p-value that correspond to the t-statistic, respectively. The last two columns (Lower and Upper) display the lower and upper limits, respectively, of the 95% confidence interval for each fixed-effects coefficient. Random effects covariance parameters displays a table for each grouping variable (here, only factory), including its total number of levels (20), and the type and estimate of the covariance parameter. Here, std indicates that fitglme returns the standard deviation of the random effect associated with the factory predictor, which has an estimated value of 0.31381. It also displays a table containing the error parameter type (here, the square root of the dispersion parameter), and its estimated value of 1. 
The standard display generated by fitglme does not provide confidence intervals for the random-effects parameters. To compute and display these values, use covarianceParameters.
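For example, continuing with the fitted model glme from above:

```matlab
% Estimates and 95% confidence intervals for the covariance parameters.
[psi,dispersion,stats] = covarianceParameters(glme);
% stats{1} holds the estimate and confidence bounds for the standard
% deviation of the random-effects intercept grouped by factory.
```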


Input Arguments

tbl — Input data
table | dataset array

Input data, which includes the response variable, predictor variables, and grouping variables, specified as a table or dataset array. The predictor variables can be continuous or grouping variables (see “Grouping Variables” on page 2-45). You must specify the model for the variables using formula.

formula — Formula for model specification
character vector or string scalar of the form 'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)'

Formula for model specification, specified as a character vector or string scalar of the form 'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)'. The formula is case sensitive. For a full description, see “Formula” on page 32-1822.

Example: 'y ~ treatment + (1|block)'

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects' specifies the response variable distribution as Poisson, the link function as log, the fit method as Laplace, and dummy variable coding where the coefficients sum to 0.

BinomialSize — Number of trials for binomial distribution
1 (default) | scalar value | vector | variable name

Number of trials for binomial distribution, that is, the sample size, specified as the comma-separated pair consisting of 'BinomialSize' and a scalar value, a vector of the same length as the response, or the name of a variable in the input table. If you specify the name of a variable, then the variable must be of the same length as the response. BinomialSize applies only when the Distribution parameter is 'binomial'. If BinomialSize is a scalar value, that means all observations have the same number of trials.
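A sketch of BinomialSize with per-observation trial counts; the data below are simulated and hypothetical:

```matlab
rng(2)                                   % reproducible hypothetical data
g = categorical(repelem((1:10)',5));     % 10 groups, 5 observations each
n = randi([10 30],50,1);                 % number of trials per observation
x = randn(50,1);
y = binornd(n,0.3);                      % successes out of n trials
tbl = table(x,y,n,g);
% Pass the trial counts by naming the table variable:
glme = fitglme(tbl,'y ~ x + (1|g)', ...
    'Distribution','Binomial','BinomialSize','n');
```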
Data Types: single | double

CheckHessian — Indicator to check positive definiteness of Hessian
false (default) | true

Indicator to check the positive definiteness of the Hessian of the objective function with respect to unconstrained parameters at convergence, specified as the comma-separated pair consisting of 'CheckHessian' and either false or true. Default is false. Specify 'CheckHessian' as true to verify optimality of the solution or to determine if the model is overparameterized in the number of covariance parameters.


If you specify 'FitMethod' as 'MPL' or 'REMPL', then the covariance of the fixed effects and the covariance parameters is based on the fitted linear mixed-effects model from the final pseudo likelihood iteration.

Example: 'CheckHessian',true

CovarianceMethod — Method to compute covariance of estimated parameters
'conditional' (default) | 'JointHessian'

Method to compute covariance of estimated parameters, specified as the comma-separated pair consisting of 'CovarianceMethod' and either 'conditional' or 'JointHessian'. If you specify 'conditional', then fitglme computes a fast approximation to the covariance of fixed effects given the estimated covariance parameters. It does not compute the covariance of covariance parameters. If you specify 'JointHessian', then fitglme computes the joint covariance of fixed effects and covariance parameters via the observed information matrix using the Laplacian loglikelihood. If you specify 'FitMethod' as 'MPL' or 'REMPL', then the covariance of the fixed effects and the covariance parameters is based on the fitted linear mixed-effects model from the final pseudo likelihood iteration.

Example: 'CovarianceMethod','JointHessian'

CovariancePattern — Pattern of covariance matrix
'FullCholesky' | 'Isotropic' | 'Full' | 'Diagonal' | 'CompSymm' | square symmetric logical matrix | string array | cell array of character vectors or logical matrices

Pattern of the covariance matrix of the random effects, specified as the comma-separated pair consisting of 'CovariancePattern' and 'FullCholesky', 'Isotropic', 'Full', 'Diagonal', 'CompSymm', a square symmetric logical matrix, a string array, or a cell array containing character vectors or logical matrices. If there are R random-effects terms, then the value of 'CovariancePattern' must be a string array or cell array of length R, where each element r of the array specifies the pattern of the covariance matrix of the random-effects vector associated with the rth random-effects term.
The options for each element follow.


Value             Description
'FullCholesky'    Full covariance matrix using the Cholesky parameterization. fitglme estimates all elements of the covariance matrix.
'Isotropic'       Diagonal covariance matrix with equal variances. That is, off-diagonal elements of the covariance matrix are constrained to be 0, and the diagonal elements are constrained to be equal. For example, if there are three random-effects terms with an isotropic covariance structure, this covariance matrix looks like

                      σ_b²   0      0
                      0      σ_b²   0
                      0      0      σ_b²

                  where σ_b² is the common variance of the random-effects terms.
'Full'            Full covariance matrix, using the log-Cholesky parameterization. fitglme estimates all elements of the covariance matrix.
'Diagonal'        Diagonal covariance matrix. That is, off-diagonal elements of the covariance matrix are constrained to be 0.

                      σ_b1²   0       0
                      0       σ_b2²   0
                      0       0       σ_b3²

'CompSymm'        Compound symmetry structure. That is, common variance along diagonals and equal correlation between all random effects. For example, if there are three random-effects terms with a covariance matrix having a compound symmetry structure, this covariance matrix looks like

                      σ_b1²      σ_b1,b2    σ_b1,b2
                      σ_b1,b2    σ_b1²      σ_b1,b2
                      σ_b1,b2    σ_b1,b2    σ_b1²

                  where σ_b1² is the common variance of the random-effects terms and σ_b1,b2 is the common covariance between any two random-effects terms.
PAT               Square symmetric logical matrix. If 'CovariancePattern' is defined by the matrix PAT, and if PAT(a,b) = false, then the (a,b) element of the corresponding covariance matrix is constrained to be 0.

For scalar random-effects terms, the default is 'Isotropic'. Otherwise, the default is 'FullCholesky'.

Example: 'CovariancePattern','Diagonal'
Example: 'CovariancePattern',{'Full','Diagonal'}
Data Types: char | string | logical | cell

DispersionFlag — Indicator to compute dispersion parameter
false for 'binomial' and 'poisson' distributions (default) | true

Indicator to compute dispersion parameter for 'binomial' and 'poisson' distributions, specified as the comma-separated pair consisting of 'DispersionFlag' and one of the following.

Value    Description
true     Estimate a dispersion parameter when computing standard errors
false    Use the theoretical value of 1.0 when computing standard errors

'DispersionFlag' only applies if 'FitMethod' is 'MPL' or 'REMPL'. The fitting function always estimates the dispersion for other distributions.

Example: 'DispersionFlag',true

Distribution — Distribution of the response variable
'Normal' (default) | 'Binomial' | 'Poisson' | 'Gamma' | 'InverseGaussian'

Distribution of the response variable, specified as the comma-separated pair consisting of 'Distribution' and one of the following.

Value                Description
'Normal'             Normal distribution
'Binomial'           Binomial distribution
'Poisson'            Poisson distribution
'Gamma'              Gamma distribution
'InverseGaussian'    Inverse Gaussian distribution

Example: 'Distribution','Binomial'

DummyVarCoding — Coding to use for dummy variables
'reference' (default) | 'effects' | 'full'

Coding to use for dummy variables created from the categorical variables, specified as the comma-separated pair consisting of 'DummyVarCoding' and one of the following.


Value          Description
'reference'    Default. Coefficient for first category set to 0.
'effects'      Coefficients sum to 0.
'full'         One dummy variable for each category.

Example: 'DummyVarCoding','effects'

EBMethod — Method used to approximate empirical Bayes estimates of random effects
'Auto' (default) | 'LineSearchNewton' | 'TrustRegion2D' | 'fsolve'

Method used to approximate empirical Bayes estimates of random effects, specified as the comma-separated pair consisting of 'EBMethod' and one of the following.

• 'Auto'
• 'LineSearchNewton'
• 'TrustRegion2D'
• 'fsolve'

'Auto' is similar to 'LineSearchNewton' but uses a different convergence criterion and does not display iterative progress. 'Auto' and 'LineSearchNewton' may fail for non-canonical link functions. For non-canonical link functions, 'TrustRegion2D' or 'fsolve' are recommended. You must have Optimization Toolbox to use 'fsolve'.

Example: 'EBMethod','LineSearchNewton'

EBOptions — Options for empirical Bayes optimization
structure

Options for empirical Bayes optimization, specified as the comma-separated pair consisting of 'EBOptions' and a structure containing the following.

Value        Description
'TolFun'     Relative tolerance on the gradient norm. Default is 1e-6.
'TolX'       Absolute tolerance on the step size. Default is 1e-8.
'MaxIter'    Maximum number of iterations. Default is 100.
'Display'    'off', 'iter', or 'final'. Default is 'off'.

If EBMethod is 'Auto' and 'FitMethod' is 'Laplace', TolFun is the relative tolerance on the linear predictor of the model, and the 'Display' option does not apply. If 'EBMethod' is 'fsolve', then 'EBOptions' must be specified as an object created by optimoptions('fsolve').

Data Types: struct

Exclude — Indices for rows to exclude
use all rows without NaNs (default) | vector of integer or logical values

Indices for rows to exclude from the generalized linear mixed-effects model in the data, specified as the comma-separated pair consisting of 'Exclude' and a vector of integer or logical values. For example, you can exclude the 13th and 67th rows from the fit as follows.


Example: 'Exclude',[13,67]
Data Types: single | double | logical

FitMethod — Method for estimating model parameters
'MPL' (default) | 'REMPL' | 'Laplace' | 'ApproximateLaplace'

Method for estimating model parameters, specified as the comma-separated pair consisting of 'FitMethod' and one of the following.

• 'MPL' — Maximum pseudo likelihood
• 'REMPL' — Restricted maximum pseudo likelihood
• 'Laplace' — Maximum likelihood using Laplace approximation
• 'ApproximateLaplace' — Maximum likelihood using approximate Laplace approximation with fixed effects profiled out

Example: 'FitMethod','REMPL'

InitPLIterations — Initial number of pseudo likelihood iterations
10 (default) | integer value in the range [1,∞)

Initial number of pseudo likelihood iterations used to initialize parameters for ApproximateLaplace and Laplace fit methods, specified as the comma-separated pair consisting of 'InitPLIterations' and an integer value greater than or equal to 1.

Data Types: single | double

Link — Link function
'identity' | 'log' | 'logit' | 'probit' | 'comploglog' | 'reciprocal' | scalar value | structure

Link function, specified as the comma-separated pair consisting of 'Link' and one of the following.

Value             Description
'identity'        g(mu) = mu. This is the default for the normal distribution.
'log'             g(mu) = log(mu). This is the default for the Poisson distribution.
'logit'           g(mu) = log(mu/(1-mu)). This is the default for the binomial distribution.
'loglog'          g(mu) = log(-log(mu))
'probit'          g(mu) = norminv(mu)
'comploglog'      g(mu) = log(-log(1-mu))
'reciprocal'      g(mu) = mu.^(-1)
Scalar value P    g(mu) = mu.^P
Structure S       A structure containing four fields whose values are function handles with the following names:
                  • S.Link — Link function
                  • S.Derivative — Derivative
                  • S.SecondDerivative — Second derivative
                  • S.Inverse — Inverse of link
                  Specification of S.SecondDerivative can be omitted if FitMethod is MPL or REMPL, or if S is the canonical link for the specified distribution.
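As a sketch, the log link written out as a structure of function handles (normally 'Link','log' suffices; this form is mainly useful as a template for custom links):

```matlab
% Custom link specified as a structure of function handles (log link).
S.Link = @(mu) log(mu);                 % link function
S.Derivative = @(mu) 1./mu;             % first derivative of the link
S.SecondDerivative = @(mu) -1./mu.^2;   % second derivative of the link
S.Inverse = @(eta) exp(eta);            % inverse link
% glme = fitglme(tbl,formula,'Distribution','Poisson','Link',S);
```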

The default link function used by fitglme is the canonical link that depends on the distribution of the response.

Response Distribution    Canonical Link Function
'Normal'                 'identity'
'Binomial'               'logit'
'Poisson'                'log'
'Gamma'                  -1
'InverseGaussian'        -2

Example: 'Link','log'
Data Types: char | string | single | double | struct

MuStart — Starting value for conditional mean
scalar value

Starting value for conditional mean, specified as the comma-separated pair consisting of 'MuStart' and a scalar value. Valid values are as follows.

Response Distribution    Valid Values
'Normal'                 (-Inf,Inf)
'Binomial'               (0,1)
'Poisson'                (0,Inf)
'Gamma'                  (0,Inf)
'InverseGaussian'        (0,Inf)

Data Types: single | double

Offset — Offset
zeros(n,1) (default) | n-by-1 vector of scalar values

Offset, specified as the comma-separated pair consisting of 'Offset' and an n-by-1 vector of scalar values, where n is the length of the response vector. You can also specify the variable name of an n-by-1 vector of scalar values. 'Offset' is used as an additional predictor that has a coefficient value fixed at 1.0.


Data Types: single | double

Optimizer — Optimization algorithm
'quasinewton' (default) | 'fminsearch' | 'fminunc'

Optimization algorithm, specified as the comma-separated pair consisting of 'Optimizer' and one of the following.

Value            Description
'quasinewton'    Uses a trust region based quasi-Newton optimizer. You can change the options of the algorithm using statset('fitglme'). If you do not specify the options, then fitglme uses the default options of statset('fitglme').
'fminsearch'     Uses a derivative-free Nelder-Mead method. You can change the options of the algorithm using optimset('fminsearch'). If you do not specify the options, then fitglme uses the default options of optimset('fminsearch').
'fminunc'        Uses a line search-based quasi-Newton method. You must have Optimization Toolbox to specify this option. You can change the options of the algorithm using optimoptions('fminunc'). If you do not specify the options, then fitglme uses the default options of optimoptions('fminunc') with 'Algorithm' set to 'quasi-newton'.

Example: 'Optimizer','fminsearch'

OptimizerOptions — Options for optimization algorithm
structure returned by statset | structure returned by optimset | object returned by optimoptions

Options for the optimization algorithm, specified as the comma-separated pair consisting of 'OptimizerOptions' and a structure returned by statset('fitglme'), a structure created by optimset('fminsearch'), or an object returned by optimoptions('fminunc').

• If 'Optimizer' is 'fminsearch', then use optimset('fminsearch') to change the options of the algorithm. If 'Optimizer' is 'fminsearch' and you do not supply 'OptimizerOptions', then the defaults used in fitglme are the default options created by optimset('fminsearch').
• If 'Optimizer' is 'fminunc', then use optimoptions('fminunc') to change the options of the optimization algorithm. See optimoptions for the options 'fminunc' uses. If 'Optimizer' is 'fminunc' and you do not supply 'OptimizerOptions', then the defaults used in fitglme are the default options created by optimoptions('fminunc') with 'Algorithm' set to 'quasi-newton'.
• If 'Optimizer' is 'quasinewton', then use statset('fitglme') to change the optimization parameters. If 'Optimizer' is 'quasinewton' and you do not change the optimization parameters using statset, then fitglme uses the default options created by statset('fitglme').
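For instance, to tighten the default quasi-Newton settings (the tolerance and iteration values below are arbitrary illustrations):

```matlab
opts = statset('fitglme');    % default optimization parameters
opts.TolFun = 1e-8;           % tighter gradient tolerance
opts.MaxIter = 20000;         % allow more iterations
% glme = fitglme(tbl,formula,'OptimizerOptions',opts);
```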


The 'quasinewton' optimizer uses the following fields in the structure created by statset('fitglme').

TolFun — Relative tolerance on gradient of objective function
1e-6 (default) | positive scalar value

Relative tolerance on the gradient of the objective function, specified as a positive scalar value.

TolX — Absolute tolerance on step size
1e-12 (default) | positive scalar value

Absolute tolerance on the step size, specified as a positive scalar value.

MaxIter — Maximum number of iterations allowed
10000 (default) | positive scalar value

Maximum number of iterations allowed, specified as a positive scalar value.

Display — Level of display
'off' (default) | 'iter' | 'final'

Level of display, specified as one of 'off', 'iter', or 'final'.

PLIterations — Maximum number of pseudo likelihood iterations
100 (default) | positive integer value

Maximum number of pseudo likelihood (PL) iterations, specified as the comma-separated pair consisting of 'PLIterations' and a positive integer value. PL is used for fitting the model if 'FitMethod' is 'MPL' or 'REMPL'. For other 'FitMethod' values, PL iterations are used to initialize parameters for subsequent optimization.

Example: 'PLIterations',200
Data Types: single | double

PLTolerance — Relative tolerance factor for pseudo likelihood iterations
1e-08 (default) | positive scalar value

Relative tolerance factor for pseudo likelihood iterations, specified as the comma-separated pair consisting of 'PLTolerance' and a positive scalar value.

Example: 'PLTolerance',1e-06
Data Types: single | double

StartMethod — Method to start iterative optimization
'default' (default) | 'random'

Method to start iterative optimization, specified as the comma-separated pair consisting of 'StartMethod' and either of the following.

Description

'default'

An internally defined default value

'random'

A random initial value

Example: 'StartMethod','random'


UseSequentialFitting — Initial fitting type
false (default) | true

Initial fitting type, specified as the comma-separated pair consisting of 'UseSequentialFitting' and either false or true. If 'UseSequentialFitting' is false, all maximum likelihood methods are initialized using one or more pseudo likelihood iterations. If 'UseSequentialFitting' is true, the initial values from pseudo likelihood iterations are refined using 'ApproximateLaplace' for 'Laplace' fitting.

Example: 'UseSequentialFitting',true

Verbose — Indicator to display optimization process on screen
0 (default) | 1 | 2

Indicator to display the optimization process on screen, specified as the comma-separated pair consisting of 'Verbose' and 0, 1, or 2. If 'Verbose' is specified as 1 or 2, then fitglme displays the progress of the iterative model-fitting process. Specifying 'Verbose' as 2 displays iterative optimization information from the individual pseudo likelihood iterations. Specifying 'Verbose' as 1 omits this display. The setting for 'Verbose' overrides the field 'Display' in 'OptimizerOptions'.

Example: 'Verbose',1

Weights — Observation weights
vector of nonnegative scalar values

Observation weights, specified as the comma-separated pair consisting of 'Weights' and an n-by-1 vector of nonnegative scalar values, where n is the number of observations. If the response distribution is binomial or Poisson, then 'Weights' must be a vector of positive integers.

Data Types: single | double

Output Arguments

glme — Generalized linear mixed-effects model
GeneralizedLinearMixedModel object
Generalized linear mixed-effects model, returned as a GeneralizedLinearMixedModel object. For properties and methods of this object, see GeneralizedLinearMixedModel.

More About

Formula

In general, a formula for model specification is a character vector or string scalar of the form 'y ~ terms'. For generalized linear mixed-effects models, this formula is of the form 'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)', where fixed and random contain the fixed-effects and the random-effects terms, respectively.

Suppose a table tbl contains the following:

• A response variable, y
• Predictor variables, Xj, which can be continuous or grouping variables


• Grouping variables, g1, g2, ..., gR, where the grouping variables in Xj and gr can be categorical, logical, character arrays, string arrays, or cell arrays of character vectors.

Then, in a formula of the form 'y ~ fixed + (random1|g1) + ... + (randomR|gR)', the term fixed corresponds to a specification of the fixed-effects design matrix X, random1 is a specification of the random-effects design matrix Z1 corresponding to grouping variable g1, and similarly randomR is a specification of the random-effects design matrix ZR corresponding to grouping variable gR. You can express the fixed and random terms using Wilkinson notation. Wilkinson notation describes the factors present in models. The notation relates to factors present in models, not to the multipliers (coefficients) of those factors.

Wilkinson Notation                      Factors in Standard Notation
1                                       Constant (intercept) term
X^k, where k is a positive integer      X, X^2, ..., X^k
X1 + X2                                 X1, X2
X1*X2                                   X1, X2, X1.*X2 (elementwise multiplication of X1 and X2)
X1:X2                                   X1.*X2 only
- X2                                    Do not include X2
X1*X2 + X3                              X1, X2, X3, X1*X2
X1 + X2 + X3 + X1:X2                    X1, X2, X3, X1*X2
X1*X2*X3 - X1:X2:X3                     X1, X2, X3, X1*X2, X1*X3, X2*X3
X1*(X2 + X3)                            X1, X2, X3, X1*X2, X1*X3
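The expansion rules in the table can be imitated for a tiny subset of the notation. The following Python sketch is purely illustrative (the toolbox parser also handles ^, grouping, and random-effects terms, which this toy does not):

```python
def expand_terms(rhs):
    """Expand a small subset of Wilkinson notation (+, -, and * between
    plain variable names) into the set of model terms. Illustrative only."""
    include, exclude = {"1"}, set()            # the intercept is implicit
    for raw in rhs.split("+"):
        term, target = raw.strip(), include
        if term.startswith("-"):               # '- X' means: drop the term X
            term, target = term[1:].strip(), exclude
        if "*" in term:                        # A*B expands to A, B, and A:B
            a, b = (t.strip() for t in term.split("*"))
            target.update({a, b, a + ":" + b})
        elif term:
            target.add(term)
    return include - exclude

print(sorted(expand_terms("X1*X2 + X3")))    # main effects plus interaction
print(sorted(expand_terms("-1 + X1 + X2")))  # -1 removes the intercept
```

The helper mirrors only the table rows shown above; it is not how fitglme parses formulas internally.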

Statistics and Machine Learning Toolbox notation always includes a constant term unless you explicitly remove the term using -1. Here are some examples for generalized linear mixed-effects model specification.

Examples:

Formula                   Description
'y ~ X1 + X2'             Fixed effects for the intercept, X1, and X2. This is equivalent to 'y ~ 1 + X1 + X2'.
'y ~ -1 + X1 + X2'        No intercept and fixed effects for X1 and X2. The implicit intercept term is suppressed by including -1.
'y ~ 1 + (1 | g1)'        Fixed effects for the intercept plus random effect for the intercept for each level of the grouping variable g1.
'y ~ X1 + (1 | g1)'       Random intercept model with a fixed slope.
'y ~ X1 + (X1 | g1)'      Random intercept and slope, with possible correlation between them. This is equivalent to 'y ~ 1 + X1 + (1 + X1|g1)'.


'y ~ X1 + (1 | g1) + (-1 + X1 | g1)'           Independent random-effects terms for intercept and slope.
'y ~ 1 + (1 | g1) + (1 | g2) + (1 | g1:g2)'    Random intercept model with independent main effects for g1 and g2, plus an independent interaction effect.

See Also
GeneralizedLinearMixedModel

Topics
“Fit a Generalized Linear Mixed-Effects Model” on page 12-57
“Generalized Linear Mixed-Effects Models” on page 12-48

Introduced in R2014b


fitgmdist
Fit Gaussian mixture model to data

Syntax

GMModel = fitgmdist(X,k)
GMModel = fitgmdist(X,k,Name,Value)

Description

GMModel = fitgmdist(X,k) returns a Gaussian mixture distribution model (GMModel) with k components fitted to data (X).

GMModel = fitgmdist(X,k,Name,Value) returns a Gaussian mixture distribution model with additional options specified by one or more Name,Value pair arguments. For example, you can specify a regularization value or the covariance type.

Examples

Cluster Data Using a Gaussian Mixture Model

Generate data from a mixture of two bivariate Gaussian distributions.

mu1 = [1 2];
Sigma1 = [2 0; 0 0.5];
mu2 = [-3 -5];
Sigma2 = [1 0; 0 1];
rng(1); % For reproducibility
X = [mvnrnd(mu1,Sigma1,1000);mvnrnd(mu2,Sigma2,1000)];

Fit a Gaussian mixture model. Specify that there are two components.

GMModel = fitgmdist(X,2);

Plot the data over the fitted Gaussian mixture model contours.

figure
y = [zeros(1000,1);ones(1000,1)];
h = gscatter(X(:,1),X(:,2),y);
hold on
gmPDF = @(x1,x2)reshape(pdf(GMModel,[x1(:) x2(:)]),size(x1));
g = gca;
fcontour(gmPDF,[g.XLim g.YLim])
title('{\bf Scatter Plot and Fitted Gaussian Mixture Contours}')
legend(h,'Model 0','Model1')
hold off


Regularize Gaussian Mixture Model Estimation

Generate data from a mixture of two bivariate Gaussian distributions. Create a third predictor that is the sum of the first and second predictors.

mu1 = [1 2];
Sigma1 = [1 0; 0 1];
mu2 = [3 4];
Sigma2 = [0.5 0; 0 0.5];
rng(3); % For reproducibility
X1 = [mvnrnd(mu1,Sigma1,100);mvnrnd(mu2,Sigma2,100)];
X = [X1,X1(:,1)+X1(:,2)];

The columns of X are linearly dependent. This can cause ill-conditioned covariance estimates. Fit a Gaussian mixture model to the data. You can use try/catch statements to help manage error messages.

rng(1); % Reset seed for common start values
try
    GMModel = fitgmdist(X,2)
catch exception
    disp('There was an error fitting the Gaussian mixture model')
    error = exception.message
end


There was an error fitting the Gaussian mixture model
error =
'Ill-conditioned covariance created at iteration 2.'

The covariance estimates are ill-conditioned. Consequently, optimization stops and an error appears. Fit a Gaussian mixture model again, but use regularization.

rng(3); % Reset seed for common start values
GMModel = fitgmdist(X,2,'RegularizationValue',0.1)

GMModel =
Gaussian mixture distribution with 2 components in 3 dimensions
Component 1:
Mixing proportion: 0.536725
Mean:    2.8831    3.9506    6.8338
Component 2:
Mixing proportion: 0.463275
Mean:    0.8813    1.9758    2.8571

In this case, the algorithm converges to a solution due to regularization.
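Mechanically, regularization adds a small positive number to the diagonal of each estimated covariance matrix, which forces positive definiteness even when predictors are linearly dependent. A toy NumPy sketch of the idea (illustrative only, not the fitgmdist implementation):

```python
import numpy as np

# Two perfectly correlated columns, like X(:,3) = X(:,1) + X(:,2) above
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
S = np.cov(X, rowvar=False)           # singular: the columns are dependent

lam = 0.1                             # plays the role of RegularizationValue
S_reg = S + lam * np.eye(S.shape[0])  # add lam to every diagonal entry

print(np.linalg.matrix_rank(S))              # rank-deficient before
print(np.linalg.eigvalsh(S_reg).min() > 0)   # positive definite after
```

Adding lam shifts every eigenvalue of the covariance estimate up by lam, which is why even a small value suffices.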

Select the Number of Gaussian Mixture Model Components Using PCA

Gaussian mixture models require that you specify a number of components before being fit to data. For many applications, it might be difficult to know the appropriate number of components. This example shows how to explore the data, and try to get an initial guess at the number of components using principal component analysis.

Load Fisher's iris data set.

load fisheriris
classes = unique(species)

classes = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }

The data set contains three classes of iris species. The analysis proceeds as if this is unknown.

Use principal component analysis to reduce the dimension of the data to two dimensions for visualization.

[~,score] = pca(meas,'NumComponents',2);

Fit three Gaussian mixture models to the data by specifying 1, 2, and 3 components. Increase the number of optimization iterations to 1000. Use dot notation to store the final parameter estimates. By default, the software fits full and different covariances for each component.

GMModels = cell(3,1); % Preallocation
options = statset('MaxIter',1000);


rng(1); % For reproducibility
for j = 1:3
    GMModels{j} = fitgmdist(score,j,'Options',options);
    fprintf('\n GM Mean for %i Component(s)\n',j)
    Mu = GMModels{j}.mu
end

GM Mean for 1 Component(s)
Mu = 1×2
10^-14 ×
   -0.2743   -0.0870

GM Mean for 2 Component(s)
Mu = 2×2
    1.3212   -0.0954
   -2.6424    0.1909

GM Mean for 3 Component(s)
Mu = 3×2
    0.4856   -0.1287
    1.4484   -0.0904
   -2.6424    0.1909

GMModels is a cell array containing three fitted gmdistribution models. The means in the three component models are different, suggesting that the model distinguishes among the three iris species.

Plot the scores over the fitted Gaussian mixture model contours. Since the data set includes labels, use gscatter to distinguish between the true number of components.

figure
for j = 1:3
    subplot(2,2,j)
    h1 = gscatter(score(:,1),score(:,2),species);
    h = gca;
    hold on
    gmPDF = @(x1,x2)arrayfun(@(x1,x2)pdf(GMModels{j},[x1(:) x2(:)]),x1,x2);
    fcontour(gmPDF,[h.XLim h.YLim],'MeshDensity',100)
    title(sprintf('GM Model - %i Component(s)',j));
    xlabel('1st principal component');
    ylabel('2nd principal component');
    if(j ~= 3)
        legend off;
    end
    hold off
end
g = legend(h1);
g.Position = [0.7 0.25 0.1 0.1];


The three-component Gaussian mixture model, in conjunction with PCA, looks like it distinguishes between the three iris species.

There are other options you can use to help select the appropriate number of components for a Gaussian mixture model. For example,

• Compare multiple models with varying numbers of components using information criteria, e.g., AIC or BIC.
• Estimate the number of clusters using evalclusters, which supports the Calinski-Harabasz criterion and the gap statistic, or other criteria.

Determine the Best Gaussian Mixture Fit Using AIC

Gaussian mixture models require that you specify a number of components before being fit to data. For many applications, it might be difficult to know the appropriate number of components. This example uses the AIC fit statistic to help you choose the best fitting Gaussian mixture model over varying numbers of components.

Generate data from a mixture of two bivariate Gaussian distributions.

mu1 = [1 1];
Sigma1 = [0.5 0; 0 0.5];
mu2 = [2 4];
Sigma2 = [0.2 0; 0 0.2];
rng(1);
X = [mvnrnd(mu1,Sigma1,1000);mvnrnd(mu2,Sigma2,1000)];
plot(X(:,1),X(:,2),'ko')
title('Scatter Plot')
xlim([min(X(:)) max(X(:))]) % Make axes have the same scale
ylim([min(X(:)) max(X(:))])

Supposing that you do not know the underlying parameter values, the scatter plot suggests:

• There are two components.
• The variances between the clusters are different.
• The variance within the clusters is the same.
• There is no covariance within the clusters.

Fit a two-component Gaussian mixture model. Based on the scatter plot inspection, specify that the covariance matrices are diagonal. Print the final iteration and loglikelihood statistic to the Command Window by passing a statset structure as the value of the Options name-value pair argument.

options = statset('Display','final');
GMModel = fitgmdist(X,2,'CovarianceType','diagonal','Options',options);

11 iterations, log-likelihood = -4787.38

GMModel is a fitted gmdistribution model.


Examine the AIC over varying numbers of components.

AIC = zeros(1,4);
GMModels = cell(1,4);
options = statset('MaxIter',500);
for k = 1:4
    GMModels{k} = fitgmdist(X,k,'Options',options,'CovarianceType','diagonal');
    AIC(k) = GMModels{k}.AIC;
end

[minAIC,numComponents] = min(AIC);
numComponents

numComponents = 2

BestModel = GMModels{numComponents}

BestModel =
Gaussian mixture distribution with 2 components in 2 dimensions
Component 1:
Mixing proportion: 0.501719
Mean:    1.9824    4.0013
Component 2:
Mixing proportion: 0.498281
Mean:    0.9880    1.0511

The smallest AIC occurs when the software fits the two-component Gaussian mixture model.
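The AIC being compared here is the standard -2*logL + 2*p, where p counts the free parameters: k - 1 mixing proportions, k*d means, and the covariance entries. A short Python sketch of that bookkeeping (the parameter counts follow the CovarianceType description; the helper name is invented for illustration):

```python
def gmm_aic(loglik, k, d, cov_type="full"):
    """AIC = -2*logL + 2*p for a k-component, d-dimensional GMM.
    Parameter count p: (k-1) weights + k*d means + covariance parameters."""
    cov_params = k * d if cov_type == "diagonal" else k * d * (d + 1) // 2
    p = (k - 1) + k * d + cov_params
    return -2 * loglik + 2 * p

# Diagonal two-component bivariate model, as in the example above; the
# log-likelihood is the value printed by statset('Display','final')
print(gmm_aic(-4787.38, 2, 2, "diagonal"))
```

Smaller AIC is better; the 2*p term penalizes models with more components, which is what stops k = 3 and k = 4 from winning here.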

Set Initial Values When Fitting Gaussian Mixture Models

Gaussian mixture model parameter estimates might vary with different initial values. This example shows how to control initial values when you fit Gaussian mixture models using fitgmdist.

Load Fisher's iris data set. Use the petal lengths and widths as predictors.

load fisheriris
X = meas(:,3:4);

Fit a Gaussian mixture model to the data using default initial values. There are three iris species, so specify k = 3 components.

rng(10); % For reproducibility
GMModel1 = fitgmdist(X,3);

By default, the software:

1. Implements the “k-means++ Algorithm for Initialization” on page 32-1838 to choose k = 3 initial cluster centers.
2. Sets the initial covariance matrices as diagonal, where element (j, j) is the variance of X(:,j).
3. Treats the initial mixing proportions as uniform.

Fit a Gaussian mixture model by connecting each observation to its label.


y = ones(size(X,1),1);
y(strcmp(species,'setosa')) = 2;
y(strcmp(species,'virginica')) = 3;
GMModel2 = fitgmdist(X,3,'Start',y);

Fit a Gaussian mixture model by explicitly specifying the initial means, covariance matrices, and mixing proportions.

Mu = [1 1; 2 2; 3 3];
Sigma(:,:,1) = [1 1; 1 2];
Sigma(:,:,2) = 2*[1 1; 1 2];
Sigma(:,:,3) = 3*[1 1; 1 2];
PComponents = [1/2,1/4,1/4];
S = struct('mu',Mu,'Sigma',Sigma,'ComponentProportion',PComponents);
GMModel3 = fitgmdist(X,3,'Start',S);

Use gscatter to plot a scatter diagram that distinguishes between the iris species. For each model, plot the fitted Gaussian mixture model contours.

figure
subplot(2,2,1)
h = gscatter(X(:,1),X(:,2),species,[],'o',4);
haxis = gca;
xlim = haxis.XLim;
ylim = haxis.YLim;
d = (max([xlim ylim])-min([xlim ylim]))/1000;
[X1Grid,X2Grid] = meshgrid(xlim(1):d:xlim(2),ylim(1):d:ylim(2));
hold on
contour(X1Grid,X2Grid,reshape(pdf(GMModel1,[X1Grid(:) X2Grid(:)]),...
    size(X1Grid,1),size(X1Grid,2)),20)
uistack(h,'top')
title('{\bf Random Initial Values}');
xlabel('Sepal length');
ylabel('Sepal width');
legend off;
hold off
subplot(2,2,2)
h = gscatter(X(:,1),X(:,2),species,[],'o',4);
hold on
contour(X1Grid,X2Grid,reshape(pdf(GMModel2,[X1Grid(:) X2Grid(:)]),...
    size(X1Grid,1),size(X1Grid,2)),20)
uistack(h,'top')
title('{\bf Initial Values from Labels}');
xlabel('Sepal length');
ylabel('Sepal width');
legend off
hold off
subplot(2,2,3)
h = gscatter(X(:,1),X(:,2),species,[],'o',4);
hold on
contour(X1Grid,X2Grid,reshape(pdf(GMModel3,[X1Grid(:) X2Grid(:)]),...
    size(X1Grid,1),size(X1Grid,2)),20)
uistack(h,'top')
title('{\bf Initial Values from the Structure}');
xlabel('Sepal length');
ylabel('Sepal width');


legend('Location',[0.7,0.25,0.1,0.1]);
hold off

According to the contours, GMModel2 seems to suggest a slight trimodality, while the others suggest bimodal distributions. Display the estimated component means.

table(GMModel1.mu,GMModel2.mu,GMModel3.mu,'VariableNames',...
    {'Model1','Model2','Model3'})

ans=3×3 table
          Model1                 Model2                 Model3
    _________________      ________________      ________________
    5.2115     2.0119      4.2857     1.3339      1.4604    0.2429
     1.461    0.24423       1.462      0.246      4.7509    1.4629
    4.6829     1.4429      5.5507     2.0316      5.0158    1.8592

GMModel2 seems to distinguish between the iris species the best.

Input Arguments

X — Data
numeric matrix
Data to which the Gaussian mixture model is fit, specified as a numeric matrix. The rows of X correspond to observations, and the columns of X correspond to variables. The number of observations must be larger than each of the following: the number of variables and the number of components.
NaNs indicate missing values. The software removes rows of X containing at least one NaN before fitting, which decreases the effective sample size.
Data Types: single | double

k — Number of components
positive integer
Number of components to use when fitting the Gaussian mixture model, specified as a positive integer. For example, if you specify k = 3, then the software fits a Gaussian mixture model with three distinct means, covariance matrices, and component proportions to the data (X).
Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'RegularizationValue',0.1,'CovarianceType','diagonal' specifies a regularization parameter value of 0.1 and fitting of diagonal covariance matrices.

CovarianceType — Type of covariance matrix
'full' (default) | 'diagonal'
Type of covariance matrix to fit to the data, specified as the comma-separated pair consisting of 'CovarianceType' and either 'diagonal' or 'full'. If you set 'diagonal', then the software fits diagonal covariance matrices. In this case, the software estimates k*d covariance parameters, where d is the number of columns in X (i.e., d = size(X,2)). Otherwise, the software fits full covariance matrices. In this case, the software estimates k*d*(d+1)/2 covariance parameters.
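These counts are simple arithmetic, reproduced in the following illustrative Python helper (not part of the toolbox; the pooled case anticipates the 'SharedCovariance' option described below):

```python
def n_cov_params(k, d, cov_type="full", shared=False):
    """Number of covariance parameters for a k-component, d-dimensional
    Gaussian mixture: d per diagonal matrix, d*(d+1)/2 per full matrix;
    a shared (pooled) covariance divides the total by k."""
    per_comp = d if cov_type == "diagonal" else d * (d + 1) // 2
    return per_comp if shared else k * per_comp

print(n_cov_params(3, 4, "full"))       # 3 components, 4 variables
print(n_cov_params(3, 4, "diagonal"))
```

The d*(d+1)/2 term is the number of free entries in a symmetric d-by-d matrix, which is why full covariances become expensive quickly as d grows.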
Example: 'CovarianceType','diagonal'

Options — Iterative EM algorithm optimization options
statset options structure
Iterative EM algorithm optimization options, specified as the comma-separated pair consisting of 'Options' and a statset options structure. This table describes the available name-value pair arguments.


Name         Value
'Display'    'final': Display the final output. 'iter': Display iterative output to the Command Window for some functions; otherwise display the final output. 'off': Do not display optimization information.
'MaxIter'    Positive integer indicating the maximum number of iterations allowed. The default is 100.
'TolFun'     Positive scalar indicating the termination tolerance for the loglikelihood function value. The default is 1e-6.

Example: 'Options',statset('Display','final','MaxIter',1500,'TolFun',1e-5)

ProbabilityTolerance — Tolerance for posterior probabilities
1e-8 (default) | nonnegative scalar value in range [0,1e-6]
Tolerance for posterior probabilities, specified as the comma-separated pair consisting of 'ProbabilityTolerance' and a nonnegative scalar value in the range [0,1e-6]. In each iteration, after the estimation of posterior probabilities, fitgmdist sets any posterior probability that is not larger than the tolerance value to zero. Using a nonzero tolerance might speed up fitgmdist.
Example: 'ProbabilityTolerance',0.0000025
Data Types: single | double

RegularizationValue — Regularization parameter value
0 (default) | nonnegative scalar
Regularization parameter value, specified as the comma-separated pair consisting of 'RegularizationValue' and a nonnegative scalar. Set RegularizationValue to a small positive scalar to ensure that the estimated covariance matrices are positive definite.
Example: 'RegularizationValue',0.01
Data Types: single | double

Replicates — Number of times to repeat EM algorithm
1 (default) | positive integer
Number of times to repeat the EM algorithm using a new set of initial values, specified as the comma-separated pair consisting of 'Replicates' and a positive integer. If Replicates is greater than 1, then:
• The name-value pair argument Start must be 'plus' (the default) or 'randSample'.
• GMModel is the fit with the largest loglikelihood.
Example: 'Replicates',10

Data Types: single | double

SharedCovariance — Flag indicating whether all covariance matrices are identical
logical false (default) | logical true
Flag indicating whether all covariance matrices are identical (i.e., fit a pooled estimate), specified as the comma-separated pair consisting of 'SharedCovariance' and either logical value false or true. If SharedCovariance is true, then all k covariance matrices are equal, and the number of covariance parameters is scaled down by a factor of k.

Start — Initial value setting method
'plus' (default) | 'randSample' | vector of integers | structure array
Initial value setting method, specified as the comma-separated pair consisting of 'Start' and 'randSample', 'plus', a vector of integers, or a structure array. The value of Start determines the initial values required by the optimization routine for each Gaussian component parameter — mean, covariance, and mixing proportion. This table summarizes the available options.

Value                Description
'randSample'         The software selects k observations from X at random as initial component means. The mixing proportions are uniform. The initial covariance matrices for all components are diagonal, where the element j on the diagonal is the variance of X(:,j).
'plus'               The software selects k observations from X using the k-means++ algorithm on page 32-1838. The initial mixing proportions are uniform. The initial covariance matrices for all components are diagonal, where the element j on the diagonal is the variance of X(:,j).
Vector of integers   A vector of length n (the number of observations) containing an initial guess of the component index for each point. That is, each element is an integer from 1 to k, which corresponds to a component. The software collects all observations corresponding to the same component, computes means, covariances, and mixing proportions for each, and sets the initial values to these statistics.
Structure array      Suppose that there are d variables (i.e., d = size(X,2)). The structure array, e.g., S, must have three fields:
                     • S.mu: A k-by-d matrix specifying the initial mean of each component.
                     • S.Sigma: A numeric array specifying the covariance matrix of each component. Sigma is one of the following:
                       • A d-by-d-by-k array. Sigma(:,:,j) is the initial covariance matrix of component j.
                       • A 1-by-d-by-k array. diag(Sigma(:,:,j)) is the initial covariance matrix of component j.
                       • A d-by-d matrix. Sigma is the initial covariance matrix for all components.
                       • A 1-by-d vector. diag(Sigma) is the initial covariance matrix for all components.
                     • S.ComponentProportion: A 1-by-k vector of scalars specifying the initial mixing proportions of each component. The default is uniform.

Example: 'Start',ones(n,1)
Data Types: single | double | char | string | struct

Output Arguments

GMModel — Fitted Gaussian mixture model
gmdistribution model
Fitted Gaussian mixture model, returned as a gmdistribution model. Access properties of GMModel using dot notation. For example, display the AIC by entering GMModel.AIC.

Tips

fitgmdist might:

• Converge to a solution where one or more of the components has an ill-conditioned or singular covariance matrix.
  The following issues might result in an ill-conditioned covariance matrix:
  • The number of dimensions of your data is relatively high and there are not enough observations.
  • Some of the predictors (variables) of your data are highly correlated.
  • Some or all the features are discrete.
  • You tried to fit the data to too many components.
  In general, you can avoid getting ill-conditioned covariance matrices by using one of the following precautions:
  • Preprocess your data to remove correlated features.
  • Set 'SharedCovariance' to true to use an equal covariance matrix for every component.
  • Set 'CovarianceType' to 'diagonal'.
  • Use 'RegularizationValue' to add a very small positive number to the diagonal of every covariance matrix.
  • Try another set of initial values.
• Pass through an intermediate step where one or more of the components has an ill-conditioned covariance matrix. Try another set of initial values to avoid this issue without altering your data or model.

Algorithms

Gaussian Mixture Model Likelihood Optimization

fitgmdist optimizes the Gaussian mixture model likelihood using the iterative Expectation-Maximization (EM) algorithm. Using initial values for component means, covariance matrices, and mixing proportions, the EM algorithm proceeds using these steps.

1. For each observation, the algorithm computes posterior probabilities of component memberships. You can think of the result as an n-by-k matrix, where element (i,j) contains the posterior probability that observation i is from component j. This is the E-step of the EM algorithm.
2. Using the component-membership posterior probabilities as weights, the algorithm estimates the component means, covariance matrices, and mixing proportions by applying maximum likelihood. This is the M-step of the EM algorithm.
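For a one-dimensional, two-component mixture, the two steps above fit in a few lines. This NumPy sketch on synthetic data is an illustration of the E-step and M-step, not the fitgmdist implementation:

```python
import numpy as np

def em_step(X, mu, var, w):
    """One EM iteration for a 1-D Gaussian mixture (illustrative sketch)."""
    Xc = X[:, None]                                  # n-by-1
    # E-step: n-by-k posterior probabilities of component membership
    dens = np.exp(-0.5 * (Xc - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    resp = w * dens
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: weighted maximum-likelihood updates
    nk = resp.sum(axis=0)
    mu = (resp * Xc).sum(axis=0) / nk
    var = (resp * (Xc - mu) ** 2).sum(axis=0) / nk
    w = nk / len(X)
    return mu, var, w

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, 500), rng.normal(5, 1, 500)])
mu, var, w = np.array([-1.0, 6.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])
for _ in range(50):                                  # iterate toward convergence
    mu, var, w = em_step(X, mu, var, w)
print(np.sort(np.round(mu, 1)))                      # means near 0 and 5
```

After 50 iterations the estimated component means sit close to the true values of 0 and 5, and the mixing proportions sum to 1 by construction.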

The algorithm iterates over these steps until convergence. The likelihood surface is complex, and the algorithm might converge to a local optimum. Also, the resulting local optimum might depend on the initial conditions. fitgmdist has several options for choosing initial conditions, including random component assignments for the observations and the k-means++ algorithm.

k-means++ Algorithm for Initialization

The k-means++ algorithm uses a heuristic to find centroid seeds for k-means clustering. fitgmdist can apply the same principle to initialize the EM algorithm by using the k-means++ algorithm to select the initial parameter values for a fitted Gaussian mixture model. The k-means++ algorithm assumes the number of clusters is k and chooses the initial parameter values as follows.

1. Select the component mixture probability to be the uniform probability p_i = 1/k, where i = 1, ..., k.
2. Select the covariance matrices to be diagonal and identical, where σ_i = diag(a_1, a_2, …, a_d) and a_j = var(X_j).
3. Select the first initial component center μ_1 uniformly from all data points in X.
4. To choose center j:
   a. Compute the Mahalanobis distances from each observation to each centroid, and assign each observation to its closest centroid.
   b. For m = 1, ..., n and p = 1, ..., j – 1, select centroid j at random from X with probability

      d²(x_m, μ_p) / Σ_{h: x_h ∈ M_p} d²(x_h, μ_p)

      where d(x_m, μ_p) is the distance between observation m and μ_p, and M_p is the set of all observations closest to centroid μ_p, with x_m belonging to M_p. That is, select each subsequent center with a probability proportional to the distance from itself to the closest center that you already chose.
5. Repeat step 4 until k centroids are chosen.
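The distance-proportional sampling in step 4 is compact enough to sketch. This Python illustration is a simplification: it uses plain squared Euclidean distance where the description above uses the Mahalanobis distance, and it is not the toolbox implementation:

```python
import numpy as np

def kmeanspp_seeds(X, k, rng):
    """Choose k initial centers by distance-proportional sampling."""
    centers = [X[rng.integers(len(X))]]          # first center: uniform draw
    while len(centers) < k:
        # squared distance from each point to its closest chosen center
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()                    # probability proportional to d^2
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (100, 2)),     # two well-separated blobs
               rng.normal(5, 0.5, (100, 2))])
seeds = kmeanspp_seeds(X, 2, rng)
print(seeds.shape)
```

Because far-away points are sampled with high probability, the second seed very likely lands in the other blob, which is exactly the spreading behavior the seeding scheme is designed to produce.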

References

[1] McLachlan, G., and D. Peel. Finite Mixture Models. Hoboken, NJ: John Wiley & Sons, Inc., 2000.

See Also
cluster | gmdistribution

Topics
“Tune Gaussian Mixture Models” on page 16-57
“Cluster Using Gaussian Mixture Model” on page 16-39

Introduced in R2014a


fitlm
Fit linear regression model

Syntax

mdl = fitlm(tbl)
mdl = fitlm(X,y)
mdl = fitlm( ___ ,modelspec)
mdl = fitlm( ___ ,Name,Value)

Description

mdl = fitlm(tbl) returns a linear regression model fit to variables in the table or dataset array tbl. By default, fitlm takes the last variable as the response variable.

mdl = fitlm(X,y) returns a linear regression model of the responses y, fit to the data matrix X.

mdl = fitlm( ___ ,modelspec) defines the model specification using any of the input argument combinations in the previous syntaxes.

mdl = fitlm( ___ ,Name,Value) specifies additional options using one or more name-value pair arguments. For example, you can specify which variables are categorical, perform robust regression, or use observation weights.

Examples

Fit Linear Regression Using Data in Matrix

Fit a linear regression model using a matrix input data set.

Load the carsmall data set, a matrix input data set.

load carsmall
X = [Weight,Horsepower,Acceleration];

Fit a linear regression model by using fitlm.

mdl = fitlm(X,MPG)

mdl =
Linear regression model:
    y ~ 1 + x1 + x2 + x3

Estimated Coefficients:
                     Estimate          SE          tStat        pValue
    (Intercept)        47.977        3.8785        12.37      4.8957e-21
    x1             -0.0065416     0.0011274      -5.8023      9.8742e-08
    x2              -0.042943      0.024313      -1.7663         0.08078
    x3              -0.011583       0.19333    -0.059913         0.95236

Number of observations: 93, Error degrees of freedom: 89
Root Mean Squared Error: 4.09
R-squared: 0.752, Adjusted R-Squared: 0.744
F-statistic vs. constant model: 90, p-value = 7.38e-27

The model display includes the model formula, estimated coefficients, and model summary statistics.

The model formula in the display, y ~ 1 + x1 + x2 + x3, corresponds to y = β0 + β1X1 + β2X2 + β3X3 + ϵ.

The model display also shows the estimated coefficient information, which is stored in the Coefficients property. Display the Coefficients property.

mdl.Coefficients

ans=4×4 table
                     Estimate          SE          tStat        pValue
    (Intercept)        47.977        3.8785        12.37      4.8957e-21
    x1             -0.0065416     0.0011274      -5.8023      9.8742e-08
    x2              -0.042943      0.024313      -1.7663         0.08078
    x3              -0.011583       0.19333    -0.059913         0.95236

The Coefficients property includes these columns:

• Estimate — Coefficient estimates for each corresponding term in the model. For example, the estimate for the constant term (intercept) is 47.977.
• SE — Standard error of the coefficients.
• tStat — t-statistic for each coefficient to test the null hypothesis that the corresponding coefficient is zero against the alternative that it is different from zero, given the other predictors in the model. Note that tStat = Estimate/SE. For example, the t-statistic for the intercept is 47.977/3.8785 = 12.37.
• pValue — p-value for the t-statistic of the hypothesis test that the corresponding coefficient is equal to zero or not. For example, the p-value of the t-statistic for x2 is greater than 0.05, so this term is not significant at the 5% significance level given the other terms in the model.

The summary statistics of the model are:

• Number of observations — Number of rows without any NaN values. For example, Number of observations is 93 because the MPG data vector has six NaN values and the Horsepower data vector has one NaN value for a different observation, where the number of rows in X and MPG is 100.
• Error degrees of freedom — n – p, where n is the number of observations, and p is the number of coefficients in the model, including the intercept. For example, the model has four coefficients (the intercept and one for each of the three predictors), so the Error degrees of freedom is 93 – 4 = 89.
• Root mean squared error — Square root of the mean squared error, which estimates the standard deviation of the error distribution.


• R-squared and Adjusted R-squared — Coefficient of determination and adjusted coefficient of determination, respectively. For example, the R-squared value suggests that the model explains approximately 75% of the variability in the response variable MPG.
• F-statistic vs. constant model — Test statistic for the F-test on the regression model, which tests whether the model fits significantly better than a degenerate model consisting of only a constant term.
• p-value — p-value for the F-test on the model. For example, the model is significant with a p-value of 7.3816e-27.

You can find these statistics in the model properties (NumObservations, DFE, RMSE, and Rsquared) and by using the anova function.

anova(mdl,'summary')

ans=3×5 table
                 SumSq     DF    MeanSq      F         pValue
                 ______    __    ______    ______    __________
    Total        6004.8    92    65.269
    Model          4516     3    1505.3    89.987    7.3816e-27
    Residual     1488.8    89    16.728
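The displayed quantities follow from standard least-squares formulas. The following NumPy sketch reproduces Estimate, SE, tStat, the error degrees of freedom, and R-squared on synthetic data; it is an illustration of the formulas, not fitlm itself:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)           # true model: y = 2 + 3x + eps

X = np.column_stack([np.ones(n), x])             # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # Estimate
resid = y - X @ beta
dfe = n - X.shape[1]                             # error degrees of freedom: n - p
mse = resid @ resid / dfe                        # square of Root Mean Squared Error
se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))  # SE of each coefficient
tstat = beta / se                                # tStat = Estimate / SE
ss_tot = ((y - y.mean()) ** 2).sum()
r2 = 1 - (resid @ resid) / ss_tot                # R-squared

print(dfe, np.round(beta, 1))
```

With 100 observations and 2 coefficients, dfe is 98, and the estimates land near the true intercept 2 and slope 3.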

Fit Linear Regression Using Data in Table

Load the sample data.

load carsmall

Store the variables in a table.

tbl = table(Weight,Acceleration,MPG,'VariableNames',{'Weight','Acceleration','MPG'});

Display the first five rows of the table.

tbl(1:5,:)

ans=5×3 table
    Weight    Acceleration    MPG
    ______    ____________    ___
     3504         12          18
     3693         11.5        15
     3436         11          18
     3433         12          16
     3449         10.5        17

Fit a linear regression model for miles per gallon (MPG). Specify the model formula by using Wilkinson notation.

lm = fitlm(tbl,'MPG~Weight+Acceleration')

lm =
Linear regression model:
    MPG ~ 1 + Weight + Acceleration

Estimated Coefficients:
                      Estimate          SE         tStat       pValue
    (Intercept)         45.155        3.4659      13.028     1.6266e-22
    Weight          -0.0082475    0.00059836     -13.783     5.3165e-24
    Acceleration       0.19694       0.14743      1.3359        0.18493

Number of observations: 94, Error degrees of freedom: 91
Root Mean Squared Error: 4.12
R-squared: 0.743, Adjusted R-Squared: 0.738
F-statistic vs. constant model: 132, p-value = 1.38e-27

The model 'MPG~Weight+Acceleration' in this example is equivalent to setting the model specification as 'linear'. For example,

lm2 = fitlm(tbl,'linear');

If you use a character vector for model specification and you do not specify the response variable, then fitlm accepts the last variable in tbl as the response variable and the other variables as the predictor variables.

Fit Linear Regression Using Specified Model Formula

Fit a linear regression model using a model formula specified by Wilkinson notation.

Load the sample data.

load carsmall

Store the variables in a table.

tbl = table(Weight,Acceleration,Model_Year,MPG,'VariableNames',{'Weight','Acceleration','Model_Year','MPG'});

Fit a linear regression model for miles per gallon (MPG) with weight and acceleration as the predictor variables.

lm = fitlm(tbl,'MPG~Weight+Acceleration')

lm =
Linear regression model:
    MPG ~ 1 + Weight + Acceleration

Estimated Coefficients:
                      Estimate          SE         tStat       pValue
    (Intercept)         45.155        3.4659      13.028     1.6266e-22
    Weight          -0.0082475    0.00059836     -13.783     5.3165e-24
    Acceleration       0.19694       0.14743      1.3359        0.18493

Number of observations: 94, Error degrees of freedom: 91
Root Mean Squared Error: 4.12
R-squared: 0.743, Adjusted R-Squared: 0.738
F-statistic vs. constant model: 132, p-value = 1.38e-27

The p-value of 0.18493 indicates that Acceleration does not have a significant impact on MPG. Remove Acceleration from the model, and try improving the model by adding the predictor variable Model_Year. First define Model_Year as a categorical variable.

tbl.Model_Year = categorical(tbl.Model_Year);
lm = fitlm(tbl,'MPG~Weight+Model_Year')

lm = 
Linear regression model:
    MPG ~ 1 + Weight + Model_Year

Estimated Coefficients:
                      Estimate          SE          tStat       pValue  
                     __________    __________    _______    __________
    (Intercept)           40.11        1.5418     26.016    1.2024e-43
    Weight           -0.0066475    0.00042802    -15.531    3.3639e-27
    Model_Year_76        1.9291       0.74761     2.5804      0.011488
    Model_Year_82        7.9093       0.84975     9.3078    7.8681e-15

Number of observations: 94, Error degrees of freedom: 90
Root Mean Squared Error: 2.92
R-squared: 0.873,  Adjusted R-Squared: 0.868
F-statistic vs. constant model: 206, p-value = 3.83e-40

Specifying modelspec using Wilkinson notation enables you to update the model without having to change the design matrix. fitlm uses only the variables that are specified in the formula. It also creates the necessary two dummy indicator variables for the categorical variable Model_Year.
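Because the fitted model stores its formula, you can also refine a fit term by term. A sketch using addTerms and removeTerms, assuming tbl from this example with tbl.Model_Year already converted to categorical:

```matlab
% Sketch: refine a fitted model by formula terms instead of rebuilding
% a design matrix. Starts from the first fit in this example.
lm0 = fitlm(tbl,'MPG~Weight+Acceleration');
lm1 = removeTerms(lm0,'Acceleration');   % drop the insignificant term
lm2 = addTerms(lm1,'Model_Year')         % reaches MPG ~ 1 + Weight + Model_Year
```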

Fit Linear Regression Using Terms Matrix

Fit a linear regression model using a terms matrix.

Terms Matrix for Table Input

If the model variables are in a table, then a column of 0s in a terms matrix represents the position of the response variable. Load the hospital data set.

load hospital

Store the variables in a table.

t = table(hospital.Sex,hospital.BloodPressure(:,1),hospital.Age,hospital.Smoker, ...
    'VariableNames',{'Sex','BloodPressure','Age','Smoker'});


Represent the linear model 'BloodPressure ~ 1 + Sex + Age + Smoker' using a terms matrix. The response variable is in the second column of the table, so the second column of the terms matrix must be a column of 0s for the response variable.

T = [0 0 0 0;1 0 0 0;0 0 1 0;0 0 0 1]

T = 4×4

     0     0     0     0
     1     0     0     0
     0     0     1     0
     0     0     0     1

Fit a linear model.

mdl1 = fitlm(t,T)

mdl1 = 
Linear regression model:
    BloodPressure ~ 1 + Sex + Age + Smoker

Estimated Coefficients:
                   Estimate       SE         tStat       pValue  
                   ________    ________    ________    __________
    (Intercept)      116.14      2.6107      44.485    7.1287e-66
    Sex_Male       0.050106     0.98364    0.050939       0.95948
    Age            0.085276    0.066945      1.2738        0.2058
    Smoker_1           9.87      1.0346      9.5395    1.4516e-15

Number of observations: 100, Error degrees of freedom: 96
Root Mean Squared Error: 4.78
R-squared: 0.507,  Adjusted R-Squared: 0.492
F-statistic vs. constant model: 33, p-value = 9.91e-15

Terms Matrix for Matrix Input

If the predictor and response variables are in a matrix and column vector, then you must include 0 for the response variable at the end of each row in a terms matrix. Load the carsmall data set and define the matrix of predictors.

load carsmall
X = [Acceleration,Weight];

Specify the model 'MPG ~ Acceleration + Weight + Acceleration:Weight + Weight^2' using a terms matrix. This model includes the main effect and two-way interaction terms for the variables Acceleration and Weight, and a second-order term for the variable Weight.

T = [0 0 0;1 0 0;0 1 0;1 1 0;0 2 0]

T = 5×3

     0     0     0
     1     0     0
     0     1     0
     1     1     0
     0     2     0

Fit a linear model.

mdl2 = fitlm(X,MPG,T)

mdl2 = 
Linear regression model:
    y ~ 1 + x1*x2 + x2^2

Estimated Coefficients:
                    Estimate          SE         tStat       pValue  
                   ___________    __________    _______    __________
    (Intercept)         48.906        12.589     3.8847    0.00019665
    x1                 0.54418       0.57125    0.95261       0.34337
    x2               -0.012781     0.0060312    -2.1192      0.036857
    x1:x2          -0.00010892    0.00017925    -0.6076         0.545
    x2^2            9.7518e-07    7.5389e-07     1.2935       0.19917

Number of observations: 94, Error degrees of freedom: 89
Root Mean Squared Error: 4.1
R-squared: 0.751,  Adjusted R-Squared: 0.739
F-statistic vs. constant model: 67, p-value = 4.99e-26

Only the intercept and x2 term, which corresponds to the Weight variable, are significant at the 5% significance level.
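As a cross-check, the x2fx function expands predictors with the same terms-matrix convention, here without the trailing response column. A sketch:

```matlab
% Sketch: x2fx builds the design matrix that the terms-matrix rows describe.
load carsmall
X = [Acceleration,Weight];
D = x2fx(X,[0 0;1 0;0 1;1 1;0 2]);
% Columns of D: intercept, x1, x2, x1.*x2, x2.^2
```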

Linear Regression with Categorical Predictor

Fit a linear regression model that contains a categorical predictor. Reorder the categories of the categorical predictor to control the reference level in the model. Then, use anova to test the significance of the categorical variable.

Model with Categorical Predictor

Load the carsmall data set and create a linear regression model of MPG as a function of Model_Year. To treat the numeric vector Model_Year as a categorical variable, identify the predictor using the 'CategoricalVars' name-value pair argument.

load carsmall
mdl = fitlm(Model_Year,MPG,'CategoricalVars',1,'VarNames',{'Model_Year','MPG'})

mdl = 
Linear regression model:
    MPG ~ 1 + Model_Year

Estimated Coefficients:
                     Estimate      SE       tStat       pValue  
                     ________    ______    ______    __________
    (Intercept)         17.69    1.0328    17.127    3.2371e-30
    Model_Year_76      3.8839    1.4059    2.7625     0.0069402
    Model_Year_82       14.02    1.4369    9.7571    8.2164e-16

Number of observations: 94, Error degrees of freedom: 91
Root Mean Squared Error: 5.56
R-squared: 0.531,  Adjusted R-Squared: 0.521
F-statistic vs. constant model: 51.6, p-value = 1.07e-15

The model formula in the display, MPG ~ 1 + Model_Year, corresponds to

MPG = β0 + β1·Ι[Year=76] + β2·Ι[Year=82] + ϵ,

where Ι[Year=76] and Ι[Year=82] are indicator variables whose value is one if the value of Model_Year is 76 and 82, respectively. The Model_Year variable includes three distinct values, which you can check by using the unique function.

unique(Model_Year)

ans = 3×1

    70
    76
    82

fitlm chooses the smallest value in Model_Year as a reference level ('70') and creates two indicator variables Ι[Year=76] and Ι[Year=82]. The model includes only two indicator variables because the design matrix becomes rank deficient if the model includes three indicator variables (one for each level) and an intercept term.

Model with Full Indicator Variables

You can interpret the model formula of mdl as a model that has three indicator variables without an intercept term:

y = β0·Ι[Year=70] + (β0 + β1)·Ι[Year=76] + (β0 + β2)·Ι[Year=82] + ϵ.

Alternatively, you can create a model that has three indicator variables without an intercept term by manually creating indicator variables and specifying the model formula.

temp_Year = dummyvar(categorical(Model_Year));
Model_Year_70 = temp_Year(:,1);
Model_Year_76 = temp_Year(:,2);
Model_Year_82 = temp_Year(:,3);
tbl = table(Model_Year_70,Model_Year_76,Model_Year_82,MPG);
mdl = fitlm(tbl,'MPG ~ Model_Year_70 + Model_Year_76 + Model_Year_82 - 1')

mdl = 
Linear regression model:
    MPG ~ Model_Year_70 + Model_Year_76 + Model_Year_82

Estimated Coefficients:
                     Estimate      SE        tStat       pValue  
                     ________    _______    ______    __________
    Model_Year_70       17.69     1.0328    17.127    3.2371e-30
    Model_Year_76      21.574    0.95387    22.617    4.0156e-39
    Model_Year_82       31.71    0.99896    31.743    5.2234e-51

Number of observations: 94, Error degrees of freedom: 91
Root Mean Squared Error: 5.56
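Because the three indicator variables are disjoint and the model has no intercept, each coefficient is simply the mean MPG within its model year. A sketch of that check, assuming carsmall is loaded:

```matlab
% Sketch: per-year mean MPG should reproduce the three coefficients above
% (rows with missing MPG are ignored, matching the fitted observations).
load carsmall
[g,years] = findgroups(Model_Year);
groupMeans = splitapply(@(m)mean(m,'omitnan'),MPG,g)
% Expected to match 17.69, 21.574, and 31.71 for years 70, 76, and 82.
```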

Choose Reference Level in Model

You can choose a reference level by modifying the order of categories in a categorical variable. First, create a categorical variable Year.

Year = categorical(Model_Year);

Check the order of categories by using the categories function.

categories(Year)

ans = 3x1 cell
    {'70'}
    {'76'}
    {'82'}

If you use Year as a predictor variable, then fitlm chooses the first category '70' as a reference level. Reorder Year by using the reordercats function.

Year_reordered = reordercats(Year,{'76','70','82'});
categories(Year_reordered)

ans = 3x1 cell
    {'76'}
    {'70'}
    {'82'}

The first category of Year_reordered is '76'. Create a linear regression model of MPG as a function of Year_reordered.

mdl2 = fitlm(Year_reordered,MPG,'VarNames',{'Model_Year','MPG'})

mdl2 = 
Linear regression model:
    MPG ~ 1 + Model_Year

Estimated Coefficients:
                     Estimate      SE        tStat       pValue  
                     ________    _______    _______    __________
    (Intercept)        21.574    0.95387     22.617    4.0156e-39
    Model_Year_70     -3.8839     1.4059    -2.7625     0.0069402
    Model_Year_82      10.136     1.3812     7.3385    8.7634e-11

Number of observations: 94, Error degrees of freedom: 91
Root Mean Squared Error: 5.56
R-squared: 0.531,  Adjusted R-Squared: 0.521
F-statistic vs. constant model: 51.6, p-value = 1.07e-15

mdl2 uses '76' as a reference level and includes two indicator variables Ι[Year=70] and Ι[Year=82].


Evaluate Categorical Predictor

The model display of mdl2 includes a p-value of each term to test whether or not the corresponding coefficient is equal to zero. Each p-value examines each indicator variable. To examine the categorical variable Model_Year as a group of indicator variables, use anova. Use the 'components' (default) option to return a component ANOVA table that includes ANOVA statistics for each variable in the model except the constant term.

anova(mdl2,'components')

ans=2×5 table
                  SumSq     DF    MeanSq      F        pValue  
                  ______    __    ______    _____    __________
    Model_Year    3190.1     2    1595.1    51.56    1.0694e-15
    Error         2815.2    91    30.936                       

The component ANOVA table includes the p-value of the Model_Year variable, which is smaller than the p-values of the indicator variables.

Specify Response and Predictor Variables for Linear Model

Fit a linear regression model to sample data. Specify the response and predictor variables, and include only pairwise interaction terms in the model. Load sample data.

load hospital

Fit a linear model with interaction terms to the data. Specify weight as the response variable, and sex, age, and smoking status as the predictor variables. Also, specify that sex and smoking status are categorical variables.

mdl = fitlm(hospital,'interactions','ResponseVar','Weight',...
    'PredictorVars',{'Sex','Age','Smoker'},...
    'CategoricalVars',{'Sex','Smoker'})

mdl = 
Linear regression model:
    Weight ~ 1 + Sex*Age + Sex*Smoker + Age*Smoker

Estimated Coefficients:
                         Estimate       SE        tStat       pValue  
                         ________    _______    ________    __________
    (Intercept)             118.7     7.0718      16.785     6.821e-30
    Sex_Male               68.336     9.7153      7.0339    3.3386e-10
    Age                   0.31068    0.18531      1.6765      0.096991
    Smoker_1               3.0425     10.446     0.29127       0.77149
    Sex_Male:Age         -0.49094    0.24764     -1.9825      0.050377
    Sex_Male:Smoker_1      0.9509     3.8031     0.25003       0.80312
    Age:Smoker_1         -0.07288    0.26275    -0.27737       0.78211

Number of observations: 100, Error degrees of freedom: 93


Root Mean Squared Error: 8.75 R-squared: 0.898, Adjusted R-Squared: 0.892 F-statistic vs. constant model: 137, p-value = 6.91e-44

The weight of the patients does not seem to differ significantly according to age, smoking status, or the interaction of these factors with patient sex at the 5% significance level.

Fit Robust Linear Regression Model

Load the hald data set, which measures the effect of cement composition on its hardening heat.

load hald

This data set includes the variables ingredients and heat. The matrix ingredients contains the percent composition of four chemicals present in the cement. The vector heat contains the values for the heat hardening after 180 days for each cement sample. Fit a robust linear regression model to the data.

mdl = fitlm(ingredients,heat,'RobustOpts','on')

mdl = 
Linear regression model (robust fit):
    y ~ 1 + x1 + x2 + x3 + x4

Estimated Coefficients:
                   Estimate       SE        tStat       pValue 
                   ________    _______    ________    ________
    (Intercept)       60.09     75.818     0.79256      0.4509
    x1               1.5753    0.80585      1.9548    0.086346
    x2               0.5322    0.78315     0.67957     0.51596
    x3              0.13346     0.8166     0.16343     0.87424
    x4             -0.12052     0.7672    -0.15709     0.87906

Number of observations: 13, Error degrees of freedom: 8
Root Mean Squared Error: 2.65
R-squared: 0.979,  Adjusted R-Squared: 0.969
F-statistic vs. constant model: 94.6, p-value = 9.03e-07

For more details, see the topic “Robust Regression — Reduce Outlier Effects” on page 11-103, which compares the results of a robust fit to a standard least-squares fit.

Input Arguments

tbl — Input data
table | dataset array

Input data including predictor and response variables, specified as a table or dataset array. The predictor variables can be numeric, logical, categorical, character, or string. The response variable must be numeric or logical.


• By default, fitlm takes the last variable as the response variable and the others as the predictor variables.
• To set a different column as the response variable, use the ResponseVar name-value pair argument.
• To use a subset of the columns as predictors, use the PredictorVars name-value pair argument.
• To define a model specification, set the modelspec argument using a formula or terms matrix. The formula or terms matrix specifies which columns to use as the predictor or response variables.

The variable names in a table do not have to be valid MATLAB identifiers. However, if the names are not valid, you cannot use a formula when you fit or adjust a model; for example:

• You cannot specify modelspec using a formula.
• You cannot use a formula to specify the terms to add or remove when you use the addTerms function or the removeTerms function, respectively.
• You cannot use a formula to specify the lower and upper bounds of the model when you use the step or stepwiselm function with the name-value pair arguments 'Lower' and 'Upper', respectively.

You can verify the variable names in tbl by using the isvarname function. The following code returns logical 1 (true) for each variable that has a valid variable name.

cellfun(@isvarname,tbl.Properties.VariableNames)

If the variable names in tbl are not valid, then convert them by using the matlab.lang.makeValidName function. tbl.Properties.VariableNames = matlab.lang.makeValidName(tbl.Properties.VariableNames);

X — Predictor variables
matrix

Predictor variables, specified as an n-by-p matrix, where n is the number of observations and p is the number of predictor variables. Each column of X represents one variable, and each row represents one observation. By default, there is a constant term in the model, unless you explicitly remove it, so do not include a column of 1s in X.

Data Types: single | double

y — Response variable
vector

Response variable, specified as an n-by-1 vector, where n is the number of observations. Each entry in y is the response for the corresponding row of X.

Data Types: single | double | logical

modelspec — Model specification
'linear' (default) | character vector or string scalar naming the model | t-by-(p + 1) terms matrix | character vector or string scalar formula in the form 'Y ~ terms'

Model specification, specified as one of these values.


• A character vector or string scalar naming the model.

  Value              Model Type
  'constant'         Model contains only a constant (intercept) term.
  'linear'           Model contains an intercept and linear term for each predictor.
  'interactions'     Model contains an intercept, linear term for each predictor, and all products of pairs of distinct predictors (no squared terms).
  'purequadratic'    Model contains an intercept term and linear and squared terms for each predictor.
  'quadratic'        Model contains an intercept term, linear and squared terms for each predictor, and all products of pairs of distinct predictors.
  'polyijk'          Model is a polynomial with all terms up to degree i in the first predictor, degree j in the second predictor, and so on. Specify the maximum degree for each predictor by using numerals 0 through 9. The model contains interaction terms, but the degree of each interaction term does not exceed the maximum value of the specified degrees. For example, 'poly13' has an intercept and x1, x2, x2^2, x2^3, x1*x2, and x1*x2^2 terms, where x1 and x2 are the first and second predictors, respectively.

• A t-by-(p + 1) matrix, or a “Terms Matrix” on page 32-1856, specifying terms in the model, where t is the number of terms and p is the number of predictor variables, and +1 accounts for the response variable. A terms matrix is convenient when the number of predictors is large and you want to generate the terms programmatically.
• A character vector or string scalar representing a “Formula” on page 32-1857 in the form 'Y ~ terms', where the terms are in “Wilkinson Notation” on page 32-1857. The variable names in the formula must be valid MATLAB identifiers.

The software determines the order of terms in a fitted model by using the order of terms in tbl or X. Therefore, the order of terms in the model can be different from the order of terms in the specified formula.

Example: 'quadratic'
Example: 'y ~ X1 + X2^2 + X1:X2'
Data Types: single | double | char | string

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Intercept',false,'PredictorVars',[1,3],'ResponseVar',5,'RobustOpts','logistic' specifies a robust regression model with no constant term, where the algorithm uses the logistic weighting function with the default tuning


constant, the first and third variables are the predictor variables, and the fifth variable is the response variable.

CategoricalVars — Categorical variable list
string array | cell array of character vectors | logical or numeric index vector

Categorical variable list, specified as the comma-separated pair consisting of 'CategoricalVars' and either a string array or cell array of character vectors containing categorical variable names in the table or dataset array tbl, or a logical or numeric index vector indicating which columns are categorical.

• If data is in a table or dataset array tbl, then, by default, fitlm treats all categorical values, logical values, character arrays, string arrays, and cell arrays of character vectors as categorical variables.
• If data is in matrix X, then the default value of 'CategoricalVars' is an empty matrix []. That is, no variable is categorical unless you specify it as categorical.

For example, you can specify variables 2 and 3 out of 6 as categorical using either of the following examples.

Example: 'CategoricalVars',[2,3]
Example: 'CategoricalVars',logical([0 1 1 0 0 0])
Data Types: single | double | logical | string | cell

Exclude — Observations to exclude
logical or numeric index vector

Observations to exclude from the fit, specified as the comma-separated pair consisting of 'Exclude' and a logical or numeric index vector indicating which observations to exclude from the fit. For example, you can exclude observations 2 and 3 out of 6 using either of the following examples.

Example: 'Exclude',[2,3]
Example: 'Exclude',logical([0 1 1 0 0 0])
Data Types: single | double | logical

Intercept — Indicator for constant term
true (default) | false

Indicator for the constant term (intercept) in the fit, specified as the comma-separated pair consisting of 'Intercept' and either true to include or false to remove the constant term from the model.
Use 'Intercept' only when specifying the model using a character vector or string scalar, not a formula or matrix.

Example: 'Intercept',false

PredictorVars — Predictor variables
string array | cell array of character vectors | logical or numeric index vector

Predictor variables to use in the fit, specified as the comma-separated pair consisting of 'PredictorVars' and either a string array or cell array of character vectors of the variable names in the table or dataset array tbl, or a logical or numeric index vector indicating which columns are predictor variables.


The string values or character vectors should be among the names in tbl, or the names you specify using the 'VarNames' name-value pair argument. The default is all variables in X, or all variables in tbl except for ResponseVar. For example, you can specify the second and third variables as the predictor variables using either of the following examples.

Example: 'PredictorVars',[2,3]
Example: 'PredictorVars',logical([0 1 1 0 0 0])
Data Types: single | double | logical | string | cell

ResponseVar — Response variable
last column in tbl (default) | character vector or string scalar containing variable name | logical or numeric index vector

Response variable to use in the fit, specified as the comma-separated pair consisting of 'ResponseVar' and either a character vector or string scalar containing the variable name in the table or dataset array tbl, or a logical or numeric index vector indicating which column is the response variable. You typically need to use 'ResponseVar' when fitting a table or dataset array tbl. For example, you can specify the fourth variable, say yield, as the response out of six variables, in one of the following ways.

Example: 'ResponseVar','yield'
Example: 'ResponseVar',[4]
Example: 'ResponseVar',logical([0 0 0 1 0 0])
Data Types: single | double | logical | char | string

RobustOpts — Indicator of robust fitting type
'off' (default) | 'on' | character vector | string scalar | structure

Indicator of the robust fitting type to use, specified as the comma-separated pair consisting of 'RobustOpts' and one of these values.

• 'off' — No robust fitting. fitlm uses ordinary least squares.
• 'on' — Robust fitting using the 'bisquare' weight function with the default tuning constant.
• Character vector or string scalar — Name of a robust fitting weight function from the following table. fitlm uses the corresponding default tuning constant specified in the table.
• Structure with the two fields RobustWgtFun and Tune.
• The RobustWgtFun field contains the name of a robust fitting weight function from the following table or a function handle of a custom weight function.
• The Tune field contains a tuning constant. If you do not set the Tune field, fitlm uses the corresponding default tuning constant.


Weight Function    Description                      Default Tuning Constant
'andrews'          w = (abs(r)<1) .* sin(r) ./ r    1.339

βj∗ = βj − ut    if βj > ut,
βj∗ = 0          if |βj| ≤ ut,
βj∗ = βj + ut    if βj < −ut.

• For SGD, βj is the estimate of coefficient j after processing k mini-batches. ut = kγtλ. γt is the learning rate at iteration t. λ is the value of Lambda.
• For ASGD, βj is the averaged estimate of coefficient j after processing k mini-batches, ut = kλ.
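The truncation step above amounts to elementwise soft-thresholding; a sketch with a hypothetical coefficient vector b and threshold u:

```matlab
% Sketch of the soft-thresholding applied every TruncationPeriod mini-batches.
softThreshold = @(b,u) sign(b).*max(abs(b) - u, 0);
softThreshold([0.8; -0.3; 0.05], 0.1)   % returns [0.7; -0.2; 0]
```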


If Regularization is 'ridge', then the software ignores TruncationPeriod.

Example: 'TruncationPeriod',100
Data Types: single | double

Other Regression Options

Weights — Observation weights
ones(n,1)/n (default) | numeric vector of positive values

Observation weights, specified as the comma-separated pair consisting of 'Weights' and a numeric vector of positive values. fitrlinear weighs the observations in X with the corresponding value in Weights. The size of Weights must equal n, the number of observations in X. fitrlinear normalizes Weights to sum to 1.

Data Types: double | single

ResponseName — Response variable name
'Y' (default) | character vector | string scalar

Response variable name, specified as the comma-separated pair consisting of 'ResponseName' and a character vector or string scalar.

• If you supply Y, then you can use 'ResponseName' to specify a name for the response variable.
• If you supply ResponseVarName or formula, then you cannot use 'ResponseName'.

Example: 'ResponseName','response'
Data Types: char | string

ResponseTransform — Response transformation
'none' (default) | function handle

Response transformation, specified as the comma-separated pair consisting of 'ResponseTransform' and either 'none' or a function handle. The default is 'none', which means @(y)y, or no transformation. For a MATLAB function or a function you define, use its function handle. The function handle must accept a vector (the original response values) and return a vector of the same size (the transformed response values).

Example: Suppose you create a function handle that applies an exponential transformation to an input vector by using myfunction = @(y)exp(y). Then, you can specify the response transformation as 'ResponseTransform',myfunction.

Data Types: char | string | function_handle

Cross-Validation Options

CrossVal — Cross-validation flag
'off' (default) | 'on'

Cross-validation flag, specified as the comma-separated pair consisting of 'Crossval' and 'on' or 'off'. If you specify 'on', then the software implements 10-fold cross-validation.


To override this cross-validation setting, use one of these name-value pair arguments: CVPartition, Holdout, or KFold. To create a cross-validated model, you can use only one cross-validation name-value pair argument at a time.

Example: 'Crossval','on'

CVPartition — Cross-validation partition
[] (default) | cvpartition partition object

Cross-validation partition, specified as the comma-separated pair consisting of 'CVPartition' and a cvpartition partition object as created by cvpartition. The partition object specifies the type of cross-validation, and also the indexing for training and validation sets. To create a cross-validated model, you can use only one of these four options: 'CrossVal', 'CVPartition', 'Holdout', or 'KFold'.

Holdout — Fraction of data for holdout validation
scalar value in the range (0,1)

Fraction of data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range (0,1). If you specify 'Holdout',p, then the software:

1 Randomly reserves p*100% of the data as validation data, and trains the model using the rest of the data
2 Stores the compact, trained model in the Trained property of the cross-validated model.

To create a cross-validated model, you can use only one of these four options: 'CrossVal', 'CVPartition', 'Holdout', or 'KFold'.

Example: 'Holdout',0.1
Data Types: double | single

KFold — Number of folds
10 (default) | positive integer value greater than 1

Number of folds to use in a cross-validated model, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1. If you specify, for example, 'KFold',k, then the software:

1 Randomly partitions the data into k sets
2 For each set, reserves the set as validation data, and trains the model using the other k – 1 sets
3 Stores the k compact, trained models in the cells of a k-by-1 cell vector in the Trained property of the cross-validated model.
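The k-fold workflow above can be sketched on synthetic data (the data here is hypothetical; kfoldLoss averages the held-out mean squared error):

```matlab
% Sketch: 8-fold cross-validated linear regression on synthetic data.
rng default
X = randn(1000,10);
y = X*ones(10,1) + 0.3*randn(1000,1);
cvMdl = fitrlinear(X,y,'KFold',8);   % cvMdl.Trained is an 8-by-1 cell array
L = kfoldLoss(cvMdl)                 % average MSE over the eight folds
```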

To create a cross-validated model, you can use only one of these four options: 'CrossVal', 'CVPartition', 'Holdout', or 'KFold'.

Example: 'KFold',8
Data Types: single | double

SGD and ASGD Convergence Controls

BatchLimit — Maximal number of batches
positive integer


Maximal number of batches to process, specified as the comma-separated pair consisting of 'BatchLimit' and a positive integer. When the software processes BatchLimit batches, it terminates optimization.

• By default:
  • The software passes through the data PassLimit times.
  • If you specify multiple solvers, and use (A)SGD to get an initial approximation for the next solver, then the default value is ceil(1e6/BatchSize). BatchSize is the value of the 'BatchSize' name-value pair argument.
• If you specify 'BatchLimit' and 'PassLimit', then the software chooses the argument that results in processing the fewest observations.
• If you specify 'BatchLimit' but not 'PassLimit', then the software processes enough batches to complete up to one entire pass through the data.

Example: 'BatchLimit',100
Data Types: single | double

BetaTolerance — Relative tolerance on linear coefficients and bias term
1e-4 (default) | nonnegative scalar

Relative tolerance on the linear coefficients and the bias term (intercept), specified as the comma-separated pair consisting of 'BetaTolerance' and a nonnegative scalar.

Let Bt = [βt′ bt], that is, the vector of the coefficients and the bias term at optimization iteration t. If ‖(Bt − Bt−1)/Bt‖2 < BetaTolerance, then optimization terminates.

If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver.

Example: 'BetaTolerance',1e-6
Data Types: single | double

NumCheckConvergence — Number of batches to process before next convergence check
positive integer

Number of batches to process before next convergence check, specified as the comma-separated pair consisting of 'NumCheckConvergence' and a positive integer. To specify the batch size, see BatchSize. The software checks for convergence about 10 times per pass through the entire data set by default.
Example: 'NumCheckConvergence',100
Data Types: single | double

PassLimit — Maximal number of passes
1 (default) | positive integer

Maximal number of passes through the data, specified as the comma-separated pair consisting of 'PassLimit' and a positive integer.


fitrlinear processes all observations when it completes one pass through the data. When fitrlinear passes through the data PassLimit times, it terminates optimization. If you specify 'BatchLimit' and PassLimit, then fitrlinear chooses the argument that results in processing the fewest observations. For more details, see “Algorithms” on page 32-1976.

Example: 'PassLimit',5
Data Types: single | double

ValidationData — Validation data for optimization convergence detection
cell array

Data for optimization convergence detection, specified as the comma-separated pair consisting of 'ValidationData' and a cell array. During optimization, the software periodically estimates the loss of ValidationData. If the validation-data loss increases, then the software terminates optimization. For more details, see “Algorithms” on page 32-1976. To optimize hyperparameters using cross-validation, see cross-validation options such as CrossVal.

• ValidationData{1} must contain an m-by-p or p-by-m full or sparse matrix of predictor data that has the same orientation as X. The predictor variables in the training data X and ValidationData{1} must correspond. The number of observations in both sets can vary.
• ValidationData{2} must contain an array of m responses with length corresponding to the number of observations in ValidationData{1}.
• Optionally, ValidationData{3} can contain an m-dimensional numeric vector of observation weights. The software normalizes the weights with the validation data so that they sum to 1.

If you specify ValidationData, then, to display validation loss at the command line, specify a value larger than 0 for Verbose. If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver. By default, the software does not detect convergence by monitoring validation-data loss.
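A sketch of supplying held-out rows for convergence detection (synthetic data; the 10% split is arbitrary):

```matlab
% Sketch: reserve roughly 10% of the rows as ValidationData for early stopping.
rng default
X = randn(1000,20);
y = X*ones(20,1) + 0.1*randn(1000,1);
idx = rand(1000,1) < 0.1;            % validation rows
mdl = fitrlinear(X(~idx,:),y(~idx), ...
    'ValidationData',{X(idx,:),y(idx)},'Verbose',1);
```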
GradientTolerance — Absolute gradient tolerance
1e-6 (default) | nonnegative scalar

Absolute gradient tolerance, specified as the comma-separated pair consisting of 'GradientTolerance' and a nonnegative scalar. GradientTolerance applies to these values of Solver: 'bfgs', 'lbfgs', and 'sparsa'.

Let ∇ℒt be the gradient vector of the objective function with respect to the coefficients and bias term at optimization iteration t. If ‖∇ℒt‖∞ = maxj |[∇ℒt]j| < GradientTolerance, then optimization terminates.

If you also specify BetaTolerance, then optimization terminates when fitrlinear satisfies either stopping criterion. If fitrlinear converges for the last solver specified in Solver, then optimization terminates. Otherwise, fitrlinear uses the next solver specified in Solver.

Example: 'GradientTolerance',eps


Data Types: single | double

IterationLimit — Maximal number of optimization iterations
1000 (default) | positive integer

Maximal number of optimization iterations, specified as the comma-separated pair consisting of 'IterationLimit' and a positive integer. IterationLimit applies to these values of Solver: 'bfgs', 'lbfgs', and 'sparsa'.

Example: 'IterationLimit',1e7
Data Types: single | double

Dual SGD Optimization Convergence Controls

BetaTolerance — Relative tolerance on linear coefficients and bias term
1e-4 (default) | nonnegative scalar
Relative tolerance on the linear coefficients and the bias term (intercept), specified as the comma-separated pair consisting of 'BetaTolerance' and a nonnegative scalar.
Let Bt = [βt′ bt], that is, the vector of the coefficients and the bias term at optimization iteration t. If ‖(Bt − Bt−1)/Bt‖2 < BetaTolerance, then optimization terminates.
If you also specify DeltaGradientTolerance, then optimization terminates when the software satisfies either stopping criterion.
If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver.
Example: 'BetaTolerance',1e-6
Data Types: single | double

DeltaGradientTolerance — Gradient-difference tolerance
0.1 (default) | nonnegative scalar
Gradient-difference tolerance between upper and lower pool Karush-Kuhn-Tucker (KKT) complementarity conditions on page 32-1690 violators, specified as the comma-separated pair consisting of 'DeltaGradientTolerance' and a nonnegative scalar. DeltaGradientTolerance applies to the 'dual' value of Solver only.
• If the magnitude of the KKT violators is less than DeltaGradientTolerance, then fitrlinear terminates optimization.
• If fitrlinear converges for the last solver specified in Solver, then optimization terminates. Otherwise, fitrlinear uses the next solver specified in Solver.
Example: 'DeltaGradientTolerance',1e-2
Data Types: double | single

NumCheckConvergence — Number of passes through entire data set to process before next convergence check
5 (default) | positive integer


Number of passes through the entire data set to process before the next convergence check, specified as the comma-separated pair consisting of 'NumCheckConvergence' and a positive integer.
Example: 'NumCheckConvergence',100
Data Types: single | double

PassLimit — Maximal number of passes
10 (default) | positive integer
Maximal number of passes through the data, specified as the comma-separated pair consisting of 'PassLimit' and a positive integer. When the software completes one pass through the data, it has processed all observations. When the software passes through the data PassLimit times, it terminates optimization.
Example: 'PassLimit',5
Data Types: single | double

ValidationData — Validation data for optimization convergence detection
cell array
Data for optimization convergence detection, specified as the comma-separated pair consisting of 'ValidationData' and a cell array. During optimization, the software periodically estimates the loss of ValidationData. If the validation-data loss increases, then the software terminates optimization. For more details, see “Algorithms” on page 32-1976. To optimize hyperparameters using cross-validation, see cross-validation options such as CrossVal.
• ValidationData{1} must contain an m-by-p or p-by-m full or sparse matrix of predictor data that has the same orientation as X. The predictor variables in the training data X and ValidationData{1} must correspond. The number of observations in the two sets can differ.
• ValidationData{2} must contain an array of m responses with length corresponding to the number of observations in ValidationData{1}.
• Optionally, ValidationData{3} can contain an m-dimensional numeric vector of observation weights. The software normalizes the weights with the validation data so that they sum to 1.
If you specify ValidationData, then, to display validation loss at the command line, specify a value larger than 0 for Verbose.
If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver. By default, the software does not detect convergence by monitoring validation-data loss.

BFGS, LBFGS, and SpaRSA Convergence Controls

BetaTolerance — Relative tolerance on linear coefficients and bias term
1e-4 (default) | nonnegative scalar
Relative tolerance on the linear coefficients and the bias term (intercept), specified as the comma-separated pair consisting of 'BetaTolerance' and a nonnegative scalar.


Let Bt = [βt′ bt], that is, the vector of the coefficients and the bias term at optimization iteration t. If ‖(Bt − Bt−1)/Bt‖2 < BetaTolerance, then optimization terminates.
If you also specify GradientTolerance, then optimization terminates when the software satisfies either stopping criterion.
If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver.
Example: 'BetaTolerance',1e-6
Data Types: single | double

GradientTolerance — Absolute gradient tolerance
1e-6 (default) | nonnegative scalar
Absolute gradient tolerance, specified as the comma-separated pair consisting of 'GradientTolerance' and a nonnegative scalar.
Let ∇ℒt be the gradient vector of the objective function with respect to the coefficients and bias term at optimization iteration t. If ‖∇ℒt‖∞ = max|∇ℒt| < GradientTolerance, then optimization terminates.
If you also specify BetaTolerance, then optimization terminates when the software satisfies either stopping criterion.
If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver.
Example: 'GradientTolerance',1e-5
Data Types: single | double

HessianHistorySize — Size of history buffer for Hessian approximation
15 (default) | positive integer
Size of history buffer for Hessian approximation, specified as the comma-separated pair consisting of 'HessianHistorySize' and a positive integer. That is, at each iteration, the software composes the Hessian using statistics from the latest HessianHistorySize iterations. The software does not support 'HessianHistorySize' for SpaRSA.
Example: 'HessianHistorySize',10
Data Types: single | double

IterationLimit — Maximal number of optimization iterations
1000 (default) | positive integer
Maximal number of optimization iterations, specified as the comma-separated pair consisting of 'IterationLimit' and a positive integer. IterationLimit applies to these values of Solver: 'bfgs', 'lbfgs', and 'sparsa'.
Example: 'IterationLimit',500
Data Types: single | double


ValidationData — Validation data for optimization convergence detection
cell array
Data for optimization convergence detection, specified as the comma-separated pair consisting of 'ValidationData' and a cell array. During optimization, the software periodically estimates the loss of ValidationData. If the validation-data loss increases, then the software terminates optimization. For more details, see “Algorithms” on page 32-1976. To optimize hyperparameters using cross-validation, see cross-validation options such as CrossVal.
• ValidationData{1} must contain an m-by-p or p-by-m full or sparse matrix of predictor data that has the same orientation as X. The predictor variables in the training data X and ValidationData{1} must correspond. The number of observations in the two sets can differ.
• ValidationData{2} must contain an array of m responses with length corresponding to the number of observations in ValidationData{1}.
• Optionally, ValidationData{3} can contain an m-dimensional numeric vector of observation weights. The software normalizes the weights with the validation data so that they sum to 1.
If you specify ValidationData, then, to display validation loss at the command line, specify a value larger than 0 for Verbose.
If the software converges for the last solver specified in Solver, then optimization terminates. Otherwise, the software uses the next solver specified in Solver. By default, the software does not detect convergence by monitoring validation-data loss.

Hyperparameter Optimization

OptimizeHyperparameters — Parameters to optimize
'none' (default) | 'auto' | 'all' | string array or cell array of eligible parameter names | vector of optimizableVariable objects
Parameters to optimize, specified as the comma-separated pair consisting of 'OptimizeHyperparameters' and one of the following:
• 'none' — Do not optimize.
• 'auto' — Use {'Lambda','Learner'}.
• 'all' — Optimize all eligible parameters.
• String array or cell array of eligible parameter names.
• Vector of optimizableVariable objects, typically the output of hyperparameters.
The optimization attempts to minimize the cross-validation loss (error) for fitrlinear by varying the parameters. To control the cross-validation type and other aspects of the optimization, use the HyperparameterOptimizationOptions name-value pair.
Note 'OptimizeHyperparameters' values override any values you set using other name-value pair arguments. For example, setting 'OptimizeHyperparameters' to 'auto' causes the 'auto' values to apply.
The eligible parameters for fitrlinear are:


• Lambda — fitrlinear searches among positive values, by default log-scaled in the range [1e-5/NumObservations,1e5/NumObservations].
• Learner — fitrlinear searches among 'svm' and 'leastsquares'.
• Regularization — fitrlinear searches among 'ridge' and 'lasso'.
Set nondefault parameters by passing a vector of optimizableVariable objects that have nondefault values. For example,

load carsmall
params = hyperparameters('fitrlinear',[Horsepower,Weight],MPG);
params(1).Range = [1e-3,2e4];

Pass params as the value of OptimizeHyperparameters.
By default, iterative display appears at the command line, and plots appear according to the number of hyperparameters in the optimization. For the optimization and plots, the objective function is log(1 + cross-validation loss) for regression and the misclassification rate for classification. To control the iterative display, set the Verbose field of the 'HyperparameterOptimizationOptions' name-value pair argument. To control the plots, set the ShowPlots field of the 'HyperparameterOptimizationOptions' name-value pair argument.
For an example, see “Optimize a Linear Regression” on page 32-1949.
Example: 'OptimizeHyperparameters','auto'

HyperparameterOptimizationOptions — Options for optimization
structure
Options for optimization, specified as the comma-separated pair consisting of 'HyperparameterOptimizationOptions' and a structure. This argument modifies the effect of the OptimizeHyperparameters name-value pair argument. All fields in the structure are optional.

Optimizer — Default: 'bayesopt'
• 'bayesopt' — Use Bayesian optimization. Internally, this setting calls bayesopt.
• 'gridsearch' — Use grid search with NumGridDivisions values per dimension.
• 'randomsearch' — Search at random among MaxObjectiveEvaluations points.
'gridsearch' searches in a random order, using uniform sampling without replacement from the grid. After optimization, you can get a table in grid order by using the command sortrows(Mdl.HyperparameterOptimizationResults).

AcquisitionFunctionName — Default: 'expected-improvement-per-second-plus'
• 'expected-improvement-per-second-plus'
• 'expected-improvement'
• 'expected-improvement-plus'
• 'expected-improvement-per-second'
• 'lower-confidence-bound'
• 'probability-of-improvement'
Acquisition functions whose names include per-second do not yield reproducible results because the optimization depends on the runtime of the objective function. Acquisition functions whose names include plus modify their behavior when they are overexploiting an area. For more details, see “Acquisition Function Types” on page 10-3.

MaxObjectiveEvaluations — Default: 30 for 'bayesopt' or 'randomsearch', and the entire grid for 'gridsearch'
Maximum number of objective function evaluations.

MaxTime — Default: Inf
Time limit, specified as a positive real. The time limit is in seconds, as measured by tic and toc. Run time can exceed MaxTime because MaxTime does not interrupt function evaluations.

NumGridDivisions — Default: 10
For 'gridsearch', the number of values in each dimension. The value can be a vector of positive integers giving the number of values for each dimension, or a scalar that applies to all dimensions. This field is ignored for categorical variables.

ShowPlots — Default: true
Logical value indicating whether to show plots. If true, this field plots the best objective function value against the iteration number. If there are one or two optimization parameters, and if Optimizer is 'bayesopt', then ShowPlots also plots a model of the objective function against the parameters.

SaveIntermediateResults — Default: false
Logical value indicating whether to save results when Optimizer is 'bayesopt'. If true, this field overwrites a workspace variable named 'BayesoptResults' at each iteration. The variable is a BayesianOptimization object.

Verbose — Default: 1
Display to the command line.
• 0 — No iterative display
• 1 — Iterative display
• 2 — Iterative display with extra information
For details, see the bayesopt Verbose name-value pair argument.

UseParallel — Default: false
Logical value indicating whether to run Bayesian optimization in parallel, which requires Parallel Computing Toolbox. Due to the nonreproducibility of parallel timing, parallel Bayesian optimization does not necessarily yield reproducible results. For details, see “Parallel Bayesian Optimization” on page 10-7.

Repartition — Default: false
Logical value indicating whether to repartition the cross-validation at every iteration. If false, the optimizer uses a single partition for the optimization. true usually gives the most robust results because this setting takes partitioning noise into account. However, for good results, true requires at least twice as many function evaluations.

Use no more than one of the following three field names. If you do not specify any cross-validation field, the default is 'Kfold',5.
• CVPartition — A cvpartition object, as created by cvpartition.
• Holdout — A scalar in the range (0,1) representing the holdout fraction.
• Kfold — An integer greater than 1.

Example: 'HyperparameterOptimizationOptions',struct('MaxObjectiveEvaluations',60)
Data Types: struct
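As a hedged sketch combining the options above with the earlier carsmall example (the rmmissing cleanup step is illustrative, added here to drop rows with missing values before fitting):

```matlab
% Optimize Lambda and Learner with random search instead of the default
% Bayesian optimization, suppressing plots and iterative display.
load carsmall
tbl = rmmissing(table(Horsepower,Weight,MPG));   % remove rows with NaNs
X = [tbl.Horsepower,tbl.Weight];
Mdl = fitrlinear(X,tbl.MPG,'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions', ...
    struct('Optimizer','randomsearch','MaxObjectiveEvaluations',20, ...
           'ShowPlots',false,'Verbose',0));
```

Because Optimizer is 'randomsearch', Mdl.HyperparameterOptimizationResults is returned as a table rather than a BayesianOptimization object.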

Output Arguments

Mdl — Trained linear regression model
RegressionLinear model object | RegressionPartitionedLinear cross-validated model object
Trained linear regression model, returned as a RegressionLinear model object or a RegressionPartitionedLinear cross-validated model object.
If you set any of the name-value pair arguments KFold, Holdout, CrossVal, or CVPartition, then Mdl is a RegressionPartitionedLinear cross-validated model object. Otherwise, Mdl is a RegressionLinear model object.


To reference properties of Mdl, use dot notation. For example, enter Mdl.Beta in the Command Window to display the vector or matrix of estimated coefficients.
Note Unlike other regression models, and for economical memory usage, RegressionLinear and RegressionPartitionedLinear model objects do not store the training data or optimization details (for example, convergence history).

FitInfo — Optimization details
structure array
Optimization details, returned as a structure array. Fields specify final values or name-value pair argument specifications, for example, Objective is the value of the objective function when optimization terminates. Rows of multidimensional fields correspond to values of Lambda and columns correspond to values of Solver.
This table describes some notable fields.
• TerminationStatus — Reason for optimization termination. Corresponds to a value in TerminationCode.
• FitTime — Elapsed, wall-clock time in seconds.
• History — A structure array of optimization information for each iteration. The field Solver stores solver types using integer coding:
  1 — SGD
  2 — ASGD
  3 — Dual SGD for SVM
  4 — LBFGS
  5 — BFGS
  6 — SpaRSA
To access fields, use dot notation. For example, to access the vector of objective function values for each iteration, enter FitInfo.History.Objective.
It is good practice to examine FitInfo to assess whether convergence is satisfactory.

HyperparameterOptimizationResults — Cross-validation optimization of hyperparameters
BayesianOptimization object | table of hyperparameters and associated values
Cross-validation optimization of hyperparameters, returned as a BayesianOptimization object or a table of hyperparameters and associated values. The output is nonempty when the value of 'OptimizeHyperparameters' is not 'none'. The output value depends on the Optimizer field value of the 'HyperparameterOptimizationOptions' name-value pair argument:
• 'bayesopt' (default) — Object of class BayesianOptimization
• 'gridsearch' or 'randomsearch' — Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst)

Note If Learner is 'leastsquares', then the loss term in the objective function is half of the MSE. loss returns the MSE by default. Therefore, if you use loss to check the resubstitution, or training, error, then there is a discrepancy between the MSE returned by loss and the optimization results in FitInfo, or returned to the command line by setting a positive verbosity level using Verbose.
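A short sketch of retrieving these outputs (synthetic data; the variable names are illustrative):

```matlab
% Fit a linear regression model and inspect the optimization details.
rng(0)
n = 500;  p = 20;
X = randn(n,p);
Y = X*linspace(1,2,p)' + 0.1*randn(n,1);

[Mdl,FitInfo] = fitrlinear(X,Y);
FitInfo.TerminationStatus   % reason optimization stopped
FitInfo.FitTime             % elapsed wall-clock fit time in seconds
numel(Mdl.Beta)             % one coefficient per predictor (20 here)
```

Because the model objects do not store convergence history beyond FitInfo, examine FitInfo immediately after fitting if you need to audit convergence.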

More About

Warm Start
A warm start is initial estimates of the beta coefficients and bias term supplied to an optimization routine for quicker convergence.

Alternatives for Lower-Dimensional Data
High-dimensional linear classification and regression models minimize objective functions relatively quickly, but at a cost: some accuracy, a restriction to numeric-only predictor variables, and the requirement that the model be linear in the parameters. If your predictor data set is low- through medium-dimensional, or contains heterogeneous variables, then you should use the appropriate classification or regression fitting function. To help you decide which fitting function is appropriate for your low-dimensional data set, use this table.

SVM
• Binary classification: fitcsvm
• Multiclass classification: fitcecoc
• Regression: fitrsvm
Notable algorithmic differences: Computes the Gram matrix of the predictor variables, which is convenient for nonlinear kernel transformations. Solves the dual problem using SMO, ISDA, or L1 minimization via quadratic programming using quadprog.

Linear regression
• Least-squares without regularization: fitlm
• Regularized least-squares using a lasso penalty: lasso
• Ridge regression: ridge or lasso
Notable algorithmic differences: lasso implements cyclic coordinate descent.

Logistic regression
• Logistic regression without regularization: fitglm
• Regularized logistic regression using a lasso penalty: lassoglm
Notable algorithmic differences: fitglm implements iteratively reweighted least squares. lassoglm implements cyclic coordinate descent.

Tips
• It is a best practice to orient your predictor matrix so that observations correspond to columns and to specify 'ObservationsIn','columns'. As a result, you can experience a significant reduction in optimization-execution time.
• For better optimization accuracy if X is high-dimensional and Regularization is 'ridge', set any of these combinations for Solver:
  • 'sgd'
  • 'asgd'
  • 'dual' if Learner is 'svm'
  • {'sgd','lbfgs'}
  • {'asgd','lbfgs'}
  • {'dual','lbfgs'} if Learner is 'svm'
  Other combinations can result in poor optimization accuracy.
• For better optimization accuracy if X is moderate- through low-dimensional and Regularization is 'ridge', set Solver to 'bfgs'.
• If Regularization is 'lasso', set any of these combinations for Solver:
  • 'sgd'
  • 'asgd'
  • 'sparsa'
  • {'sgd','sparsa'}
  • {'asgd','sparsa'}
• When choosing between SGD and ASGD, consider that:
  • SGD takes less time per iteration, but requires more iterations to converge.
  • ASGD requires fewer iterations to converge, but takes more time per iteration.
• If X has few observations, but many predictor variables, then:
  • Specify 'PostFitBias',true.
  • For SGD or ASGD solvers, set PassLimit to a positive integer that is greater than 1, for example, 5 or 10. This setting often results in better accuracy.
• For SGD and ASGD solvers, BatchSize affects the rate of convergence.
  • If BatchSize is too small, then fitrlinear achieves the minimum in many iterations, but computes the gradient per iteration quickly.


  • If BatchSize is too large, then fitrlinear achieves the minimum in fewer iterations, but computes the gradient per iteration slowly.
• Large learning rates (see LearnRate) speed up convergence to the minimum, but can lead to divergence (that is, over-stepping the minimum). Small learning rates ensure convergence to the minimum, but can lead to slow termination.
• When using lasso penalties, experiment with various values of TruncationPeriod. For example, set TruncationPeriod to 1, 10, and then 100.
• For efficiency, fitrlinear does not standardize predictor data. To standardize X, enter

X = bsxfun(@rdivide,bsxfun(@minus,X,mean(X,2)),std(X,0,2));

The code requires that you orient the predictors and observations as the rows and columns of X, respectively. Also, for memory-usage economy, the code replaces the original predictor data with the standardized data.
• After training a model, you can generate C/C++ code that predicts responses for new data. Generating C/C++ code requires MATLAB Coder. For details, see “Introduction to Code Generation” on page 31-2.
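On releases that support implicit expansion (R2016b and later), an equivalent one-liner avoids bsxfun; this is a sketch, not part of the original text:

```matlab
% Equivalent standardization using implicit expansion (R2016b+),
% again with predictors in rows and observations in columns:
X = (X - mean(X,2)) ./ std(X,0,2);
```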

Algorithms
• If you specify ValidationData, then, during objective-function optimization:
  • fitrlinear estimates the validation loss of ValidationData periodically using the current model, and tracks the minimal estimate.
  • When fitrlinear estimates a validation loss, it compares the estimate to the minimal estimate.
  • When subsequent validation loss estimates exceed the minimal estimate five times, fitrlinear terminates optimization.
• If you specify ValidationData and implement a cross-validation routine (CrossVal, CVPartition, Holdout, or KFold), then:
  1. fitrlinear randomly partitions X and Y according to the cross-validation routine that you choose.
  2. fitrlinear trains the model using the training-data partition. During objective-function optimization, fitrlinear uses ValidationData as another possible way to terminate optimization (for details, see the previous bullet).
  3. Once fitrlinear satisfies a stopping criterion, it constructs a trained model based on the optimized linear coefficients and intercept.
  4. a. If you implement k-fold cross-validation, and fitrlinear has not exhausted all training-set folds, then fitrlinear returns to Step 2 to train using the next training-set fold.
     b. Otherwise, fitrlinear terminates training, and then returns the cross-validated model.
You can determine the quality of the cross-validated model. For example:
• To determine the validation loss using the holdout or out-of-fold data from step 1, pass the cross-validated model to kfoldLoss.
• To predict observations on the holdout or out-of-fold data from step 1, pass the cross-validated model to kfoldPredict.
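The cross-validation workflow above can be sketched as follows (synthetic data; variable names are illustrative):

```matlab
% 5-fold cross-validated linear regression model and its out-of-fold quality.
rng(0)
X = randn(200,10);
Y = X*ones(10,1) + randn(200,1);

CVMdl = fitrlinear(X,Y,'KFold',5);   % returns RegressionPartitionedLinear
mseOutOfFold = kfoldLoss(CVMdl);     % mean squared error on held-out folds
YHat = kfoldPredict(CVMdl);          % out-of-fold predictions, one per row of X
```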


References
[1] Ho, C. H., and C. J. Lin. “Large-Scale Linear Support Vector Regression.” Journal of Machine Learning Research, Vol. 13, 2012, pp. 3323–3348.
[2] Hsieh, C. J., K. W. Chang, C. J. Lin, S. S. Keerthi, and S. Sundararajan. “A Dual Coordinate Descent Method for Large-Scale Linear SVM.” Proceedings of the 25th International Conference on Machine Learning, ICML ’08, 2008, pp. 408–415.
[3] Langford, J., L. Li, and T. Zhang. “Sparse Online Learning Via Truncated Gradient.” J. Mach. Learn. Res., Vol. 10, 2009, pp. 777–801.
[4] Nocedal, J., and S. J. Wright. Numerical Optimization, 2nd ed., New York: Springer, 2006.
[5] Shalev-Shwartz, S., Y. Singer, and N. Srebro. “Pegasos: Primal Estimated Sub-Gradient Solver for SVM.” Proceedings of the 24th International Conference on Machine Learning, ICML ’07, 2007, pp. 807–814.
[6] Wright, S. J., R. D. Nowak, and M. A. T. Figueiredo. “Sparse Reconstruction by Separable Approximation.” Trans. Sig. Proc., Vol. 57, No. 7, 2009, pp. 2479–2493.
[7] Xiao, Lin. “Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization.” J. Mach. Learn. Res., Vol. 11, 2010, pp. 2543–2596.
[8] Xu, Wei. “Towards Optimal One Pass Large Scale Learning with Averaged Stochastic Gradient Descent.” CoRR, abs/1107.2490, 2011.

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.
Usage notes and limitations:
• Some name-value pair arguments have different defaults and values compared to the in-memory fitrlinear function. Supported name-value pair arguments, and any differences, are:
  • 'Epsilon'
  • 'ObservationsIn' — Supports only 'rows'.
  • 'Lambda' — Can be 'auto' (default) or a scalar.
  • 'Learner'
  • 'Regularization' — Supports only 'ridge'.
  • 'Solver' — Supports only 'lbfgs'.
  • 'Verbose' — Default value is 1.
  • 'Beta'
  • 'Bias'
  • 'FitBias' — Supports only true.
  • 'Weights' — Value must be a tall array.
  • 'HessianHistorySize'


  • 'BetaTolerance' — Default value is relaxed to 1e-3.
  • 'GradientTolerance' — Default value is relaxed to 1e-3.
  • 'IterationLimit' — Default value is relaxed to 20.
  • 'OptimizeHyperparameters' — Value of 'Regularization' parameter must be 'ridge'.
  • 'HyperparameterOptimizationOptions' — For cross-validation, tall optimization supports only 'Holdout' validation. For example, you can specify fitrlinear(X,Y,'OptimizeHyperparameters','auto','HyperparameterOptimizationOptions',struct('Holdout',0.2)).
• For tall arrays, fitrlinear implements LBFGS by distributing the calculation of the loss and the gradient among different parts of the tall array at each iteration. Other solvers are not available for tall arrays. When initial values for Beta and Bias are not given, fitrlinear first refines the initial estimates of the parameters by fitting the model locally to parts of the data and combining the coefficients by averaging.
For more information, see “Tall Arrays” (MATLAB).

Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.
To run in parallel, set the 'UseParallel' option to true. To perform parallel hyperparameter optimization, use the 'HyperparameterOptimizationOptions',struct('UseParallel',true) name-value pair argument in the call to this function.
For more information on parallel hyperparameter optimization, see “Parallel Bayesian Optimization” on page 10-7. For more general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).

See Also
RegressionLinear | RegressionPartitionedLinear | fitclinear | fitlm | fitrsvm | kfoldLoss | kfoldPredict | lasso | predict | ridge

Introduced in R2016a


fitrm
Fit repeated measures model

Syntax
rm = fitrm(t,modelspec)
rm = fitrm(t,modelspec,Name,Value)

Description
rm = fitrm(t,modelspec) returns a repeated measures model, specified by modelspec, fitted to the variables in the table or dataset array t.
rm = fitrm(t,modelspec,Name,Value) returns a repeated measures model, with additional options specified by one or more Name,Value pair arguments. For example, you can specify the hypothesis for the within-subject factors.

Examples

Fit a Repeated Measures Model
Load the sample data.

load fisheriris

The column vector species consists of iris flowers of three different species: setosa, versicolor, and virginica. The double matrix meas consists of four types of measurements on the flowers: the length and width of sepals and petals in centimeters, respectively.
Store the data in a table array.

t = table(species,meas(:,1),meas(:,2),meas(:,3),meas(:,4),...
'VariableNames',{'species','meas1','meas2','meas3','meas4'});
Meas = table([1 2 3 4]','VariableNames',{'Measurements'});

Fit a repeated measures model, where the measurements are the responses and the species is the predictor variable.

rm = fitrm(t,'meas1-meas4~species','WithinDesign',Meas)

rm =
  RepeatedMeasuresModel with properties:

   Between Subjects:
        BetweenDesign: [150x5 table]
        ResponseNames: {'meas1'  'meas2'  'meas3'  'meas4'}
   BetweenFactorNames: {'species'}
         BetweenModel: '1 + species'

   Within Subjects:
         WithinDesign: [4x1 table]
    WithinFactorNames: {'Measurements'}
          WithinModel: 'separatemeans'

   Estimates:
         Coefficients: [3x4 table]
           Covariance: [4x4 table]

Display the coefficients.

rm.Coefficients

ans=3×4 table
                           meas1        meas2      meas3      meas4
                          ________    ________    ______    ________
    (Intercept)             5.8433      3.0573     3.758      1.1993
    species_setosa        -0.83733     0.37067    -2.296    -0.95333
    species_versicolor    0.092667    -0.28733     0.502     0.12667

fitrm uses the 'effects' contrasts, which means that the coefficients sum to 0. The rm.DesignMatrix has one column of 1s for the intercept, and two other columns, species_setosa and species_versicolor, which are as follows:

species_setosa = 1 if setosa, 0 if versicolor, −1 if virginica
species_versicolor = 0 if setosa, 1 if versicolor, −1 if virginica

Display the covariance matrix.

rm.Covariance

ans=4×4 table
              meas1        meas2       meas3       meas4
             ________    ________    ________    ________
    meas1     0.26501    0.092721     0.16751    0.038401
    meas2    0.092721     0.11539    0.055244     0.03271
    meas3     0.16751    0.055244     0.18519    0.042665
    meas4    0.038401     0.03271    0.042665    0.041882

Specify the Within-Subject Hypothesis
Load the sample data.

load('longitudinalData.mat');

The matrix Y contains response data for 16 individuals. The response is the blood level of a drug measured at five time points (time = 0, 2, 4, 6, and 8). Each row of Y corresponds to an individual, and each column corresponds to a time point. The first eight subjects are female, and the second eight subjects are male. This is simulated data.


Define a variable that stores gender information.

Gender = ['F' 'F' 'F' 'F' 'F' 'F' 'F' 'F' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M']';

Store the data in a proper table array format to conduct repeated measures analysis.

t = table(Gender,Y(:,1),Y(:,2),Y(:,3),Y(:,4),Y(:,5),...
'VariableNames',{'Gender','t0','t2','t4','t6','t8'});

Define the within-subjects variable.

Time = [0 2 4 6 8]';

Fit a repeated measures model, where blood levels are the responses and gender is the predictor variable. Also define the hypothesis for within-subject factors.

rm = fitrm(t,'t0-t8 ~ Gender','WithinDesign',Time,'WithinModel','orthogonalcontrasts')

rm =
  RepeatedMeasuresModel with properties:

   Between Subjects:
        BetweenDesign: [16x6 table]
        ResponseNames: {'t0'  't2'  't4'  't6'  't8'}
   BetweenFactorNames: {'Gender'}
         BetweenModel: '1 + Gender'

   Within Subjects:
         WithinDesign: [5x1 table]
    WithinFactorNames: {'Time'}
          WithinModel: 'orthogonalcontrasts'

   Estimates:
         Coefficients: [2x5 table]
           Covariance: [5x5 table]

Fit a Model with Covariates
Load the sample data.

load repeatedmeas

The table between includes the eight repeated measurements, y1 through y8, as responses and the between-subject factors Group, Gender, IQ, and Age. IQ and Age are continuous variables. The table within includes the within-subject factors w1 and w2.
Fit a repeated measures model, where age, IQ, group, and gender are the predictor variables, and the model includes the interaction effect of group and gender. Also define the within-subject factors.

rm = fitrm(between,'y1-y8 ~ Group*Gender+Age+IQ','WithinDesign',within)

rm =
  RepeatedMeasuresModel with properties:

   Between Subjects:
        BetweenDesign: [30x12 table]
        ResponseNames: {'y1'  'y2'  'y3'  'y4'  'y5'  'y6'  'y7'  'y8'}
   BetweenFactorNames: {'Age'  'IQ'  'Group'  'Gender'}
         BetweenModel: '1 + Age + IQ + Group*Gender'

   Within Subjects:
         WithinDesign: [8x2 table]
    WithinFactorNames: {'w1'  'w2'}
          WithinModel: 'separatemeans'

   Estimates:
         Coefficients: [8x8 table]
           Covariance: [8x8 table]

Display the coefficients.

rm.Coefficients

ans=8×8 table
                                y1          y2           y3          y4           y5           y6
                             ________    ________    ________    ________    _________    ________
    (Intercept)                141.38      195.25      9.8663     -49.154       157.77     0.23762
    Age                       0.32042     -4.7672     -1.2748      0.6216      -1.0621     0.89927
    IQ                        -1.2671     -1.1653     0.05862      0.4288      -1.4518    -0.25501
    Group_A                   -1.2195     -9.6186      22.532      15.303       12.602      12.886
    Group_B                    2.5186       1.417     -2.2501     0.50181       8.0907      3.1957
    Gender_Female              5.3957     -3.9719      8.5225      9.3403       6.0909       1.642
    Group_A:Gender_Female      4.1046      10.064     -7.3053     -3.3085       4.6751      2.4907
    Group_B:Gender_Female    -0.48486     -2.9202      1.1222     0.69715    -0.065945    0.079468

The display shows the coefficients for fitting the repeated measures as a function of the terms in the between-subjects model.

Input Arguments

t — Input data
table

Input data, which includes the values of the response variables and the between-subject factors to use as predictors in the repeated measures model, specified as a table.

The variable names in t must be valid MATLAB identifiers. You can verify the variable names by using the isvarname function. The following code returns logical 1 (true) for each variable that has a valid variable name.

cellfun(@isvarname,t.Properties.VariableNames)

If the variable names in t are not valid, then convert them by using the matlab.lang.makeValidName function.

t.Properties.VariableNames = matlab.lang.makeValidName(t.Properties.VariableNames);

Data Types: table
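As a quick illustration of the check-and-convert workflow above, consider names that are not valid identifiers (the names here are hypothetical):

```matlab
names = {'response 1','group-id'};   % hypothetical invalid names
cellfun(@isvarname,names)            % both return logical 0 (false)
matlab.lang.makeValidName(names)     % e.g. {'response1','group_id'}
```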

modelspec — Formula for model specification
character vector or string scalar of the form 'y1-yk ~ terms'

Formula for model specification, specified as a character vector or string scalar of the form 'y1-yk ~ terms'. The responses and terms are specified using Wilkinson notation on page 32-1984. fitrm treats the variables used in model terms as categorical if they are categorical (nominal or ordinal), logical, character arrays, string arrays, or cell arrays of character vectors.

For example, if you have four repeated measures as responses and the factors x1, x2, and x3 as the predictor variables, then you can define a repeated measures model as follows.

Example: 'y1-y4 ~ x1 + x2 * x3'

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'WithinDesign','W','WithinModel','w1+w2' specifies the matrix W as the design matrix for within-subject factors, and 'w1+w2' as the model for the within-subject factors w1 and w2.

WithinDesign — Design for within-subject factors
numeric vector of length r (default) | r-by-k numeric matrix | r-by-k table

Design for within-subject factors, specified as the comma-separated pair consisting of 'WithinDesign' and one of the following:

• Numeric vector of length r, where r is the number of repeated measures. In this case, fitrm treats the values in the vector as continuous, and these are typically time values.
• r-by-k numeric matrix of the values of the k within-subject factors, w1, w2, ..., wk. In this case, fitrm treats all k variables as continuous.
• r-by-k table that contains the values of the k within-subject factors. In this case, fitrm treats all numeric variables as continuous, and all categorical variables as categorical.

For example, if the table weeks contains the values of the within-subject factors, then you can define the design table as follows.

Example: 'WithinDesign',weeks

Data Types: single | double | table

WithinModel — Model specifying within-subject hypothesis test
'separatemeans' (default) | 'orthogonalcontrasts' | character vector or string scalar that defines a model

Model specifying the within-subject hypothesis test, specified as the comma-separated pair consisting of 'WithinModel' and one of the following:

• 'separatemeans' — Compute a separate mean for each group.
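For instance, a table-based design for eight repeated measures that crosses a two-level categorical factor with a numeric factor might look like the sketch below (the factor values, and the table between, are hypothetical):

```matlab
% w1: categorical factor with two levels; w2: numeric time points
w1 = categorical(repmat({'A';'B'},4,1));
w2 = kron((1:4)',[1;1]);
within = table(w1,w2);
% rm = fitrm(between,'y1-y8 ~ Group','WithinDesign',within);
```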


• 'orthogonalcontrasts' — This option is valid only when the within-subject model has a single numeric factor T. Responses are the average, the slope of centered T, and, in general, all orthogonal contrasts for a polynomial up to T^(p – 1), where p is the number of rows in the within-subject model.
• A character vector or string scalar that defines a model specification in the within-subject factors. You can define the model based on the rules for the terms in modelspec. For example, if there are three within-subject factors w1, w2, and w3, then you can specify a model for the within-subject factors as follows.

Example: 'WithinModel','w1+w2+w2*w3'

Data Types: char | string

Output Arguments

rm — Repeated measures model
RepeatedMeasuresModel object

Repeated measures model, returned as a RepeatedMeasuresModel object. For properties and methods of this object, see RepeatedMeasuresModel.

More About

Model Specification Using Wilkinson Notation

Wilkinson notation describes the factors present in models. It does not describe the multipliers (coefficients) of those factors.

The following rules specify the responses in modelspec.

    Wilkinson Notation    Meaning
    Y1,Y2,Y3              Specific list of variables
    Y1-Y5                 All table variables from Y1 through Y5

The following rules specify the terms in modelspec.

    Wilkinson Notation                    Factors in Standard Notation
    1                                     Constant (intercept) term
    X^k, where k is a positive integer    X, X^2, ..., X^k
    X1 + X2                               X1, X2
    X1*X2                                 X1, X2, X1*X2
    X1:X2                                 X1*X2 only
    -X2                                   Do not include X2
    X1*X2 + X3                            X1, X2, X3, X1*X2
    X1 + X2 + X3 + X1:X2                  X1, X2, X3, X1*X2
    X1*X2*X3 - X1:X2:X3                   X1, X2, X3, X1*X2, X1*X3, X2*X3
    X1*(X2 + X3)                          X1, X2, X3, X1*X2, X1*X3

Statistics and Machine Learning Toolbox notation always includes a constant term unless you explicitly remove the term using -1.
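For example, to drop the constant term from the between-subjects model (a minimal sketch; the table t and the variable names are placeholders):

```matlab
% '- 1' removes the intercept that Wilkinson notation includes by default
rm = fitrm(t,'y1-y4 ~ x1 - 1','WithinDesign',(1:4)');
```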

See Also
RepeatedMeasuresModel

Introduced in R2014a

fitdist
Fit probability distribution object to data

Syntax

pd = fitdist(x,distname)
pd = fitdist(x,distname,Name,Value)
[pdca,gn,gl] = fitdist(x,distname,'By',groupvar)
[pdca,gn,gl] = fitdist(x,distname,'By',groupvar,Name,Value)

Description

pd = fitdist(x,distname) creates a probability distribution object by fitting the distribution specified by distname to the data in column vector x.

pd = fitdist(x,distname,Name,Value) creates the probability distribution object with additional options specified by one or more name-value pair arguments. For example, you can indicate censored data or specify control parameters for the iterative fitting algorithm.

[pdca,gn,gl] = fitdist(x,distname,'By',groupvar) creates probability distribution objects by fitting the distribution specified by distname to the data in x based on the grouping variable groupvar. It returns a cell array of fitted probability distribution objects, pdca, a cell array of group labels, gn, and a cell array of grouping variable levels, gl.

[pdca,gn,gl] = fitdist(x,distname,'By',groupvar,Name,Value) returns the above output arguments using additional options specified by one or more name-value pair arguments. For example, you can indicate censored data or specify control parameters for the iterative fitting algorithm.

Examples

Fit a Normal Distribution to Data

Load the sample data. Create a vector containing the patients' weight data.

load hospital
x = hospital.Weight;

Create a normal distribution object by fitting it to the data.

pd = fitdist(x,'Normal')

pd = 
  NormalDistribution

  Normal distribution
       mu =      154   [148.728, 159.272]
    sigma =  26.5714   [23.3299, 30.8674]


The intervals next to the parameter estimates are the 95% confidence intervals for the distribution parameters.

Plot the pdf of the distribution.

x_values = 50:1:250;
y = pdf(pd,x_values);
plot(x_values,y,'LineWidth',2)
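Those intervals can also be retrieved programmatically with paramci. A minimal sketch; for a NormalDistribution the result is a 2-by-2 matrix whose columns correspond to mu and sigma:

```matlab
ci95 = paramci(pd);             % default 95% confidence intervals
ci90 = paramci(pd,'Alpha',0.1); % 90% intervals
```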

Fit a Kernel Distribution to Data

Load the sample data. Create a vector containing the patients' weight data.

load hospital
x = hospital.Weight;

Create a kernel distribution object by fitting it to the data. Use the Epanechnikov kernel function.

pd = fitdist(x,'Kernel','Kernel','epanechnikov')

pd = 
  KernelDistribution

    Kernel = epanechnikov
    Bandwidth = 14.3792
    Support = unbounded

Plot the pdf of the distribution.

x_values = 50:1:250;
y = pdf(pd,x_values);
plot(x_values,y)

Fit Normal Distributions to Grouped Data

Load the sample data. Create a vector containing the patients' weight data.

load hospital
x = hospital.Weight;

Create normal distribution objects by fitting them to the data, grouped by patient gender.

gender = hospital.Sex;
[pdca,gn,gl] = fitdist(x,'Normal','By',gender)

pdca=1×2 cell array
    {1x1 prob.NormalDistribution}    {1x1 prob.NormalDistribution}

gn = 2x1 cell
    {'Female'}
    {'Male'  }

gl = 2x1 cell
    {'Female'}
    {'Male'  }

The cell array pdca contains two probability distribution objects, one for each gender group. The cell array gn contains two group labels. The cell array gl contains two group levels.

View each distribution in the cell array pdca to compare the mean, mu, and the standard deviation, sigma, grouped by patient gender.

female = pdca{1}   % Distribution for females

female = 
  NormalDistribution

  Normal distribution
       mu =  130.472   [128.183, 132.76]
    sigma =  8.30339   [6.96947, 10.2736]

male = pdca{2}   % Distribution for males

male = 
  NormalDistribution

  Normal distribution
       mu =  180.532   [177.833, 183.231]
    sigma =  9.19322   [7.63933, 11.5466]

Compute the pdf of each distribution.

x_values = 50:1:250;
femalepdf = pdf(female,x_values);
malepdf = pdf(male,x_values);

Plot the pdfs for a visual comparison of weight distribution by gender.

figure
plot(x_values,femalepdf,'LineWidth',2)
hold on
plot(x_values,malepdf,'Color','r','LineStyle',':','LineWidth',2)
legend(gn,'Location','NorthEast')
hold off


Fit Kernel Distributions to Grouped Data

Load the sample data. Create a vector containing the patients' weight data.

load hospital
x = hospital.Weight;

Create kernel distribution objects by fitting them to the data, grouped by patient gender. Use a triangular kernel function.

gender = hospital.Sex;
[pdca,gn,gl] = fitdist(x,'Kernel','By',gender,'Kernel','triangle');

View each distribution in the cell array pdca to see the kernel distributions for each gender.

female = pdca{1}   % Distribution for females

female = 
  KernelDistribution

    Kernel = triangle
    Bandwidth = 4.25894
    Support = unbounded

male = pdca{2}   % Distribution for males

male = 
  KernelDistribution

    Kernel = triangle
    Bandwidth = 5.08961
    Support = unbounded

Compute the pdf of each distribution.

x_values = 50:1:250;
femalepdf = pdf(female,x_values);
malepdf = pdf(male,x_values);

Plot the pdfs for a visual comparison of weight distribution by gender.

figure
plot(x_values,femalepdf,'LineWidth',2)
hold on
plot(x_values,malepdf,'Color','r','LineStyle',':','LineWidth',2)
legend(gn,'Location','NorthEast')
hold off

Input Arguments

x — Input data
column vector

Input data, specified as a column vector. fitdist ignores NaN values in x. Additionally, any NaN values in the censoring vector or frequency vector cause fitdist to ignore the corresponding values in x.

Data Types: double

distname — Distribution name
character vector | string scalar

Distribution name, specified as one of the following character vectors or string scalars. The distribution specified by distname determines the type of the returned probability distribution object.

    Distribution Name            Description                               Distribution Object
    'Beta'                       Beta distribution                         BetaDistribution
    'Binomial'                   Binomial distribution                     BinomialDistribution
    'BirnbaumSaunders'           Birnbaum-Saunders distribution            BirnbaumSaundersDistribution
    'Burr'                       Burr distribution                         BurrDistribution
    'Exponential'                Exponential distribution                  ExponentialDistribution
    'ExtremeValue'               Extreme Value distribution                ExtremeValueDistribution
    'Gamma'                      Gamma distribution                        GammaDistribution
    'GeneralizedExtremeValue'    Generalized Extreme Value distribution    GeneralizedExtremeValueDistribution
    'GeneralizedPareto'          Generalized Pareto distribution           GeneralizedParetoDistribution
    'HalfNormal'                 Half-normal distribution                  HalfNormalDistribution
    'InverseGaussian'            Inverse Gaussian distribution             InverseGaussianDistribution
    'Kernel'                     Kernel distribution                       KernelDistribution
    'Logistic'                   Logistic distribution                     LogisticDistribution
    'Loglogistic'                Loglogistic distribution                  LoglogisticDistribution
    'Lognormal'                  Lognormal distribution                    LognormalDistribution
    'Nakagami'                   Nakagami distribution                     NakagamiDistribution
    'NegativeBinomial'           Negative Binomial distribution            NegativeBinomialDistribution
    'Normal'                     Normal distribution                       NormalDistribution
    'Poisson'                    Poisson distribution                      PoissonDistribution
    'Rayleigh'                   Rayleigh distribution                     RayleighDistribution
    'Rician'                     Rician distribution                       RicianDistribution
    'Stable'                     Stable distribution                       StableDistribution
    'tLocationScale'             t Location-Scale distribution             tLocationScaleDistribution
    'Weibull'                    Weibull distribution                      WeibullDistribution
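To pick among several candidates from this table, one simple approach is to fit each and compare the negative log likelihoods with negloglik (a sketch; lower values indicate a better fit, though this comparison does not penalize extra parameters):

```matlab
load hospital
x = hospital.Weight;
candidates = {'Normal','Logistic','tLocationScale'};
nll = zeros(size(candidates));
for k = 1:numel(candidates)
    pd = fitdist(x,candidates{k});
    nll(k) = negloglik(pd);   % negative log likelihood of the fitted object
end
```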

groupvar — Grouping variable
categorical array | logical or numeric vector | character array | string array | cell array of character vectors

Grouping variable, specified as a categorical array, logical or numeric vector, character array, string array, or cell array of character vectors. Each unique value in a grouping variable defines a group. For example, if Gender is a cell array of character vectors with the values 'Male' and 'Female', you can use Gender as a grouping variable to fit a distribution to your data by gender.

You can use more than one grouping variable by specifying a cell array of grouping variables. Observations are placed in the same group if they have common values of all specified grouping variables. For example, if Smoker is a logical vector with the value 0 for nonsmokers and 1 for smokers, then specifying the cell array {Gender,Smoker} divides observations into four groups: Male Smoker, Male Nonsmoker, Female Smoker, and Female Nonsmoker.

Example: {Gender,Smoker}

Data Types: categorical | logical | single | double | char | string | cell

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: fitdist(x,'Kernel','Kernel','triangle') fits a kernel distribution object to the data in x using a triangular kernel function.

Censoring — Logical flag for censored data
0 (default) | vector of logical values

Logical flag for censored data, specified as the comma-separated pair consisting of 'Censoring' and a vector of logical values that is the same size as input vector x. The value is 1 when the corresponding element in x is a right-censored observation and 0 when the corresponding element is an exact observation. The default is a vector of 0s, indicating that all observations are exact.

fitdist ignores any NaN values in this censoring vector. Additionally, any NaN values in x or the frequency vector cause fitdist to ignore the corresponding values in the censoring vector.

This argument is valid only if distname is 'BirnbaumSaunders', 'Burr', 'Exponential', 'ExtremeValue', 'Gamma', 'InverseGaussian', 'Kernel', 'Logistic', 'Loglogistic', 'Lognormal', 'Nakagami', 'Normal', 'Rician', 'tLocationScale', or 'Weibull'.

Data Types: logical

Frequency — Observation frequency
1 (default) | vector of nonnegative integer values

Observation frequency, specified as the comma-separated pair consisting of 'Frequency' and a vector of nonnegative integer values that is the same size as input vector x. Each element of the frequency vector specifies the frequencies for the corresponding elements in x. The default is a vector of 1s, indicating that each value in x appears only once.

fitdist ignores any NaN values in this frequency vector. Additionally, any NaN values in x or the censoring vector cause fitdist to ignore the corresponding values in the frequency vector.

Data Types: single | double

Options — Control parameters
structure

Control parameters for the iterative fitting algorithm, specified as the comma-separated pair consisting of 'Options' and a structure you create using statset.

Data Types: struct

NTrials — Number of trials
positive integer value

Number of trials for the binomial distribution, specified as the comma-separated pair consisting of 'NTrials' and a positive integer value. You must specify distname as 'Binomial' to use this option.

Data Types: single | double

Theta — Threshold parameter
0 (default) | scalar value

Threshold parameter for the generalized Pareto distribution, specified as the comma-separated pair consisting of 'Theta' and a scalar value. You must specify distname as 'GeneralizedPareto' to use this option.

Data Types: single | double

mu — Location parameter
0 (default) | scalar value

Location parameter for the half-normal distribution, specified as the comma-separated pair consisting of 'mu' and a scalar value. You must specify distname as 'HalfNormal' to use this option.

Data Types: single | double

Kernel — Kernel smoother type
'normal' (default) | 'box' | 'triangle' | 'epanechnikov'

Kernel smoother type, specified as the comma-separated pair consisting of 'Kernel' and one of the following:

• 'normal'
• 'box'
• 'triangle'
• 'epanechnikov'

You must specify distname as 'Kernel' to use this option.

Support — Kernel density support
'unbounded' (default) | 'positive' | two-element vector

Kernel density support, specified as the comma-separated pair consisting of 'Support' and one of the following:

• 'unbounded' — Density can extend over the whole real line.
• 'positive' — Density is restricted to positive values.
• A two-element vector giving finite lower and upper limits for the support of the density.

You must specify distname as 'Kernel' to use this option.

Data Types: single | double | char | string

Width — Bandwidth of kernel smoothing window
scalar value

Bandwidth of the kernel smoothing window, specified as the comma-separated pair consisting of 'Width' and a scalar value. The default value used by fitdist is optimal for estimating normal densities, but you might want to choose a smaller value to reveal features such as multiple modes.

You must specify distname as 'Kernel' to use this option.

Data Types: single | double

Output Arguments

pd — Probability distribution
probability distribution object

Probability distribution, returned as a probability distribution object. The distribution specified by distname determines the class type of the returned probability distribution object.

pdca — Probability distribution objects
cell array

Probability distribution objects of the type specified by distname, returned as a cell array.

gn — Group labels
cell array of character vectors

Group labels, returned as a cell array of character vectors.

gl — Grouping variable levels
cell array of character vectors

Grouping variable levels, returned as a cell array of character vectors containing one column for each grouping variable.

Algorithms

The fitdist function fits most distributions using maximum likelihood estimation. Two exceptions are the normal and lognormal distributions with uncensored data.

• For the uncensored normal distribution, the estimated value of the sigma parameter is the square root of the unbiased estimate of the variance.
• For the uncensored lognormal distribution, the estimated value of the sigma parameter is the square root of the unbiased estimate of the variance of the log of the data.
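A consequence, which you can check with a sketch like the one below, is that the uncensored normal fit reproduces mean(x) and std(x), since std uses the unbiased n - 1 variance estimate by default:

```matlab
rng default
x = 10 + 3*randn(50,1);        % illustrative sample
pd = fitdist(x,'Normal');
muDiff    = pd.mu - mean(x);   % numerically zero
sigmaDiff = pd.sigma - std(x); % numerically zero
```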

Alternative Functionality

App

The Distribution Fitter app opens a graphical user interface for you to import data from the workspace and interactively fit a probability distribution to that data. You can then save the distribution to the workspace as a probability distribution object. Open the Distribution Fitter app using distributionFitter, or click Distribution Fitter on the Apps tab.

References

[1] Johnson, N. L., S. Kotz, and N. Balakrishnan. Continuous Univariate Distributions. Vol. 1. Hoboken, NJ: Wiley-Interscience, 1993.

[2] Johnson, N. L., S. Kotz, and N. Balakrishnan. Continuous Univariate Distributions. Vol. 2. Hoboken, NJ: Wiley-Interscience, 1994.

[3] Bowman, A. W., and A. Azzalini. Applied Smoothing Techniques for Data Analysis. New York: Oxford University Press, 1997.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

• Supported syntaxes are:

  pd = fitdist(x,distname)
  pd = fitdist(x,distname,Name,Value)

  Code generation does not support the syntaxes that include the grouping variable 'By',groupvar and the related output arguments pdca, gn, and gl.
• fitdist supports code generation for the beta, exponential, extreme value, lognormal, normal, and Weibull distributions.
• The value of distname can be 'Beta', 'Exponential', 'ExtremeValue', 'Lognormal', 'Normal', or 'Weibull'.
• The value of distname must be a compile-time constant.
• The values of x, 'Censoring', and 'Frequency' must not contain NaN values.
• Code generation ignores the 'Frequency' value for the beta distribution. Instead of specifying the 'Frequency' value, manually add duplicated values to x so that the values in x have the frequency you want.
• Code generation does not support these input arguments: groupvar, NTrials, Theta, mu, Kernel, Support, and Width.
• Names in name-value pair arguments must be compile-time constants.
• These object functions of pd support code generation: cdf, icdf, iqr, mean, median, pdf, std, truncate, and var.

For more information on code generation, see “Introduction to Code Generation” on page 31-2 and “Code Generation for Probability Distribution Objects” on page 31-70.

See Also
distributionFitter | histfit | makedist | mle | paramci

Topics
“Working with Probability Distributions” on page 5-2
“Supported Distributions” on page 5-13

Introduced in R2009a


fitensemble
Fit ensemble of learners for classification and regression

Syntax

Mdl = fitensemble(Tbl,ResponseVarName,Method,NLearn,Learners)
Mdl = fitensemble(Tbl,formula,Method,NLearn,Learners)
Mdl = fitensemble(Tbl,Y,Method,NLearn,Learners)
Mdl = fitensemble(X,Y,Method,NLearn,Learners)
Mdl = fitensemble( ___ ,Name,Value)

Description

fitensemble can boost or bag decision tree learners or discriminant analysis classifiers. The function can also train random subspace ensembles of KNN or discriminant analysis classifiers.

For simpler interfaces that fit classification and regression ensembles, instead use fitcensemble and fitrensemble, respectively. Also, fitcensemble and fitrensemble provide options for Bayesian optimization.

Mdl = fitensemble(Tbl,ResponseVarName,Method,NLearn,Learners) returns a trained ensemble model object that contains the results of fitting an ensemble of NLearn classification or regression learners (Learners) to all variables in the table Tbl. ResponseVarName is the name of the response variable in Tbl. Method is the ensemble-aggregation method.

Mdl = fitensemble(Tbl,formula,Method,NLearn,Learners) fits the model specified by formula.

Mdl = fitensemble(Tbl,Y,Method,NLearn,Learners) treats all variables in Tbl as predictor variables. Y is the response variable that is not in Tbl.

Mdl = fitensemble(X,Y,Method,NLearn,Learners) trains an ensemble using the predictor data in X and response data in Y.

Mdl = fitensemble( ___ ,Name,Value) trains an ensemble using additional options specified by one or more Name,Value pair arguments and any of the previous syntaxes. For example, you can specify the class order, implement 10-fold cross-validation, or specify the learning rate.
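As a quick orientation to the positional arguments, the sketch below bags 100 classification trees; because the 'Bag' method works for both classification and regression, the 'Type' pair must be supplied:

```matlab
load ionosphere   % X: predictors, Y: class labels
Mdl = fitensemble(X,Y,'Bag',100,'Tree','Type','classification');
```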

Examples

Estimate the Resubstitution Loss of a Boosting Ensemble

Estimate the resubstitution loss of a trained, boosting classification ensemble of decision trees.

Load the ionosphere data set.

load ionosphere;


Train a decision tree ensemble using AdaBoost, 100 learning cycles, and the entire data set.

ClassTreeEns = fitensemble(X,Y,'AdaBoostM1',100,'Tree');

ClassTreeEns is a trained ClassificationEnsemble ensemble classifier.

Determine the cumulative resubstitution losses (that is, the cumulative misclassification error of the labels in the training data).

rsLoss = resubLoss(ClassTreeEns,'Mode','Cumulative');

rsLoss is a 100-by-1 vector, where element k contains the resubstitution loss after the first k learning cycles.

Plot the cumulative resubstitution loss over the number of learning cycles.

plot(rsLoss);
xlabel('Number of Learning Cycles');
ylabel('Resubstitution Loss');

In general, as the number of decision trees in the trained classification ensemble increases, the resubstitution loss decreases. A decrease in resubstitution loss might indicate that the software trained the ensemble sensibly. However, you cannot infer the predictive power of the ensemble by this decrease. To measure the predictive power of an ensemble, estimate the generalization error by:

1  Randomly partitioning the data into training and cross-validation sets. Do this by specifying 'Holdout',holdoutProportion when you train the ensemble using fitensemble.
2  Passing the trained ensemble to kfoldLoss, which estimates the generalization error.
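The two-step procedure above can be sketched as follows (the 30% holdout fraction is illustrative):

```matlab
MdlHO = fitensemble(X,Y,'AdaBoostM1',100,'Tree','Holdout',0.3);
genErr = kfoldLoss(MdlHO);   % misclassification error on the held-out data
```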

Train Regression Ensemble

Use a trained, boosted regression tree ensemble to predict the fuel economy of a car. Choose the number of cylinders, volume displaced by the cylinders, horsepower, and weight as predictors. Then, train an ensemble using fewer predictors and compare its in-sample predictive accuracy against the first ensemble.

Load the carsmall data set. Store the training data in a table.

load carsmall
Tbl = table(Cylinders,Displacement,Horsepower,Weight,MPG);

Specify a regression tree template that uses surrogate splits to improve predictive accuracy in the presence of NaN values.

t = templateTree('Surrogate','On');

Train the regression tree ensemble using LSBoost and 100 learning cycles.

Mdl1 = fitensemble(Tbl,'MPG','LSBoost',100,t);

Mdl1 is a trained RegressionEnsemble regression ensemble. Because MPG is a variable in the MATLAB® workspace, you can obtain the same result by entering

Mdl1 = fitensemble(Tbl,MPG,'LSBoost',100,t);

Use the trained regression ensemble to predict the fuel economy for a four-cylinder car with a 200-cubic-inch displacement, 150 horsepower, and weighing 3000 lbs.

predMPG = predict(Mdl1,[4 200 150 3000])

predMPG = 22.8462

The average fuel economy of a car with these specifications is 21.78 mpg.

Train a new ensemble using all predictors in Tbl except Displacement.

formula = 'MPG ~ Cylinders + Horsepower + Weight';
Mdl2 = fitensemble(Tbl,formula,'LSBoost',100,t);

Compare the resubstitution MSEs between Mdl1 and Mdl2.

mse1 = resubLoss(Mdl1)

mse1 = 6.4721

mse2 = resubLoss(Mdl2)

mse2 = 7.8599

The in-sample MSE for the ensemble that trains on all predictors is lower.


Estimate the Generalization Error of a Boosting Ensemble

Estimate the generalization error of a trained, boosting classification ensemble of decision trees.

Load the ionosphere data set.

load ionosphere;

Train a decision tree ensemble using AdaBoostM1, 100 learning cycles, and half of the data chosen randomly. The software validates the algorithm using the remaining half.

rng(2); % For reproducibility
ClassTreeEns = fitensemble(X,Y,'AdaBoostM1',100,'Tree',...
    'Holdout',0.5);

ClassTreeEns is a trained ClassificationEnsemble ensemble classifier.

Determine the cumulative generalization error (that is, the cumulative misclassification error of the labels in the validation data).

genError = kfoldLoss(ClassTreeEns,'Mode','Cumulative');

genError is a 100-by-1 vector, where element k contains the generalization error after the first k learning cycles.

Plot the generalization error over the number of learning cycles.

plot(genError);
xlabel('Number of Learning Cycles');
ylabel('Generalization Error');


The cumulative generalization error decreases to approximately 7% when 25 weak learners compose the ensemble classifier.

Find the Optimal Number of Splits and Trees for an Ensemble

You can control the depth of the trees in an ensemble of decision trees. You can also control the tree depth in an ECOC model containing decision tree binary learners by using the MaxNumSplits, MinLeafSize, or MinParentSize name-value pair parameters.

• When bagging decision trees, fitensemble grows deep decision trees by default. You can grow shallower trees to reduce model complexity or computation time.
• When boosting decision trees, fitensemble grows stumps (a tree with one split) by default. You can grow deeper trees for better accuracy.

Load the carsmall data set. Specify the variables Acceleration, Displacement, Horsepower, and Weight as predictors, and MPG as the response.

load carsmall
X = [Acceleration Displacement Horsepower Weight];
Y = MPG;

The default values of the tree depth controllers for boosting regression trees are:

• 1 for MaxNumSplits. This option grows stumps.
• 5 for MinLeafSize
• 10 for MinParentSize

To search for the optimal number of splits:

1  Train a set of ensembles. Exponentially increase the maximum number of splits for subsequent ensembles from stump to at most n - 1 splits, where n is the training sample size. Also, decrease the learning rate for each ensemble from 1 to 0.1.
2  Cross-validate the ensembles.
3  Estimate the cross-validated mean squared error (MSE) for each ensemble.
4  Compare the cross-validated MSEs. The ensemble with the lowest one performs the best, and it indicates the optimal maximum number of splits, number of trees, and learning rate for the data set.

Grow and cross-validate a deep regression tree and a stump. Specify to use surrogate splits because the data contains missing values. These trees serve as benchmarks.

MdlDeep = fitrtree(X,Y,'CrossVal','on','MergeLeaves','off',...
    'MinParentSize',1,'Surrogate','on');
MdlStump = fitrtree(X,Y,'MaxNumSplits',1,'CrossVal','on','Surrogate','on');

Train the boosting ensembles using 150 regression trees. Cross-validate each ensemble using 5-fold cross-validation. Vary the maximum number of splits using the values in the sequence {2^0, 2^1, ..., 2^m}, where m is such that 2^m is no greater than n - 1, and n is the training sample size. For each variant, adjust the learning rate to each value in the set {0.1, 0.25, 0.5, 1}.

n = size(X,1);
m = floor(log2(n - 1));
lr = [0.1 0.25 0.5 1];
maxNumSplits = 2.^(0:m);
numTrees = 150;
Mdl = cell(numel(maxNumSplits),numel(lr));
rng(1); % For reproducibility
for k = 1:numel(lr)
    for j = 1:numel(maxNumSplits)
        t = templateTree('MaxNumSplits',maxNumSplits(j),'Surrogate','on');
        Mdl{j,k} = fitensemble(X,Y,'LSBoost',numTrees,t,...
            'Type','regression','KFold',5,'LearnRate',lr(k));
    end
end

Compute the cross-validated MSE for each ensemble.

kflAll = @(x)kfoldLoss(x,'Mode','cumulative');
errorCell = cellfun(kflAll,Mdl,'Uniform',false);
error = reshape(cell2mat(errorCell),[numTrees numel(maxNumSplits) numel(lr)]);
errorDeep = kfoldLoss(MdlDeep);
errorStump = kfoldLoss(MdlStump);

Plot how the cross-validated MSE behaves as the number of trees in the ensemble increases for a few of the ensembles, the deep tree, and the stump. Plot the curves with respect to learning rate in the same plot, and create separate plots for varying tree complexities. Choose a subset of tree complexity levels.

mnsPlot = [1 round(numel(maxNumSplits)/2) numel(maxNumSplits)];
figure
for k = 1:3
    subplot(2,2,k)
    plot(squeeze(error(:,mnsPlot(k),:)),'LineWidth',2)
    axis tight
    hold on
    h = gca;
    plot(h.XLim,[errorDeep errorDeep],'-.b','LineWidth',2)
    plot(h.XLim,[errorStump errorStump],'-.r','LineWidth',2)
    plot(h.XLim,min(min(error(:,mnsPlot(k),:))).*[1 1],'--k')
    h.YLim = [10 50];
    xlabel 'Number of trees'
    ylabel 'Cross-validated MSE'
    title(sprintf('MaxNumSplits = %0.3g', maxNumSplits(mnsPlot(k))))
    hold off
end
hL = legend([cellstr(num2str(lr','Learning Rate = %0.2f'));...
    'Deep Tree';'Stump';'Min. MSE']);
hL.Position(1) = 0.6;

Each curve contains a minimum cross-validated MSE occurring at the optimal number of trees in the ensemble. Identify the maximum number of splits, number of trees, and learning rate that yields the lowest MSE overall.

32-2004

fitensemble

[minErr,minErrIdxLin] = min(error(:)); [idxNumTrees,idxMNS,idxLR] = ind2sub(size(error),minErrIdxLin); fprintf('\nMin. MSE = %0.5f',minErr) Min. MSE = 18.50574 fprintf('\nOptimal Parameter Values:\nNum. Trees = %d',idxNumTrees); Optimal Parameter Values: Num. Trees = 12 fprintf('\nMaxNumSplits = %d\nLearning Rate = %0.2f\n',... maxNumSplits(idxMNS),lr(idxLR)) MaxNumSplits = 4 Learning Rate = 0.25

For a different approach to optimizing this ensemble, see “Optimize a Boosted Regression Ensemble” on page 10-64.

Input Arguments Tbl — Sample data table Sample data used to train the model, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one predictor variable. Tbl can contain one additional column for the response variable. Multi-column variables and cell arrays other than cell arrays of character vectors are not allowed. • If Tbl contains the response variable and you want to use all remaining variables as predictors, then specify the response variable using ResponseVarName. • If Tbl contains the response variable, and you want to use a subset of the remaining variables only as predictors, then specify a formula using formula. • If Tbl does not contain the response variable, then specify the response data using Y. The length of response variable and the number of rows of Tbl must be equal. Note To save memory and execution time, supply X and Y instead of Tbl. Data Types: table ResponseVarName — Response variable name name of response variable in Tbl Response variable name, specified as the name of the response variable in Tbl. You must specify ResponseVarName as a character vector or string scalar. For example, if Tbl.Y is the response variable, then specify ResponseVarName as 'Y'. Otherwise, fitensemble treats all columns of Tbl as predictor variables.

32-2005

32

Functions

The response variable must be a categorical, character, or string array, logical or numeric vector, or cell array of character vectors. If the response variable is a character array, then each element must correspond to one row of the array. For classification, you can specify the order of the classes using the ClassNames name-value pair argument. Otherwise, fitensemble determines the class order, and stores it in the Mdl.ClassNames. Data Types: char | string formula — Explanatory model of response variable and subset of predictor variables character vector | string scalar Explanatory model of the response variable and a subset of the predictor variables, specified as a character vector or string scalar in the form 'Y~X1+X2+X3'. In this form, Y represents the response variable, and X1, X2, and X3 represent the predictor variables. To specify a subset of variables in Tbl as predictors for training the model, use a formula. If you specify a formula, then the software does not use any variables in Tbl that do not appear in formula. The variable names in the formula must be both variable names in Tbl (Tbl.Properties.VariableNames) and valid MATLAB identifiers. You can verify the variable names in Tbl by using the isvarname function. The following code returns logical 1 (true) for each variable that has a valid variable name. cellfun(@isvarname,Tbl.Properties.VariableNames)

If the variable names in Tbl are not valid, then convert them by using the matlab.lang.makeValidName function. Tbl.Properties.VariableNames = matlab.lang.makeValidName(Tbl.Properties.VariableNames);

Data Types: char | string X — Predictor data numeric matrix Predictor data, specified as numeric matrix. Each row corresponds to one observation, and each column corresponds to one predictor variable. The length of Y and the number of rows of X must be equal. To specify the names of the predictors in the order of their appearance in X, use the PredictorNames name-value pair argument. Data Types: single | double Y — Response data categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors Response data, specified as a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors. Each entry in Y is the response to or label for the observation in the corresponding row of X or Tbl. The length of Y and the number of rows of X or Tbl must be equal. If 32-2006

fitensemble

the response variable is a character array, then each element must correspond to one row of the array. • For classification, Y can be any of the supported data types. You can specify the order of the classes using the ClassNames name-value pair argument. Otherwise, fitensemble determines the class order, and stores it in the Mdl.ClassNames. • For regression, Y must be a numeric column vector. Data Types: categorical | char | string | logical | single | double | cell Method — Ensemble aggregation method 'AdaBoostM1' | 'LogitBoost' | 'GentleBoost' | 'RUSBoost' | 'Subspace' | 'Bag' | 'AdaBoostM2' | 'LSBoost' | ... Ensemble aggregation method, specified as one of the method names in this list. • For classification with two classes: • 'AdaBoostM1' • 'LogitBoost' • 'GentleBoost' • 'RobustBoost' (requires Optimization Toolbox) • 'LPBoost' (requires Optimization Toolbox) • 'TotalBoost' (requires Optimization Toolbox) • 'RUSBoost' • 'Subspace' • 'Bag' • For classification with three or more classes: • 'AdaBoostM2' • 'LPBoost' (requires Optimization Toolbox) • 'TotalBoost' (requires Optimization Toolbox) • 'RUSBoost' • 'Subspace' • 'Bag' • For regression: • 'LSBoost' • 'Bag' If you specify 'Method','Bag', then specify the problem type using the Type name-value pair argument, because you can specify 'Bag' for classification and regression problems. For details about ensemble aggregation algorithms and examples, see “Ensemble Algorithms” on page 18-38 and “Choose an Applicable Ensemble Aggregation Method” on page 18-31. NLearn — Number of ensemble learning cycles positive integer | 'AllPredictorCombinations' 32-2007

32

Functions

Number of ensemble learning cycles, specified as a positive integer or 'AllPredictorCombinations'. • If you specify a positive integer, then, at every learning cycle, the software trains one weak learner for every template object in Learners. Consequently, the software trains NLearn*numel(Learners) learners. • If you specify 'AllPredictorCombinations', then set Method to 'Subspace' and specify one learner only in Learners. With these settings, the software trains learners for all possible combinations of predictors taken NPredToSample at a time. Consequently, the software trains nchoosek(size(X,2),NPredToSample) learners. The software composes the ensemble using all trained learners and stores them in Mdl.Trained. For more details, see “Tips” on page 32-2018. Data Types: single | double | char | string Learners — Weak learners to use in ensemble weak-learner name | weak-learner template object | cell vector of weak-learner template objects Weak learners to use in the ensemble, specified as a weak-learner name, weak-learner template object, or cell array of weak-learner template objects. Weak Learner

Weak-Learner Name

Template Object Creation Function

Method Settings

Discriminant analysis

'Discriminant'

templateDiscrimina Recommended for nt 'Subspace'

k nearest neighbors

'KNN'

templateKNN

For 'Subspace' only

Decision tree

'Tree'

templateTree

All methods except 'Subspace'

For more details, see NLearn and “Tips” on page 32-2018. Example: For an ensemble composed of two types of classification trees, supply {t1 t2}, where t1 and t2 are classification tree templates. Name-Value Pair Arguments Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN. Example: 'CrossVal','on','LearnRate',0.05 specifies to implement 10-fold cross-validation and to use 0.05 as the learning rate. General Ensemble Options

CategoricalPredictors — Categorical predictors list vector of positive integers | logical vector | character matrix | string array | cell array of character vectors | 'all' Categorical predictors list, specified as the comma-separated pair consisting of 'CategoricalPredictors' and one of the values in this table. 32-2008

fitensemble

Value

Description

Vector of positive integers

Each entry in the vector is an index value corresponding to the column of the predictor data (X or Tbl) that contains a categorical variable.

Logical vector

A true entry means that the corresponding column of predictor data (X or Tbl) is a categorical variable.

Character matrix

Each row of the matrix is the name of a predictor variable. The names must match the entries in PredictorNames. Pad the names with extra blanks so each row of the character matrix has the same length.

String array or cell array of character vectors

Each element in the array is the name of a predictor variable. The names must match the entries in PredictorNames.

'all'

All predictors are categorical.

Specification of 'CategoricalPredictors' is appropriate if: • 'Learners' specifies tree learners. • 'Learners' specifies k-nearest learners where all predictors are categorical. Each learner identifies and treats categorical predictors in the same way as the fitting function corresponding to the learner. See 'CategoricalPredictors' of fitcknn for k-nearest learners and 'CategoricalPredictors' of fitctree for tree learners. Example: 'CategoricalPredictors','all' Data Types: single | double | logical | char | string | cell NPrint — Printout frequency 'off' (default) | positive integer Printout frequency, specified as the comma-separated pair consisting of 'NPrint' and a positive integer or 'off'. To track the number of weak learners or folds that fitensemble trained so far, specify a positive integer. That is, if you specify the positive integer m: • Without also specifying any cross-validation option (for example, CrossVal), then fitensemble displays a message to the command line every time it completes training m weak learners. • And a cross-validation option, then fitensemble displays a message to the command line every time it finishes training m folds. If you specify 'off', then fitensemble does not display a message when it completes training weak learners. Tip When training an ensemble of many weak learners on a large data set, specify a positive integer for NPrint. Example: 'NPrint',5 Data Types: single | double | char | string PredictorNames — Predictor variable names string array of unique names | cell array of unique character vectors 32-2009

32

Functions

Predictor variable names, specified as the comma-separated pair consisting of 'PredictorNames' and a string array of unique names or cell array of unique character vectors. The functionality of 'PredictorNames' depends on the way you supply the training data. • If you supply X and Y, then you can use 'PredictorNames' to give the predictor variables in X names. • The order of the names in PredictorNames must correspond to the column order of X. That is, PredictorNames{1} is the name of X(:,1), PredictorNames{2} is the name of X(:,2), and so on. Also, size(X,2) and numel(PredictorNames) must be equal. • By default, PredictorNames is {'x1','x2',...}. • If you supply Tbl, then you can use 'PredictorNames' to choose which predictor variables to use in training. That is, fitensemble uses only the predictor variables in PredictorNames and the response variable in training. • PredictorNames must be a subset of Tbl.Properties.VariableNames and cannot include the name of the response variable. • By default, PredictorNames contains the names of all predictor variables. • A good practice is to specify the predictors for training using either 'PredictorNames' or formula only. Example: 'PredictorNames', {'SepalLength','SepalWidth','PetalLength','PetalWidth'} Data Types: string | cell ResponseName — Response variable name 'Y' (default) | character vector | string scalar Response variable name, specified as the comma-separated pair consisting of 'ResponseName' and a character vector or string scalar. • If you supply Y, then you can use 'ResponseName' to specify a name for the response variable. • If you supply ResponseVarName or formula, then you cannot use 'ResponseName'. Example: 'ResponseName','response' Data Types: char | string Type — Supervised learning type 'classification' | 'regression' Supervised learning type, specified as the comma-separated pair consisting of 'Type' and 'classification' or 'regression'. 
• If Method is 'bag', then the supervised learning type is ambiguous. Therefore, specify Type when bagging. • Otherwise, the value of Method determines the supervised learning type. Example: 'Type','classification' Cross-Validation Options

CrossVal — Cross-validation flag 'off' (default) | 'on' 32-2010

fitensemble

Cross-validation flag, specified as the comma-separated pair consisting of 'Crossval' and 'on' or 'off'. If you specify 'on', then the software implements 10-fold cross-validation. To override this cross-validation setting, use one of these name-value pair arguments: CVPartition, Holdout, KFold, or Leaveout. To create a cross-validated model, you can use one cross-validation name-value pair argument at a time only. Alternatively, cross-validate later by passing Mdl to crossval or crossval. Example: 'Crossval','on' CVPartition — Cross-validation partition [] (default) | cvpartition partition object Cross-validation partition, specified as the comma-separated pair consisting of 'CVPartition' and a cvpartition partition object created by cvpartition. The partition object specifies the type of cross-validation and the indexing for the training and validation sets. To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout. Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,'KFold',5). Then, you can specify the cross-validated model by using 'CVPartition',cvp. Holdout — Fraction of data for holdout validation scalar value in the range (0,1) Fraction of the data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range (0,1). If you specify 'Holdout',p, then the software completes these steps: 1

Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.

2

Store the compact, trained model in the Trained property of the cross-validated model.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout. Example: 'Holdout',0.1 Data Types: double | single KFold — Number of folds 10 (default) | positive integer value greater than 1 Number of folds to use in a cross-validated model, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1. If you specify 'KFold',k, then the software completes these steps: 1

Randomly partition the data into k sets.

2

For each set, reserve the set as validation data, and train the model using the other k – 1 sets.

3

Store the k compact, trained models in the cells of a k-by-1 cell vector in the Trained property of the cross-validated model. 32-2011

32

Functions

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout. Example: 'KFold',5 Data Types: single | double Leaveout — Leave-one-out cross-validation flag 'off' (default) | 'on' Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of 'Leaveout' and 'on' or 'off'. If you specify 'Leaveout','on', then, for each of the n observations (where n is the number of observations excluding missing observations, specified in the NumObservations property of the model), the software completes these steps: 1

Reserve the observation as validation data, and train the model using the other n – 1 observations.

2

Store the n compact, trained models in the cells of an n-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can use one of these four name-value pair arguments only: CVPartition, Holdout, KFold, or Leaveout. Example: 'Leaveout','on' Other Classification or Regression Options

ClassNames — Names of classes to use for training categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors Names of classes to use for training, specified as the comma-separated pair consisting of 'ClassNames' and a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors. ClassNames must have the same data type as Y. If ClassNames is a character array, then each element must correspond to one row of the array. Use 'ClassNames' to: • Order the classes during training. • Specify the order of any input or output argument dimension that corresponds to the class order. For example, use 'ClassNames' to specify the order of the dimensions of Cost or the column order of classification scores returned by predict. • Select a subset of classes for training. For example, suppose that the set of all distinct class names in Y is {'a','b','c'}. To train the model using observations from classes 'a' and 'c' only, specify 'ClassNames',{'a','c'}. The default value for ClassNames is the set of all distinct class names in Y. Example: 'ClassNames',{'b','g'} Data Types: categorical | char | string | logical | single | double | cell Cost — Misclassification cost square matrix | structure array 32-2012

fitensemble

Misclassification cost, specified as the comma-separated pair consisting of 'Cost' and a square matrix or structure. If you specify: • The square matrix Cost, then Cost(i,j) is the cost of classifying a point into class j if its true class is i. That is, the rows correspond to the true class and the columns correspond to the predicted class. To specify the class order for the corresponding rows and columns of Cost, also specify the ClassNames name-value pair argument. • The structure S, then it must have two fields: • S.ClassNames, which contains the class names as a variable of the same data type as Y • S.ClassificationCosts, which contains the cost matrix with rows and columns ordered as in S.ClassNames The default is ones(K) - eye(K), where K is the number of distinct classes. Note fitensemble uses Cost to adjust the prior class probabilities specified in Prior. Then, fitensemble uses the adjusted prior probabilities for training and resets the cost matrix to its default. Example: 'Cost',[0 1 2 ; 1 0 2; 2 2 0] Data Types: double | single | struct Prior — Prior probabilities 'empirical' (default) | 'uniform' | numeric vector | structure array Prior probabilities for each class, specified as the comma-separated pair consisting of 'Prior' and a value in this table. Value

Description

'empirical'

The class prior probabilities are the class relative frequencies in Y.

'uniform'

All class prior probabilities are equal to 1/K, where K is the number of classes.

numeric vector

Each element is a class prior probability. Order the elements according to Mdl.ClassNames or specify the order using the ClassNames namevalue pair argument. The software normalizes the elements such that they sum to 1.

structure array

A structure S with two fields: • S.ClassNames contains the class names as a variable of the same type as Y. • S.ClassProbs contains a vector of corresponding prior probabilities. The software normalizes the elements such that they sum to 1.

fitensemble normalizes the prior probabilities in Prior to sum to 1. 32-2013

32

Functions

Example: struct('ClassNames', {{'setosa','versicolor','virginica'}},'ClassProbs',1:3) Data Types: char | string | double | single | struct Weights — Observation weights numeric vector of positive values | name of variable in Tbl Observation weights, specified as the comma-separated pair consisting of 'Weights' and a numeric vector of positive values or name of a variable in Tbl. The software weighs the observations in each row of X or Tbl with the corresponding value in Weights. The size of Weights must equal the number of rows of X or Tbl. If you specify the input data as a table Tbl, then Weights can be the name of a variable in Tbl that contains a numeric vector. In this case, you must specify Weights as a character vector or string scalar. For example, if the weights vector W is stored as Tbl.W, then specify it as 'W'. Otherwise, the software treats all columns of Tbl, including W, as predictors or the response when training the model. The software normalizes Weights to sum up to the value of the prior probability in the respective class. By default, Weights is ones(n,1), where n is the number of observations in X or Tbl. Data Types: double | single | char | string Sampling Options for Boosting Methods and Bagging

FResample — Fraction of training set to resample 1 (default) | positive scalar in (0,1] Fraction of the training set to resample for every weak learner, specified as the comma-separated pair consisting of 'FResample' and a positive scalar in (0,1]. To use 'FResample', specify 'bag' for Method or set Resample to 'on'. Example: 'FResample',0.75 Data Types: single | double Replace — Flag indicating to sample with replacement 'on' (default) | 'off' Flag indicating sampling with replacement, specified as the comma-separated pair consisting of 'Replace' and 'off' or 'on'. • For 'on', the software samples the training observations with replacement. • For 'off', the software samples the training observations without replacement. If you set Resample to 'on', then the software samples training observations assuming uniform weights. If you also specify a boosting method, then the software boosts by reweighting observations. Unless you set Method to 'bag' or set Resample to 'on', Replace has no effect. Example: 'Replace','off' Resample — Flag indicating to resample 'off' | 'on' 32-2014

fitensemble

Flag indicating to resample, specified as the comma-separated pair consisting of 'Resample' and 'off' or 'on'. • If Method is a boosting method, then: • 'Resample','on' specifies to sample training observations using updated weights as the multinomial sampling probabilities. • 'Resample','off'(default) specifies to reweight observations at every learning iteration. • If Method is 'bag', then 'Resample' must be 'on'. The software resamples a fraction of the training observations (see FResample) with or without replacement (see Replace). If you specify to resample using Resample, then it is good practice to resample to entire data set. That is, use the default setting of 1 for FResample. AdaBoostM1, AdaBoostM2, LogitBoost, GentleBoost, and LSBoost Method Options

LearnRate — Learning rate for shrinkage 1 (default) | numeric scalar in (0,1] Learning rate for shrinkage, specified as the comma-separated pair consisting of a numeric scalar in the interval (0,1]. To train an ensemble using shrinkage, set LearnRate to a value less than 1, for example, 0.1 is a popular choice. Training an ensemble using shrinkage requires more learning iterations, but often achieves better accuracy. Example: 'LearnRate',0.1 Data Types: single | double RUSBoost Method Options

LearnRate — Learning rate for shrinkage 1 (default) | numeric scalar in (0,1] Learning rate for shrinkage, specified as the comma-separated pair consisting of a numeric scalar in the interval (0,1]. To train an ensemble using shrinkage, set LearnRate to a value less than 1, for example, 0.1 is a popular choice. Training an ensemble using shrinkage requires more learning iterations, but often achieves better accuracy. Example: 'LearnRate',0.1 Data Types: single | double RatioToSmallest — Sampling proportion with respect to lowest-represented class positive numeric scalar | numeric vector of positive values Sampling proportion with respect to the lowest-represented class, specified as the comma-separated pair consisting of 'RatioToSmallest' and a numeric scalar or numeric vector of positive values with length equal to the number of distinct classes in the training data. Suppose that there are K classes in the training data and the lowest-represented class has m observations in the training data. 32-2015

32

Functions

• If you specify the positive numeric scalar s, then fitensemble samples s*m observations from each class, that is, it uses the same sampling proportion for each class. For more details, see “Algorithms” on page 32-2019. • If you specify the numeric vector [s1,s2,...,sK], then fitensemble samples si*m observations from class i, i = 1,...,K. The elements of RatioToSmallest correspond to the order of the class names specified using ClassNames (see “Tips” on page 32-2018). The default value is ones(K,1), which specifies to sample m observations from each class. Example: 'RatioToSmallest',[2,1] Data Types: single | double LPBoost and TotalBoost Method Options

MarginPrecision — Margin precision to control convergence speed 0.1 (default) | numeric scalar in [0,1] Margin precision to control convergence speed, specified as the comma-separated pair consisting of 'MarginPrecision' and a numeric scalar in the interval [0,1]. MarginPrecision affects the number of boosting iterations required for convergence. Tip To train an ensemble using many learners, specify a small value for MarginPrecision. For training using a few learners, specify a large value. Example: 'MarginPrecision',0.5 Data Types: single | double RobustBoost Method Options

RobustErrorGoal — Target classification error 0.1 (default) | nonnegative numeric scalar Target classification error, specified as the comma-separated pair consisting of 'RobustErrorGoal' and a nonnegative numeric scalar. The upper bound on possible values depends on the values of RobustMarginSigma and RobustMaxMargin. However, the upper bound cannot exceed 1. Tip For a particular training set, usually there is an optimal range for RobustErrorGoal. If you set it too low or too high, then the software can produce a model with poor classification accuracy. Try cross-validating to search for the appropriate value. Example: 'RobustErrorGoal',0.05 Data Types: single | double RobustMarginSigma — Classification margin distribution spread 0.1 (default) | positive numeric scalar Classification margin distribution spread over the training data, specified as the comma-separated pair consisting of 'RobustMarginSigma' and a positive numeric scalar. Before specifying RobustMarginSigma, consult the literature on RobustBoost, for example, [19]. Example: 'RobustMarginSigma',0.5 32-2016

fitensemble

Data Types: single | double RobustMaxMargin — Maximal classification margin 0 (default) | nonnegative numeric scalar Maximal classification margin in the training data, specified as the comma-separated pair consisting of 'RobustMaxMargin' and a nonnegative numeric scalar. The software minimizes the number of observations in the training data having classification margins below RobustMaxMargin. Example: 'RobustMaxMargin',1 Data Types: single | double Random Subspace Method Options

NPredToSample — Number of predictors to sample 1 (default) | positive integer Number of predictors to sample for each random subspace learner, specified as the comma-separated pair consisting of 'NPredToSample' and a positive integer in the interval 1,...,p, where p is the number of predictor variables (size(X,2) or size(Tbl,2)). Data Types: single | double

Output Arguments Mdl — Trained ensemble model ClassificationBaggedEnsemble model object | ClassificationEnsemble model object | ClassificationPartitionedEnsemble cross-validated model object | RegressionBaggedEnsemble model object | RegressionEnsemble model object | RegressionPartitionedEnsemble cross-validated model object Trained ensemble model, returned as one of the model objects in this table. Model Object

Type Setting

Specify Any Cross- Method Setting Validation Options?

Resample Setting

ClassificationBa 'classification' No ggedEnsemble

'Bag'

'on'

ClassificationEn 'classification' No semble

Any ensembleaggregation method for classification

'off'

ClassificationPa 'classification' Yes rtitionedEnsembl e

Any classification ensembleaggregation method

'off' or 'on'

RegressionBagged 'regression' Ensemble

No

'Bag'

'on'

RegressionEnsemb 'regression' le

No

'LSBoost'

'off'

RegressionPartit 'regression' ionedEnsemble

Yes

'LSBoost' or 'Bag'

'off' or 'on'

32-2017

32

Functions

The name-value pair arguments that control cross-validation are CrossVal, Holdout, KFold, Leaveout, and CVPartition. To reference properties of Mdl, use dot notation. For example, to access or display the cell vector of weak learner model objects for an ensemble that has not been cross-validated, enter Mdl.Trained at the command line.

Tips • NLearn can vary from a few dozen to a few thousand. Usually, an ensemble with good predictive power requires from a few hundred to a few thousand weak learners. However, you do not have to train an ensemble for that many cycles at once. You can start by growing a few dozen learners, inspect the ensemble performance and then, if necessary, train more weak learners using resume for classification problems, or resume for regression problems. • Ensemble performance depends on the ensemble setting and the setting of the weak learners. That is, if you specify weak learners with default parameters, then the ensemble can perform poorly. Therefore, like ensemble settings, it is good practice to adjust the parameters of the weak learners using templates, and to choose values that minimize generalization error. • If you specify to resample using Resample, then it is good practice to resample to entire data set. That is, use the default setting of 1 for FResample. • In classification problems (that is, Type is 'classification'): • If the ensemble-aggregation method (Method) is 'bag' and: • The misclassification cost (Cost) is highly imbalanced, then, for in-bag samples, the software oversamples unique observations from the class that has a large penalty. • The class prior probabilities (Prior) are highly skewed, the software oversamples unique observations from the class that has a large prior probability. For smaller sample sizes, these combinations can result in a low relative frequency of out-ofbag observations from the class that has a large penalty or prior probability. Consequently, the estimated out-of-bag error is highly variable and it can be difficult to interpret. To avoid large estimated out-of-bag error variances, particularly for small sample sizes, set a more balanced misclassification cost matrix using Cost or a less skewed prior probability vector using Prior. 
• Because the order of some input and output arguments correspond to the distinct classes in the training data, it is good practice to specify the class order using the ClassNames namevalue pair argument. • To determine the class order quickly, remove all observations from the training data that are unclassified (that is, have a missing label), obtain and display an array of all the distinct classes, and then specify the array for ClassNames. For example, suppose the response variable (Y) is a cell array of labels. This code specifies the class order in the variable classNames. Ycat = categorical(Y); classNames = categories(Ycat)

categorical assigns to unclassified observations and categories excludes from its output. Therefore, if you use this code for cell arrays of labels or similar code for categorical arrays, then you do not have to remove observations with missing labels to obtain a list of the distinct classes. 32-2018

fitensemble

  • To specify the class order from lowest-represented label to most-represented, quickly determine the class order (as in the previous bullet), but arrange the classes in the list by frequency before passing the list to ClassNames. Following from the previous example, this code specifies the class order from lowest- to most-represented in classNamesLH.

    Ycat = categorical(Y);
    classNames = categories(Ycat);
    freq = countcats(Ycat);
    [~,idx] = sort(freq);
    classNamesLH = classNames(idx);

Algorithms
• For details of ensemble-aggregation algorithms, see “Ensemble Algorithms” on page 18-38.
• If you specify Method to be a boosting algorithm and Learners to be decision trees, then the software grows stumps by default. A decision stump is one root node connected to two terminal, leaf nodes. You can adjust tree depth by specifying the MaxNumSplits, MinLeafSize, and MinParentSize name-value pair arguments using templateTree.
• fitensemble generates in-bag samples by oversampling classes with large misclassification costs and undersampling classes with small misclassification costs. Consequently, out-of-bag samples have fewer observations from classes with large misclassification costs and more observations from classes with small misclassification costs. If you train a classification ensemble using a small data set and a highly skewed cost matrix, then the number of out-of-bag observations per class can be low. Therefore, the estimated out-of-bag error can have a large variance and can be difficult to interpret. The same phenomenon can occur for classes with large prior probabilities.
• For the RUSBoost ensemble-aggregation method (Method), the name-value pair argument RatioToSmallest specifies the sampling proportion s for each class with respect to the lowest-represented class. For example, suppose that there are two classes in the training data: A and B. A has 100 observations and B has 10 observations. Also, suppose that the lowest-represented class has m observations in the training data.
  • If you set 'RatioToSmallest',2, then s*m = 2*10 = 20. Consequently, fitensemble trains every learner using 20 observations from class A and 20 observations from class B. If you set 'RatioToSmallest',[2 2], then you obtain the same result.
  • If you set 'RatioToSmallest',[2,1], then s1*m = 2*10 = 20 and s2*m = 1*10 = 10. Consequently, fitensemble trains every learner using 20 observations from class A and 10 observations from class B.
• For ensembles of decision trees, and for dual-core systems and above, fitensemble parallelizes training using Intel Threading Building Blocks (TBB). For details on Intel TBB, see https:// software.intel.com/en-us/intel-tbb.
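Following the two-class example above (class A with 100 observations, class B with 10), a RUSBoost call might look like the sketch below. The variables X and Y are placeholders for your predictor matrix and label vector, and the specific parameter values are illustrative, not a recommendation:

```matlab
% Sketch: RUSBoost with per-class sampling ratios (X and Y are placeholders).
% With classes A (100 obs) and B (10 obs), 'RatioToSmallest',[2 1] makes every
% weak learner train on 2*10 = 20 observations of A and 1*10 = 10 of B,
% with the ratios matched to the order given in 'ClassNames'.
t = templateTree('MaxNumSplits',5);          % shallow trees for boosting
ens = fitensemble(X,Y,'RUSBoost',500,t, ...
    'LearnRate',0.1, ...
    'RatioToSmallest',[2 1], ...
    'ClassNames',{'A','B'});                 % fix the class order explicitly
```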

References
[1] Breiman, L. “Bagging Predictors.” Machine Learning. Vol. 26, pp. 123–140, 1996.
[2] Breiman, L. “Random Forests.” Machine Learning. Vol. 45, pp. 5–32, 2001.
[3] Freund, Y. “A more robust boosting algorithm.” arXiv:0905.2138v1, 2009.


[4] Freund, Y. and R. E. Schapire. “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting.” J. of Computer and System Sciences, Vol. 55, pp. 119–139, 1997.
[5] Friedman, J. “Greedy function approximation: A gradient boosting machine.” Annals of Statistics, Vol. 29, No. 5, pp. 1189–1232, 2001.
[6] Friedman, J., T. Hastie, and R. Tibshirani. “Additive logistic regression: A statistical view of boosting.” Annals of Statistics, Vol. 28, No. 2, pp. 337–407, 2000.
[7] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, second edition. Springer, New York, 2008.
[8] Ho, T. K. “The random subspace method for constructing decision forests.” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 8, pp. 832–844, 1998.
[9] Schapire, R. E., Y. Freund, P. Bartlett, and W. S. Lee. “Boosting the margin: A new explanation for the effectiveness of voting methods.” Annals of Statistics, Vol. 26, No. 5, pp. 1651–1686, 1998.
[10] Seiffert, C., T. Khoshgoftaar, J. Hulse, and A. Napolitano. “RUSBoost: Improving classification performance when training data is skewed.” 19th International Conference on Pattern Recognition, pp. 1–4, 2008.
[11] Warmuth, M., J. Liao, and G. Ratsch. “Totally corrective boosting algorithms that maximize the margin.” Proc. 23rd Int’l. Conf. on Machine Learning, ACM, New York, pp. 1001–1008, 2006.

See Also
ClassificationBaggedEnsemble | ClassificationEnsemble | ClassificationPartitionedEnsemble | RegressionBaggedEnsemble | RegressionEnsemble | RegressionPartitionedEnsemble | templateDiscriminant | templateKNN | templateTree

Topics
“Supervised Learning Workflow and Algorithms” on page 18-2
“Framework for Ensemble Learning” on page 18-30

Introduced in R2011a


fitnlm
Fit nonlinear regression model

Syntax
mdl = fitnlm(tbl,modelfun,beta0)
mdl = fitnlm(X,y,modelfun,beta0)
mdl = fitnlm( ___ ,modelfun,beta0,Name,Value)

Description
mdl = fitnlm(tbl,modelfun,beta0) fits the model specified by modelfun to variables in the table or dataset array tbl, and returns the nonlinear model mdl. fitnlm estimates model coefficients using an iterative procedure starting from the initial values in beta0.

mdl = fitnlm(X,y,modelfun,beta0) fits a nonlinear regression model using the column vector y as a response variable and the columns of the matrix X as predictor variables.

mdl = fitnlm( ___ ,modelfun,beta0,Name,Value) fits a nonlinear regression model with additional options specified by one or more Name,Value pair arguments.

Examples

Nonlinear Model from Table

Create a nonlinear model for auto mileage based on the carbig data.

Load the data and create a nonlinear model.

load carbig
tbl = table(Horsepower,Weight,MPG);
modelfun = @(b,x)b(1) + b(2)*x(:,1).^b(3) + ...
    b(4)*x(:,2).^b(5);
beta0 = [-50 500 -1 500 -1];
mdl = fitnlm(tbl,modelfun,beta0)

mdl =
Nonlinear regression model:
    MPG ~ b1 + b2*Horsepower^b3 + b4*Weight^b5

Estimated Coefficients:
          Estimate       SE         tStat       pValue
    b1     -49.383     119.97     -0.41164      0.68083
    b2      376.43     567.05      0.66384      0.50719
    b3    -0.78193    0.47168      -1.6578     0.098177
    b4      422.37     776.02      0.54428      0.58656
    b5    -0.24127    0.48325     -0.49926      0.61788

Number of observations: 392, Error degrees of freedom: 387
Root Mean Squared Error: 3.96
R-Squared: 0.745,  Adjusted R-Squared 0.743
F-statistic vs. constant model: 283, p-value = 1.79e-113

Nonlinear Model from Matrix Data

Create a nonlinear model for auto mileage based on the carbig data.

Load the data and create a nonlinear model.

load carbig
X = [Horsepower,Weight];
y = MPG;
modelfun = @(b,x)b(1) + b(2)*x(:,1).^b(3) + ...
    b(4)*x(:,2).^b(5);
beta0 = [-50 500 -1 500 -1];
mdl = fitnlm(X,y,modelfun,beta0)

mdl =
Nonlinear regression model:
    y ~ b1 + b2*x1^b3 + b4*x2^b5

Estimated Coefficients:
          Estimate       SE         tStat       pValue
    b1     -49.383     119.97     -0.41164      0.68083
    b2      376.43     567.05      0.66384      0.50719
    b3    -0.78193    0.47168      -1.6578     0.098177
    b4      422.37     776.02      0.54428      0.58656
    b5    -0.24127    0.48325     -0.49926      0.61788

Number of observations: 392, Error degrees of freedom: 387
Root Mean Squared Error: 3.96
R-Squared: 0.745,  Adjusted R-Squared 0.743
F-statistic vs. constant model: 283, p-value = 1.79e-113

Adjust Fitting Options in Nonlinear Model

Create a nonlinear model for auto mileage based on the carbig data. Strive for more accuracy by lowering the TolFun option, and observe the iterations by setting the Display option.

Load the data and create a nonlinear model.

load carbig
X = [Horsepower,Weight];
y = MPG;
modelfun = @(b,x)b(1) + b(2)*x(:,1).^b(3) + ...
    b(4)*x(:,2).^b(5);
beta0 = [-50 500 -1 500 -1];

Create options to lower TolFun and to report iterative display, and create a model using the options.

opts = statset('Display','iter','TolFun',1e-10);
mdl = fitnlm(X,y,modelfun,beta0,'Options',opts);

                                Norm of        Norm of
   Iteration         SSE        Gradient         Step
  -----------------------------------------------------------
      0      1.82248e+06
      1           678600          788810       1691.07
      2           616716     6.12739e+06       45.4738
      3           249831      3.9532e+06       293.557
      4            17675          361544       369.284
      5          11746.6         69670.5       169.079
      6          7242.22          343738       394.822
      7          6250.32          159719       452.941
      8          6172.87         91622.9       268.674
      9             6077         6957.44       100.208
     10          6076.34         6370.39       88.1905
     11          6075.75         5199.08       77.9694
     12           6075.3         4646.61        69.764
     13          6074.91         4235.96       62.9114
     14          6074.55         3885.28       57.0647
     15          6074.23          3571.1       52.0036
     16          6073.93         3286.48       47.5795
     17          6073.66         3028.34       43.6844
     18           6073.4         2794.31       40.2352
     19          6073.17         2582.15       37.1663
     20          6072.95         2389.68       34.4243
     21          6072.74         2214.84        31.965
     22          6072.55         2055.78       29.7516
     23          6072.37         1910.83        27.753
     24          6072.21         1778.51       25.9428
     25          6072.05          1657.5       24.2986
     26           6071.9         1546.65       22.8011
     27          6071.76         1444.93       21.4338
     28          6071.63         1351.44       20.1822
     29          6071.51         1265.39       19.0339
     30          6071.39         1186.06        17.978
     31          6071.28         1112.83       17.0052
     32          6071.17         1045.13        16.107
     33          6071.07         982.465       15.2762
     34          6070.98         924.389       14.5063
     35          6070.89         870.498       13.7916
     36           6070.8         820.434        13.127
     37          6070.72         773.872       12.5081
     38          6070.64         730.521       11.9307
     39          6070.57         690.117       11.3914
     40           6070.5         652.422        10.887
     41          6070.43         617.219       10.4144
     42          6070.37         584.315       9.97115
     43          6070.31          553.53       9.55489
     44          6070.25         524.703        9.1635
     45          6070.19         497.686       8.79506


     46          6070.14         472.345       8.44785
     47          6070.08         448.557       8.12028
     48          6070.03          426.21       7.81091
     49          6069.99         405.201       7.51845
     50          6069.94         385.435        7.2417
     51           6069.9         366.825       6.97956
     52          6069.85         349.293       6.73104
     53          6069.81         332.764       6.49523
     54          6069.77         317.171       6.27127
     55          6069.74         302.452        6.0584
     56           6069.7          288.55       5.85591
     57          6069.66         275.411       5.66315
     58          6069.63         262.986       5.47949
     59           6069.6          251.23        5.3044
     60          6069.57           240.1       5.13734
     61          6069.54         229.558       4.97784
     62          6069.51         219.567       4.82545
     63          6069.48         210.094       4.67977
     64          6069.45         201.108        4.5404
     65          6069.43         192.578         4.407
     66           6069.4         184.479       4.27923
     67          6069.38         176.785       4.15677
     68          6069.35         169.472       4.03935
     69          6069.33         162.518        3.9267
     70          6069.31         155.903       3.81855
     71          6069.29         149.608       3.71468
     72          6069.26         143.615       3.61486
     73          6069.24         137.907        3.5189
     74          6069.22         132.468       3.42658
     75          6069.21         127.283       3.33774
     76          6069.19         122.339       3.25221
     77          6069.17         117.623       3.16981
     78          6069.15         113.123       3.09041
     79          6069.14         108.827       3.01386
     80          6069.12         104.725       2.94002
     81           6069.1         100.806       2.86877
     82          6069.09         97.0611           2.8
     83          6069.07         93.4814       2.73358
     84          6069.06         90.0584       2.66942
     85          6069.05         86.7841       2.60741
     86          6069.03         83.6513       2.54745
     87          6069.02         80.6528       2.48947
     88          6069.01         77.7821       2.43338
     89          6068.99         75.0327       2.37908
     90          6068.98          72.399       2.32652
     91          6068.97         69.8752       2.27561
     92          6068.96         67.4561       2.22629
     93          6068.95         65.1366       2.17849
     94          6068.94         62.9123       2.13216
     95          6068.93         60.7784       2.08723
     96          6068.92         58.7308       2.04364
     97          6068.91         56.7655       2.00135
     98           6068.9         54.8787        1.9603
     99          6068.89         4349.28       18.1917
    100          6068.77         2416.27       14.4439
    101          6068.71         1721.26       12.1305
    102          6068.66         1228.78        10.289
    103          6068.63         884.002       8.82019
    104           6068.6         639.615       7.62744
    105          6068.58          464.84       6.64627
    106          6068.56         338.878       5.82964
    107          6068.55         247.508       5.14296
    108          6068.54         180.878       4.56032
    109          6068.53         132.084       4.06194
    110          6068.52         96.2341       3.63255
    111          6068.51         69.8362       3.26019
    112          6068.51         50.3735       2.93541
    113           6068.5         36.0205       2.65062
    114           6068.5         25.4452       2.39969
    115          6068.49         17.6693       2.17764
    116          6068.49         1027.39       14.0164
    117          6068.48         544.039        5.3137
    118          6068.48         94.0576       2.86664
    119          6068.48         113.636       3.73502
    120          6068.48        0.518567        1.3705
    121          6068.48          4.5944       0.91284
    122          6068.48         1.56389      0.629322
    123          6068.48         1.13809      0.432547
    124          6068.48        0.295936      0.297509
Iterations terminated: relative change in SSE less than OPTIONS.TolFun

Specify Nonlinear Regression Using Model Name Syntax

Specify a nonlinear regression model for estimation using a function handle or model syntax.

Load sample data.

S = load('reaction');
X = S.reactants;
y = S.rate;
beta0 = S.beta;

Use a function handle to specify the Hougen-Watson model for the rate data.

mdl = fitnlm(X,y,@hougen,beta0)

mdl =
Nonlinear regression model:
    y ~ hougen(b,X)

Estimated Coefficients:
          Estimate        SE        tStat     pValue
    b1      1.2526     0.86701     1.4447    0.18654
    b2    0.062776    0.043561     1.4411    0.18753
    b3    0.040048    0.030885     1.2967    0.23089
    b4     0.11242    0.075157     1.4957    0.17309
    b5      1.1914     0.83671     1.4239     0.1923

Number of observations: 13, Error degrees of freedom: 8
Root Mean Squared Error: 0.193


R-Squared: 0.999,  Adjusted R-Squared 0.998
F-statistic vs. zero model: 3.91e+03, p-value = 2.54e-13

Alternatively, you can use an expression to specify the Hougen-Watson model for the rate data.

myfun = 'y~(b1*x2-x3/b5)/(1+b2*x1+b3*x2+b4*x3)';
mdl2 = fitnlm(X,y,myfun,beta0)

mdl2 =
Nonlinear regression model:
    y ~ (b1*x2 - x3/b5)/(1 + b2*x1 + b3*x2 + b4*x3)

Estimated Coefficients:
          Estimate        SE        tStat     pValue
    b1      1.2526     0.86701     1.4447    0.18654
    b2    0.062776    0.043561     1.4411    0.18753
    b3    0.040048    0.030885     1.2967    0.23089
    b4     0.11242    0.075157     1.4957    0.17309
    b5      1.1914     0.83671     1.4239     0.1923

Number of observations: 13, Error degrees of freedom: 8
Root Mean Squared Error: 0.193
R-Squared: 0.999,  Adjusted R-Squared 0.998
F-statistic vs. zero model: 3.91e+03, p-value = 2.54e-13

Estimate Nonlinear Regression Using Robust Fitting Options

Generate sample data from the nonlinear regression model y = b1 + b2*exp(-b3*x) + e, where b1, b2, and b3 are coefficients, and the error term e is normally distributed with mean 0 and standard deviation 0.5.

modelfun = @(b,x)(b(1)+b(2)*exp(-b(3)*x));
rng('default') % for reproducibility
b = [1;3;2];
x = exprnd(2,100,1);
y = modelfun(b,x) + normrnd(0,0.5,100,1);

Set robust fitting options.

opts = statset('nlinfit');
opts.RobustWgtFun = 'bisquare';

Fit the nonlinear model using the robust fitting options. Here, use an expression to specify the model.

b0 = [2;2;2];
modelstr = 'y ~ b1 + b2*exp(-b3*x)';
mdl = fitnlm(x,y,modelstr,b0,'Options',opts)


mdl =
Nonlinear regression model (robust fit):
    y ~ b1 + b2*exp( - b3*x)

Estimated Coefficients:
          Estimate       SE       tStat       pValue
    b1      1.0218    0.07202    14.188    2.1344e-25
    b2      3.6619    0.25429    14.401     7.974e-26
    b3      2.9732    0.38496    7.7232    1.0346e-11

Number of observations: 100, Error degrees of freedom: 97
Root Mean Squared Error: 0.501
R-Squared: 0.807,  Adjusted R-Squared 0.803
F-statistic vs. constant model: 203, p-value = 2.34e-35

Fit Nonlinear Regression Model Using Weights Function Handle

Load sample data.

S = load('reaction');
X = S.reactants;
y = S.rate;
beta0 = S.beta;

Specify a function handle for observation weights. The function accepts the model fitted values as input, and returns a vector of weights.

a = 1; b = 1;
weights = @(yhat) 1./((a + b*abs(yhat)).^2);

Fit the Hougen-Watson model to the rate data using the specified observation weights function.

mdl = fitnlm(X,y,@hougen,beta0,'Weights',weights)

mdl =
Nonlinear regression model:
    y ~ hougen(b,X)

Estimated Coefficients:
          Estimate        SE        tStat     pValue
    b1     0.83085     0.58224     1.427     0.19142
    b2     0.04095    0.029663     1.3805    0.20477
    b3    0.025063    0.019673     1.274     0.23842
    b4    0.080053    0.057812     1.3847    0.20353
    b5      1.8261       1.281     1.4256    0.19183

Number of observations: 13, Error degrees of freedom: 8
Root Mean Squared Error: 0.037
R-Squared: 0.998,  Adjusted R-Squared 0.998
F-statistic vs. zero model: 1.14e+03, p-value = 3.49e-11


Nonlinear Regression Model Using Nonconstant Error Model

Load sample data.

S = load('reaction');
X = S.reactants;
y = S.rate;
beta0 = S.beta;

Fit the Hougen-Watson model to the rate data using the combined error variance model.

mdl = fitnlm(X,y,@hougen,beta0,'ErrorModel','combined')

mdl =
Nonlinear regression model:
    y ~ hougen(b,X)

Estimated Coefficients:
          Estimate        SE        tStat     pValue
    b1      1.2526     0.86702     1.4447    0.18654
    b2    0.062776    0.043561     1.4411    0.18753
    b3    0.040048    0.030885     1.2967    0.23089
    b4     0.11242    0.075158     1.4957    0.17309
    b5      1.1914     0.83671     1.4239     0.1923

Number of observations: 13, Error degrees of freedom: 8
Root Mean Squared Error: 1.27
R-Squared: 0.999,  Adjusted R-Squared 0.998
F-statistic vs. zero model: 3.91e+03, p-value = 2.54e-13

Input Arguments

tbl — Input data
table | dataset array

Input data including predictor and response variables, specified as a table or dataset array. The predictor variables and response variable must be numeric.
• If you specify modelfun using a formula, the model specification in the formula specifies the predictor and response variables.
• If you specify modelfun using a function handle, the last variable is the response variable and the others are the predictor variables, by default. You can set a different column as the response variable by using the ResponseVar name-value pair argument. To select a subset of the columns as predictors, use the PredictorVars name-value pair argument.

The variable names in a table do not have to be valid MATLAB identifiers. However, if the names are not valid, you cannot specify modelfun using a formula. You can verify the variable names in tbl by using the isvarname function. The following code returns logical 1 (true) for each variable that has a valid variable name.

fitnlm

cellfun(@isvarname,tbl.Properties.VariableNames)

If the variable names in tbl are not valid, then convert them by using the matlab.lang.makeValidName function. tbl.Properties.VariableNames = matlab.lang.makeValidName(tbl.Properties.VariableNames);

Data Types: table

X — Predictor variables
matrix

Predictor variables, specified as an n-by-p matrix, where n is the number of observations and p is the number of predictor variables. Each column of X represents one variable, and each row represents one observation.
Data Types: single | double

y — Response variable
vector

Response variable, specified as an n-by-1 vector, where n is the number of observations. Each entry in y is the response for the corresponding row of X.
Data Types: single | double

modelfun — Functional form of the model
function handle | character vector or string scalar formula in the form 'y ~ f(b1,b2,...,bj,x1,x2,...,xk)'

Functional form of the model, specified as either of the following.
• Function handle @modelfun or @(b,x)modelfun, where
  • b is a coefficient vector with the same number of elements as beta0.
  • x is a matrix with the same number of columns as X or the number of predictor variable columns of tbl.
  modelfun(b,x) returns a column vector that contains the same number of rows as x. Each row of the vector is the result of evaluating modelfun on the corresponding row of x. In other words, modelfun is a vectorized function, one that operates on all data rows and returns all evaluations in one function call. modelfun should return real numbers to obtain meaningful coefficients.
• Character vector or string scalar representing a formula in the form 'y ~ f(b1,b2,...,bj,x1,x2,...,xk)', where f represents a scalar function of the scalar coefficient variables b1,...,bj and the scalar data variables x1,...,xk. The variable names in the formula must be valid MATLAB identifiers.
Data Types: function_handle | char | string

beta0 — Coefficients
numeric vector

Coefficients for the nonlinear model, specified as a numeric vector. NonLinearModel starts its search for optimal coefficients from beta0.
Data Types: single | double
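As a concrete illustration of a vectorized modelfun, the sketch below defines a two-parameter exponential model on synthetic data; the model form and starting values here are illustrative choices, not part of the function's documented behavior:

```matlab
% Sketch: a vectorized modelfun for y = b1*exp(b2*x).
% b is the coefficient vector; x is an n-by-1 predictor matrix, so the
% handle returns an n-by-1 vector in a single call, as fitnlm requires.
modelfun = @(b,x) b(1)*exp(b(2)*x(:,1));
x = (1:10)';                        % synthetic predictor
y = 2*exp(0.3*x) + randn(10,1);     % synthetic response with noise
beta0 = [1 0.1];                    % starting values for the iterative fit
mdl = fitnlm(x,y,modelfun,beta0);
```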

32

Functions

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'ErrorModel','combined','Exclude',2,'Options',opt specifies the error model as the combined model, excludes the second observation from the fit, and uses the options defined in the structure opt to control the iterative fitting procedure.

CoefficientNames — Names of the model coefficients
{'b1','b2',...,'bk'} (default) | string array | cell array of character vectors

Names of the model coefficients, specified as a string array or cell array of character vectors.
Data Types: string | cell

ErrorModel — Form of the error variance model
'constant' (default) | 'proportional' | 'combined'

Form of the error variance model, specified as one of the following. Each model defines the error using a standard mean-zero and unit-variance variable e in combination with independent components: the function value f, and one or two parameters a and b.

'constant' (default)    y = f + a*e
'proportional'          y = f + b*abs(f)*e
'combined'              y = f + (a + b*abs(f))*e

The only allowed error model when using Weights is 'constant'.

Note: options.RobustWgtFun must have value [] when using an error model other than 'constant'.

Example: 'ErrorModel','proportional'

ErrorParameters — Initial estimates of the error model parameters
numeric array

Initial estimates of the error model parameters for the chosen ErrorModel, specified as a numeric array.

Error Model       Parameters    Default Values
'constant'        a             1
'proportional'    b             1
'combined'        a, b          [1,1]

You can only use the 'constant' error model when using Weights.

Note: options.RobustWgtFun must have value [] when using an error model other than 'constant'.
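The parameters above can be sketched in a call that reuses the reaction example data from earlier on this page; the specific starting values are illustrative:

```matlab
% Sketch: combined error model y = f + (a + b*abs(f))*e, with starting
% values a = 1 and b = 2 (the documented default would be [1,1]).
S = load('reaction');
mdl = fitnlm(S.reactants,S.rate,@hougen,S.beta, ...
    'ErrorModel','combined', ...
    'ErrorParameters',[1,2]);
```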


For example, if 'ErrorModel' has the value 'combined', you can specify the starting value 1 for a and the starting value 2 for b as follows.
Example: 'ErrorParameters',[1,2]
Data Types: single | double

Exclude — Observations to exclude
logical or numeric index vector

Observations to exclude from the fit, specified as the comma-separated pair consisting of 'Exclude' and a logical or numeric index vector indicating which observations to exclude from the fit.
For example, you can exclude observations 2 and 3 out of 6 using either of the following examples.
Example: 'Exclude',[2,3]
Example: 'Exclude',logical([0 1 1 0 0 0])
Data Types: single | double | logical

Options — Options for controlling the iterative fitting procedure
[] (default) | structure

Options for controlling the iterative fitting procedure, specified as a structure created by statset. The relevant fields are the nonempty fields in the structure returned by the call statset('fitnlm').

DerivStep — Relative difference used in finite difference derivative calculations. A positive scalar, or a vector of positive scalars the same size as the vector of parameters estimated by the Statistics and Machine Learning Toolbox function using the options structure. Default: eps^(1/3)

Display — Amount of information displayed by the fitting algorithm: 'off' (displays no information), 'final' (displays the final output), or 'iter' (displays iterative output to the Command Window). Default: 'off'

FunValCheck — Character vector or string scalar indicating whether to check for invalid values, such as NaN or Inf, from the model function. Default: 'on'

MaxIter — Maximum number of iterations allowed. Positive integer. Default: 200

RobustWgtFun — Weight function for robust fitting. Can also be a function handle that accepts a normalized residual as input and returns the robust weights as output. If you use a function handle, give a Tune constant. See “Robust Options” on page 32-2033. Default: []

Tune — Tuning constant used in robust fitting to normalize the residuals before applying the weight function. A positive scalar. Required if the weight function is specified as a function handle. Default: see “Robust Options” on page 32-2033; the default depends on RobustWgtFun.

TolFun — Termination tolerance for the objective function value. Positive scalar. Default: 1e-8

TolX — Termination tolerance for the parameters. Positive scalar. Default: 1e-8

Data Types: struct

PredictorVars — Predictor variables
string array | cell array of character vectors | logical or numeric index vector

Predictor variables to use in the fit, specified as the comma-separated pair consisting of 'PredictorVars' and either a string array or cell array of character vectors of the variable names in the table or dataset array tbl, or a logical or numeric index vector indicating which columns are predictor variables. The string values or character vectors should be among the names in tbl, or the names you specify using the 'VarNames' name-value pair argument. The default is all variables in X, or all variables in tbl except for ResponseVar.
For example, you can specify the second and third variables as the predictor variables using either of the following examples.
Example: 'PredictorVars',[2,3]
Example: 'PredictorVars',logical([0 1 1 0 0 0])
Data Types: single | double | logical | string | cell

ResponseVar — Response variable
last column of tbl (default) | variable name | logical or numeric index vector

Response variable to use in the fit, specified as the comma-separated pair consisting of 'ResponseVar' and either a variable name in the table or dataset array tbl, or a logical or numeric index vector indicating which column is the response variable. If you specify a model, it specifies the response variable. Otherwise, when fitting a table or dataset array, 'ResponseVar' indicates which variable fitnlm should use as the response.
For example, you can specify the fourth variable, say yield, as the response out of six variables, in one of the following ways.
Example: 'ResponseVar','yield'
Example: 'ResponseVar',[4]
Example: 'ResponseVar',logical([0 0 0 1 0 0])
Data Types: single | double | logical | char | string

VarNames — Names of variables
{'x1','x2',...,'xn','y'} (default) | string array | cell array of character vectors


Names of variables, specified as the comma-separated pair consisting of 'VarNames' and a string array or cell array of character vectors including the names for the columns of X first, and the name for the response variable y last.
'VarNames' is not applicable to variables in a table or dataset array, because those variables already have names.
Example: 'VarNames',{'Horsepower','Acceleration','Model_Year','MPG'}
Data Types: string | cell

Weights — Observation weights
ones(n,1) (default) | vector of nonnegative scalar values | function handle

Observation weights, specified as a vector of nonnegative scalar values or function handle.
• If you specify a vector, then it must have n elements, where n is the number of rows in tbl or y.
• If you specify a function handle, then the function must accept a vector of predicted response values as input, and return a vector of real positive weights as output.
Given weights, W, NonLinearModel estimates the error variance at observation i by MSE*(1/W(i)), where MSE is the mean squared error.
Data Types: single | double | function_handle

Output Arguments mdl — Nonlinear model NonLinearModel object Nonlinear model representing a least-squares fit of the response to the data, returned as a NonLinearModel object. If the Options structure contains a nonempty RobustWgtFun field, the model is not a least-squares fit, but uses the RobustWgtFun robust fitting function. For properties and methods of the nonlinear model object, mdl, see the NonLinearModel class page.

More About

Robust Options

Weight Function    Equation                           Default Tuning Constant
'andrews'          w = (abs(r)<1) .* sin(r) ./ r      1.339

fitrtree

If λjk > 0, then xk < v is a worthwhile surrogate split for xj < u.

Surrogate Decision Splits

A surrogate decision split is an alternative to the optimal decision split at a given node in a decision tree. The optimal split is found by growing the tree; the surrogate split uses a similar or correlated predictor variable and split criterion. When the value of the optimal split predictor for an observation is missing, the observation is sent to the left or right child node using the best surrogate predictor. When the value of the best surrogate split predictor for the observation is also missing, the observation is sent to the left or right child node using the second-best surrogate predictor, and so on. Candidate splits are sorted in descending order by their predictive measure of association on page 32-2179.
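To use surrogate splits in practice you must request them at training time. This sketch trains a regression tree on the carsmall sample data with 'Surrogate','on', then inspects the predictive measures of association via the surrogateAssociation method (in older releases this method is named meanSurrVarAssoc):

```matlab
% Sketch: train a regression tree with surrogate splits, then inspect the
% per-predictor predictive measures of association across the tree.
load carsmall
X = [Horsepower Weight Displacement];
tree = fitrtree(X,MPG,'Surrogate','on', ...
    'PredictorNames',{'Horsepower','Weight','Displacement'});
ma = surrogateAssociation(tree);   % p-by-p matrix of association measures
```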

Tip
• By default, Prune is 'on'. However, this specification does not prune the regression tree. To prune a trained regression tree, pass the regression tree to prune.
• After training a model, you can generate C/C++ code that predicts responses for new data. Generating C/C++ code requires MATLAB Coder. For details, see “Introduction to Code Generation” on page 31-2.

Algorithms

Node Splitting Rules

fitrtree uses these processes to determine how to split node t.

• For standard CART (that is, if PredictorSelection is 'allsplits') and for all predictors xi, i = 1,...,p:

  1. fitrtree computes the weighted mean squared error (MSE) of the responses in node t using

         εt = Σ_{j∈T} wj (yj − ȳt)²

     wj is the weight of observation j, and T is the set of all observation indices in node t. If you do not specify Weights, then wj = 1/n, where n is the sample size.

  2. fitrtree estimates the probability that an observation is in node t using

         P(T) = Σ_{j∈T} wj

  3. fitrtree sorts xi in ascending order. Each element of the sorted predictor is a splitting candidate or cut point. fitrtree records any indices corresponding to missing values in the set TU, which is the unsplit set.

  4. fitrtree determines the best way to split node t using xi by maximizing the reduction in MSE (ΔI) over all splitting candidates. That is, for all splitting candidates in xi:
     a. fitrtree splits the observations in node t into left and right child nodes (tL and tR, respectively).
     b. fitrtree computes ΔI. Suppose that for a particular splitting candidate, tL and tR contain observation indices in the sets TL and TR, respectively.
        • If xi does not contain any missing values, then the reduction in MSE for the current splitting candidate is

              ΔI = P(T)εt − P(TL)εtL − P(TR)εtR

        • If xi contains missing values, then, assuming that the observations are missing at random, the reduction in MSE is

              ΔIU = P(T − TU)εt − P(TL)εtL − P(TR)εtR

          T − TU is the set of all observation indices in node t that are not missing.
        • If you use surrogate decision splits on page 32-2180, then:
          i. fitrtree computes the predictive measures of association on page 32-2179 between the decision split xj < u and all possible decision splits xk < v, j ≠ k.
          ii. fitrtree sorts the possible alternative decision splits in descending order by their predictive measure of association with the optimal split. The surrogate split is the decision split yielding the largest measure.
          iii. fitrtree decides the child node assignments for observations with a missing value for xi using the surrogate split. If the surrogate predictor also contains a missing value, then fitrtree uses the decision split with the second largest measure, and so on, until there are no other surrogates. It is possible for fitrtree to split two different observations at node t using two different surrogate splits. For example, suppose the predictors x1 and x2 are the best and second-best surrogates, respectively, for the predictor xi, i ∉ {1,2}, at node t. If observation m of predictor xi is missing (that is, xmi is missing), but xm1 is not missing, then x1 is the surrogate predictor for observation xmi. If observations x(m+1),i and x(m+1),1 are missing, but x(m+1),2 is not missing, then x2 is the surrogate predictor for observation m + 1.
          iv. fitrtree uses the appropriate MSE reduction formula. That is, if fitrtree fails to assign all missing observations in node t to child nodes using surrogate splits, then the MSE reduction is ΔIU. Otherwise, fitrtree uses ΔI for the MSE reduction.
     c. fitrtree chooses the candidate that yields the largest MSE reduction.

  fitrtree splits the predictor variable at the cut point that maximizes the MSE reduction.

• For the curvature test (that is, if PredictorSelection is 'curvature'):


1

fitrtree computes the residuals rti = yti − y t for all observations in node t. 1 yt = ∑ w y , which is the weighted average of the responses in node t. The weights are ∑i wi i i ti the observation weights in Weights.

2

fitrtree assigns observations to one of two bins according to the sign of the corresponding residuals. Let zt be a nominal variable that contains the bin assignments for the observations in node t.

3. fitrtree conducts curvature tests on page 32-2178 between each predictor and zt. For regression trees, K = 2.

   • If all p-values are at least 0.05, then fitrtree does not split node t.
   • If there is a minimal p-value, then fitrtree chooses the corresponding predictor to split node t.
   • If more than one p-value is zero due to underflow, then fitrtree applies standard CART to the corresponding predictors to choose the split predictor.

4. If fitrtree chooses a split predictor, then it uses standard CART to choose the cut point (see step 4 in the standard CART process).

• For the interaction test (that is, if PredictorSelection is 'interaction-curvature'):

1. For observations in node t, fitrtree conducts curvature tests on page 32-2178 between each predictor and the response, and interaction tests on page 32-2179 between each pair of predictors and the response.

   • If all p-values are at least 0.05, then fitrtree does not split node t.
   • If there is a minimal p-value and it is the result of a curvature test, then fitrtree chooses the corresponding predictor to split node t.
   • If there is a minimal p-value and it is the result of an interaction test, then fitrtree chooses the split predictor using standard CART on the corresponding pair of predictors.
   • If more than one p-value is zero due to underflow, then fitrtree applies standard CART to the corresponding predictors to choose the split predictor.

2. If fitrtree chooses a split predictor, then it uses standard CART to choose the cut point (see step 4 in the standard CART process).
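As an illustrative sketch (not part of the original reference text), the curvature-based selection described above is requested through the PredictorSelection name-value pair; carsmall is just an example data set shipped with the toolbox, and the predictor names are arbitrary:

```matlab
% Sketch: request unbiased split-predictor selection via the curvature test.
load carsmall                            % example data set
X = [Displacement Horsepower Weight];
Mdl = fitrtree(X,MPG, ...
    'PredictorNames',{'Disp','HP','Wt'}, ...
    'PredictorSelection','curvature');   % or 'interaction-curvature'
imp = predictorImportance(Mdl);          % importance estimates are less biased
                                         % than with standard CART ('allsplits')
```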

Tree Depth Control

• If MergeLeaves is 'on' and PruneCriterion is 'mse' (the default values for these name-value pair arguments), then the software applies pruning only to the leaves and by using MSE. This specification amounts to merging leaves that come from the same parent node and whose MSE is at most the sum of the MSE of its two leaves.
• To accommodate MaxNumSplits, fitrtree splits all nodes in the current layer, and then counts the number of branch nodes. A layer is the set of nodes that are equidistant from the root node. If the number of branch nodes exceeds MaxNumSplits, fitrtree follows this procedure:


1. Determine how many branch nodes in the current layer must be unsplit so that there are at most MaxNumSplits branch nodes.
2. Sort the branch nodes by their impurity gains.
3. Unsplit that number of least successful branches.
4. Return the decision tree grown so far.


This procedure produces maximally balanced trees.

• The software splits branch nodes layer by layer until at least one of these events occurs:

  • There are MaxNumSplits branch nodes.
  • A proposed split causes the number of observations in at least one branch node to be fewer than MinParentSize.
  • A proposed split causes the number of observations in at least one leaf node to be fewer than MinLeafSize.
  • The algorithm cannot find a good split within a layer (that is, the pruning criterion (see PruneCriterion) does not improve for all proposed splits in a layer). A special case is when all nodes are pure (that is, all observations in the node have the same class).
  • For values 'curvature' or 'interaction-curvature' of PredictorSelection, all tests yield p-values greater than 0.05.

MaxNumSplits and MinLeafSize do not affect splitting at their default values. Therefore, if you set 'MaxNumSplits', splitting might stop due to the value of MinParentSize before MaxNumSplits splits occur.

Parallelization

For dual-core systems and above, fitrtree parallelizes training decision trees using Intel Threading Building Blocks (TBB). For details on Intel TBB, see https://software.intel.com/en-us/intel-tbb.
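A minimal sketch of the depth-control arguments discussed above (the particular values here are arbitrary and the data set is illustrative):

```matlab
% Sketch: constrain tree growth with MaxNumSplits, MinLeafSize, MinParentSize.
load carsmall
X = [Displacement Horsepower Weight];
ShallowTree = fitrtree(X,MPG, ...
    'MaxNumSplits',5, ...    % at most 5 branch nodes, grown layer by layer
    'MinLeafSize',10, ...    % every leaf keeps at least 10 observations
    'MinParentSize',20);     % nodes smaller than this are not split
view(ShallowTree,'Mode','graph')  % inspect how many splits actually occurred
```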

References

[1] Breiman, L., J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Boca Raton, FL: CRC Press, 1984.

[2] Loh, W. Y. “Regression Trees with Unbiased Variable Selection and Interaction Detection.” Statistica Sinica, Vol. 12, 2002, pp. 361–386.

[3] Loh, W. Y., and Y. S. Shih. “Split Selection Methods for Classification Trees.” Statistica Sinica, Vol. 7, 1997, pp. 815–840.

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

Usage notes and limitations:

• Supported syntaxes are:

  • tree = fitrtree(Tbl,Y)
  • tree = fitrtree(X,Y)
  • tree = fitrtree(___,Name,Value)
  • [tree,FitInfo,HyperparameterOptimizationResults] = fitrtree(___,Name,Value) — fitrtree returns the additional output arguments FitInfo and HyperparameterOptimizationResults when you specify the 'OptimizeHyperparameters' name-value pair argument.


• tree is a CompactRegressionTree object; therefore, it does not include the data used in training the regression tree.
• The FitInfo output argument is an empty structure array currently reserved for possible future use.
• The HyperparameterOptimizationResults output argument is a BayesianOptimization object or a table of hyperparameters with associated values that describe the cross-validation optimization of hyperparameters. HyperparameterOptimizationResults is nonempty when the 'OptimizeHyperparameters' name-value pair argument is nonempty at the time you create the model. The values in HyperparameterOptimizationResults depend on the value you specify for the 'HyperparameterOptimizationOptions' name-value pair argument when you create the model.

  • If you specify 'bayesopt' (default), then HyperparameterOptimizationResults is an object of class BayesianOptimization.
  • If you specify 'gridsearch' or 'randomsearch', then HyperparameterOptimizationResults is a table of the hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst).

• Supported name-value pair arguments are:

  • 'CategoricalPredictors'
  • 'HyperparameterOptimizationOptions' — For cross-validation, tall optimization supports only 'Holdout' validation. For example, you can specify fitrtree(X,Y,'OptimizeHyperparameters','auto','HyperparameterOptimizationOptions',struct('Holdout',0.2)).
  • 'MaxNumSplits' — For tall optimization, fitrtree searches among integers, log-scaled (by default) in the range [1,max(2,min(10000,NumObservations–1))].
  • 'MergeLeaves'
  • 'MinLeafSize' — For tall optimization, fitrtree searches among integers, log-scaled (by default) in the range [1,max(2,floor(NumObservations/2))].
  • 'MinParentSize'
  • 'NumVariablesToSample' — For tall optimization, fitrtree searches among integers in the range [1,max(2,NumPredictors)].
  • 'OptimizeHyperparameters'
  • 'PredictorNames'
  • 'QuadraticErrorTolerance'
  • 'ResponseName'
  • 'ResponseTransform'
  • 'SplitCriterion'
  • 'Weights'

• This additional name-value pair argument is specific to tall arrays:

  • 'MaxDepth' — A positive integer specifying the maximum depth of the output tree. Specify a value for this argument to return a tree that has fewer levels and requires fewer passes through the tall array to compute. Generally, the algorithm of fitrtree takes one pass through the data and an additional pass for each tree level. The function does not set a maximum tree depth by default.

For more information, see “Tall Arrays” (MATLAB).

Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, set the 'UseParallel' option to true. To perform parallel hyperparameter optimization, use the 'HyperparameterOptimizationOptions',struct('UseParallel',true) name-value pair argument in the call to this function. For more information on parallel hyperparameter optimization, see “Parallel Bayesian Optimization” on page 10-7. For more general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).
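For instance, the parallel option described above might be combined with hyperparameter optimization as follows (a sketch, not from the original text; it requires Parallel Computing Toolbox, and the data set is illustrative):

```matlab
% Sketch: run Bayesian hyperparameter optimization in parallel.
load carsmall
X = [Displacement Horsepower Weight];
Mdl = fitrtree(X,MPG, ...
    'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions', ...
    struct('UseParallel',true, ...   % distribute objective evaluations
           'ShowPlots',false));
```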

See Also
RegressionPartitionedModel | RegressionTree | predict | prune | surrogateAssociation

Topics
“Splitting Categorical Predictors in Classification Trees” on page 19-25

Introduced in R2014a


fitSVMPosterior
Fit posterior probabilities

Syntax

ScoreSVMModel = fitSVMPosterior(SVMModel)
ScoreSVMModel = fitSVMPosterior(SVMModel,TBL,ResponseVarName)
ScoreSVMModel = fitSVMPosterior(SVMModel,TBL,Y)
ScoreSVMModel = fitSVMPosterior(SVMModel,X,Y)
ScoreSVMModel = fitSVMPosterior(___,Name,Value)
[ScoreSVMModel,ScoreTransform] = fitSVMPosterior(___)

Description

ScoreSVMModel = fitSVMPosterior(SVMModel) returns ScoreSVMModel, which is a trained, support vector machine (SVM) classifier containing the optimal score-to-posterior-probability transformation function for two-class learning.

The software fits the appropriate score-to-posterior-probability transformation function using the SVM classifier SVMModel, and by cross-validation using the stored predictor data (SVMModel.X) and the class labels (SVMModel.Y). The transformation function computes the posterior probability that an observation is classified into the positive class (SVMModel.ClassNames(2)).

• If the classes are inseparable, then the transformation function is the sigmoid function on page 32-2194.
• If the classes are perfectly separable, the transformation function is the step function on page 32-2194.
• In two-class learning, if one of the two classes has a relative frequency of 0, then the transformation function is the constant function on page 32-2195. fitSVMPosterior is not appropriate for one-class learning.
• If SVMModel is a ClassificationSVM classifier, then the software estimates the optimal transformation function by 10-fold cross-validation as outlined in [1]. Otherwise, SVMModel must be a ClassificationPartitionedModel classifier. SVMModel specifies the cross-validation method.
• The software stores the optimal transformation function in ScoreSVMModel.ScoreTransform.

ScoreSVMModel = fitSVMPosterior(SVMModel,TBL,ResponseVarName) returns a trained support vector classifier containing the transformation function from the trained, compact SVM classifier SVMModel. The software estimates the score transformation function using the predictor data in the table TBL and the class labels TBL.ResponseVarName.

ScoreSVMModel = fitSVMPosterior(SVMModel,TBL,Y) returns a trained support vector classifier containing the transformation function from the trained, compact SVM classifier SVMModel. The software estimates the score transformation function using the predictor data in the table TBL and the class labels Y.


ScoreSVMModel = fitSVMPosterior(SVMModel,X,Y) returns a trained support vector classifier containing the transformation function from the trained, compact SVM classifier SVMModel. The software estimates the score transformation function using the predictor data X and the class labels Y.

ScoreSVMModel = fitSVMPosterior(___,Name,Value) uses additional options specified by one or more Name,Value pair arguments, provided SVMModel is a ClassificationSVM classifier. For example, you can specify the number of folds to use in k-fold cross-validation.

[ScoreSVMModel,ScoreTransform] = fitSVMPosterior(___) additionally returns the transformation function parameters (ScoreTransform) using any of the input arguments in the previous syntaxes.

Examples

Fit the Score-to-Posterior Probability Function for Separable Classes

Load Fisher's iris data set. Train the classifier using the petal lengths and widths, and remove the virginica species from the data.

load fisheriris
classKeep = ~strcmp(species,'virginica');
X = meas(classKeep,3:4);
y = species(classKeep);

gscatter(X(:,1),X(:,2),y);
title('Scatter Diagram of Iris Measurements')
xlabel('Petal length')
ylabel('Petal width')
legend('Setosa','Versicolor')


The classes are perfectly separable. Therefore, the score transformation function is a step function.

Train an SVM classifier using the data. Cross-validate the classifier using 10-fold cross-validation (the default).

rng(1);
CVSVMModel = fitcsvm(X,y,'CrossVal','on');

CVSVMModel is a trained ClassificationPartitionedModel SVM classifier.

Estimate the step function that transforms scores to posterior probabilities.

[ScoreCVSVMModel,ScoreParameters] = fitSVMPosterior(CVSVMModel);

Warning: Classes are perfectly separated. The optimal score-to-posterior transformation is a step function.

fitSVMPosterior does the following:

• Uses the data that the software stored in CVSVMModel to fit the transformation function
• Warns whenever the classes are separable
• Stores the step function in ScoreCVSVMModel.ScoreTransform

Display the score function type and its parameter values.

ScoreParameters

ScoreParameters = struct with fields:
                        Type: 'step'

                  LowerBound: -0.8431
                  UpperBound: 0.6897
    PositiveClassProbability: 0.5000

ScoreParameters is a structure array with four fields:

• The score transformation function type (Type)
• The score corresponding to the negative class boundary (LowerBound)
• The score corresponding to the positive class boundary (UpperBound)
• The positive class probability (PositiveClassProbability)

Since the classes are separable, the step function transforms the score to either 0 or 1, which is the posterior probability that an observation is a versicolor iris.

Fit the Score-to-Posterior Probability Function for Inseparable Classes

Load the ionosphere data set.

load ionosphere

The classes of this data set are not separable.

Train an SVM classifier. Cross-validate using 10-fold cross-validation (the default). It is good practice to standardize the predictors and specify the class order.

rng(1) % For reproducibility
CVSVMModel = fitcsvm(X,Y,'ClassNames',{'b','g'},'Standardize',true,...
    'CrossVal','on');
ScoreTransform = CVSVMModel.ScoreTransform

ScoreTransform =
    'none'

CVSVMModel is a trained ClassificationPartitionedModel SVM classifier. The positive class is 'g'. The ScoreTransform property is none.

Estimate the optimal score function for mapping observation scores to posterior probabilities of an observation being classified as 'g'.

[ScoreCVSVMModel,ScoreParameters] = fitSVMPosterior(CVSVMModel);
ScoreTransform = ScoreCVSVMModel.ScoreTransform

ScoreTransform =
    '@(S)sigmoid(S,-9.481576e-01,-1.218300e-01)'

ScoreParameters

ScoreParameters = struct with fields:
         Type: 'sigmoid'
        Slope: -0.9482
    Intercept: -0.1218


ScoreTransform is the optimal score transform function. ScoreParameters contains the score transformation function type, the slope estimate, and the intercept estimate. You can estimate test-sample posterior probabilities by passing ScoreCVSVMModel to kfoldPredict.

Estimate Posterior Probabilities for Test Samples

Estimate positive class posterior probabilities for the test set of an SVM algorithm.

Load the ionosphere data set.

load ionosphere

Train an SVM classifier. Specify a 20% holdout sample. It is good practice to standardize the predictors and specify the class order.

rng(1) % For reproducibility
CVSVMModel = fitcsvm(X,Y,'Holdout',0.2,'Standardize',true,...
    'ClassNames',{'b','g'});

CVSVMModel is a trained ClassificationPartitionedModel cross-validated classifier.

Estimate the optimal score function for mapping observation scores to posterior probabilities of an observation being classified as 'g'.

ScoreCVSVMModel = fitSVMPosterior(CVSVMModel);

ScoreCVSVMModel is a trained ClassificationPartitionedModel cross-validated classifier containing the optimal score transformation function estimated from the training data.

Estimate the out-of-sample positive class posterior probabilities. Display the results for the first 10 out-of-sample observations.

[~,OOSPostProbs] = kfoldPredict(ScoreCVSVMModel);
indx = ~isnan(OOSPostProbs(:,2));
hoObs = find(indx); % Holdout observation numbers
OOSPostProbs = [hoObs, OOSPostProbs(indx,2)];
table(OOSPostProbs(1:10,1),OOSPostProbs(1:10,2),...
    'VariableNames',{'ObservationIndex','PosteriorProbability'})

ans=10×2 table
    ObservationIndex    PosteriorProbability
    ________________    ____________________
            6                   0.17378
            7                   0.89637
            8                 0.0076609
            9                   0.91602
           16                  0.026718
           22               4.6081e-06
           23                    0.9024
           24               2.4129e-06
           38               0.00042697
           41                   0.86427

Input Arguments

SVMModel — Trained SVM classifier
ClassificationSVM classifier | CompactClassificationSVM classifier | ClassificationPartitionedModel classifier

Trained SVM classifier, specified as a ClassificationSVM, CompactClassificationSVM, or ClassificationPartitionedModel classifier.

If SVMModel is a ClassificationSVM classifier, then you can set optional name-value pair arguments. If SVMModel is a CompactClassificationSVM classifier, then you must input predictor data X and class labels Y.

TBL — Sample data
table

Sample data, specified as a table. Each row of TBL corresponds to one observation, and each column corresponds to one predictor variable. Optionally, TBL can contain additional columns for the response variable and observation weights. TBL must contain all of the predictors used to train SVMModel. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.

If TBL contains the response variable used to train SVMModel, then you do not need to specify ResponseVarName or Y. If you trained SVMModel using sample data contained in a table, then the input data for fitSVMPosterior must also be in a table.

If you set 'Standardize',true in fitcsvm when training SVMModel, then the software standardizes the columns of the predictor data using the corresponding means in SVMModel.Mu and the standard deviations in SVMModel.Sigma.

Data Types: table

X — Predictor data
matrix

Predictor data used to estimate the score-to-posterior-probability transformation function, specified as a matrix. Each row of X corresponds to one observation (also known as an instance or example), and each column corresponds to one variable (also known as a feature).

The length of Y and the number of rows in X must be equal.

If you set 'Standardize',true in fitcsvm when training SVMModel, then the software fits the transformation function parameter estimates using standardized data.

Data Types: double | single


ResponseVarName — Response variable name
name of variable in TBL

Response variable name, specified as the name of a variable in TBL. If TBL contains the response variable used to train SVMModel, then you do not need to specify ResponseVarName.

If you specify ResponseVarName, then you must do so as a character vector or string scalar. For example, if the response variable is stored as TBL.Response, then specify ResponseVarName as 'Response'. Otherwise, the software treats all columns of TBL, including TBL.Response, as predictors.

The response variable must be a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors. If the response variable is a character array, then each element must correspond to one row of the array.

Data Types: char | string

Y — Class labels
categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors

Class labels used to estimate the score-to-posterior-probability transformation function, specified as a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors.

If Y is a character array, then each element must correspond to one class label.

The length of Y and the number of rows in X must be equal.

Data Types: categorical | char | string | logical | single | double | cell

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'KFold',8 performs 8-fold cross-validation when SVMModel is a ClassificationSVM classifier.

CVPartition — Cross-validation partition
[] (default) | cvpartition partition

Cross-validation partition used to compute the transformation function, specified as the comma-separated pair consisting of 'CVPartition' and a cvpartition partition object as created by cvpartition. You can use only one of these four options at a time for creating a cross-validated model: 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'. The crossval name-value pair argument of fitcsvm splits the data into subsets using cvpartition.

Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,'KFold',5). Then, you can specify the cross-validated model by using 'CVPartition',cvp.

Holdout — Fraction of data for holdout validation
scalar value in the range (0,1)


Fraction of the data for holdout validation used to compute the transformation function, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range (0,1). Holdout validation tests the specified fraction of the data and uses the remaining data for training.

You can use only one of these four options at a time for creating a cross-validated model: 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'.

Example: 'Holdout',0.1

Data Types: double | single

KFold — Number of folds
10 (default) | positive integer value greater than 1

Number of folds to use when computing the transformation function, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1.

You can use only one of these four options at a time for creating a cross-validated model: 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'.

Example: 'KFold',8

Data Types: single | double

Leaveout — Leave-one-out cross-validation flag
'off' (default) | 'on'

Leave-one-out cross-validation flag indicating whether to use leave-one-out cross-validation to compute the transformation function, specified as the comma-separated pair consisting of 'Leaveout' and 'on' or 'off'. Use leave-one-out cross-validation by specifying 'Leaveout','on'.

You can use only one of these four options at a time for creating a cross-validated model: 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'.

Example: 'Leaveout','on'
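To make the cross-validation choices above concrete, here is an illustrative sketch (not from the original text): SVMModel must be a full ClassificationSVM classifier for these name-value pairs to apply, and ionosphere is just an example data set.

```matlab
% Sketch: choose one cross-validation scheme for fitting the transformation.
load ionosphere
SVMModel = fitcsvm(X,Y,'Standardize',true,'ClassNames',{'b','g'});
cvp = cvpartition(numel(Y),'KFold',5);     % explicit 5-fold partition
ScoreSVMModel = fitSVMPosterior(SVMModel,'CVPartition',cvp);
% Equivalently: fitSVMPosterior(SVMModel,'KFold',8)
%           or: fitSVMPosterior(SVMModel,'Holdout',0.1)
```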

Output Arguments

ScoreSVMModel — Trained SVM classifier
ClassificationSVM classifier | CompactClassificationSVM classifier | ClassificationPartitionedModel classifier

Trained SVM classifier containing the estimated score transformation function, returned as a ClassificationSVM, CompactClassificationSVM, or ClassificationPartitionedModel classifier. The ScoreSVMModel classifier type is the same as the SVMModel classifier type.

To estimate posterior probabilities, pass ScoreSVMModel and predictor data to predict. If you set 'Standardize',true in fitcsvm to train SVMModel, then predict standardizes the columns of X using the corresponding means in SVMModel.Mu and standard deviations in SVMModel.Sigma.

ScoreTransform — Optimal score-to-posterior-probability transformation function parameters
structure array


Optimal score-to-posterior-probability transformation function parameters, returned as a structure array. If field Type is:

• sigmoid, then ScoreTransform has these fields:

  • Slope — The value of A in the sigmoid function on page 32-2194
  • Intercept — The value of B in the sigmoid function

• step, then ScoreTransform has these fields:

  • PositiveClassProbability — the value of π in the step function on page 32-2194. π represents:

    • The probability that an observation is in the positive class.
    • The posterior probability that a score is in the interval (LowerBound,UpperBound).

  • LowerBound — the value max_{yn = −1} sn in the step function. It represents the lower bound of the interval that assigns the posterior probability of being in the positive class PositiveClassProbability to scores. Any observation with a score less than LowerBound has posterior probability of being in the positive class equal to 0.
  • UpperBound — the value min_{yn = +1} sn in the step function. It represents the upper bound of the interval that assigns the posterior probability of being in the positive class PositiveClassProbability to scores. Any observation with a score greater than UpperBound has posterior probability of being in the positive class equal to 1.

• constant, then ScoreTransform.PredictedClass contains the name of the class prediction. This result is the same as SVMModel.ClassNames. The posterior probability of an observation being in ScoreTransform.PredictedClass is always 1.

More About

Sigmoid Function

The sigmoid function that maps score sj corresponding to observation j to the positive class posterior probability is

P(sj) = 1 / (1 + exp(A·sj + B)).

If the value of the Type field of ScoreTransform is sigmoid, then parameters A and B correspond to the fields Slope and Intercept of ScoreTransform, respectively.

Step Function

The step function that maps score sj corresponding to observation j to the positive class posterior probability is

         { 0,  sj < max_{yk = −1} sk
P(sj) =  { π,  max_{yk = −1} sk ≤ sj ≤ min_{yk = +1} sk
         { 1,  sj > min_{yk = +1} sk

where:

• sj is the score of observation j.
• +1 and −1 denote the positive and negative classes, respectively.
• π is the prior probability that an observation is in the positive class.

If the value of the Type field of ScoreTransform is step, then the quantities max_{yk = −1} sk and min_{yk = +1} sk correspond to the fields LowerBound and UpperBound of ScoreTransform, respectively.

Constant Function

The constant function maps all scores in a sample to posterior probabilities 1 or 0.

If all observations have posterior probability 1, then they are expected to come from the positive class. If all observations have posterior probability 0, then they are not expected to come from the positive class.

Tips

• This process describes one way to predict positive class posterior probabilities.

1. Train an SVM classifier by passing the data to fitcsvm. The result is a trained SVM classifier, such as SVMModel, that stores the data. The software sets the score transformation function property (SVMModel.ScoreTransform) to none.

2. Pass the trained SVM classifier SVMModel to fitSVMPosterior or fitPosterior. The result, such as ScoreSVMModel, is the same trained SVM classifier as SVMModel, except the software sets ScoreSVMModel.ScoreTransform to the optimal score transformation function.

3. Pass the predictor data matrix and the trained SVM classifier containing the optimal score transformation function (ScoreSVMModel) to predict. The second column in the second output argument of predict stores the positive class posterior probabilities corresponding to each row of the predictor data matrix.

If you skip step 2, then predict returns the positive class score rather than the positive class posterior probability.

• After fitting posterior probabilities, you can generate C/C++ code that predicts labels for new data. Generating C/C++ code requires MATLAB Coder. For details, see “Introduction to Code Generation” on page 31-2.
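The three-step process above can be sketched end to end (ionosphere is an example data set shipped with the toolbox):

```matlab
% Sketch of the workflow: train, fit the transformation, predict posteriors.
load ionosphere
SVMModel = fitcsvm(X,Y,'Standardize',true, ...
    'ClassNames',{'b','g'});                 % step 1: train
ScoreSVMModel = fitSVMPosterior(SVMModel);   % step 2: optimal transformation
[~,postProbs] = predict(ScoreSVMModel,X);    % step 3: posterior probabilities
% postProbs(:,2) holds the positive-class ('g') posterior probabilities.
```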

Algorithms

If you re-estimate the score-to-posterior-probability transformation function, that is, if you pass an SVM classifier to fitPosterior or fitSVMPosterior and its ScoreTransform property is not none, then the software:

• Displays a warning
• Resets the original transformation function to 'none' before estimating the new one


References

[1] Platt, J. “Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods.” In: Advances in Large Margin Classifiers. Cambridge, MA: The MIT Press, 2000, pp. 61–74.

See Also
ClassificationPartitionedModel | ClassificationSVM | CompactClassificationSVM | fitPosterior | fitcsvm | kfoldPredict | predict

Introduced in R2014a


fitted
Class: GeneralizedLinearMixedModel
Fitted responses from generalized linear mixed-effects model

Syntax

mufit = fitted(glme)
mufit = fitted(glme,Name,Value)

Description

mufit = fitted(glme) returns the fitted conditional response of the generalized linear mixed-effects model glme.

mufit = fitted(glme,Name,Value) returns the fitted response with additional options specified by one or more name-value pair arguments. For example, you can specify whether to compute the marginal fitted response.

Input Arguments

glme — Generalized linear mixed-effects model
GeneralizedLinearMixedModel object

Generalized linear mixed-effects model, specified as a GeneralizedLinearMixedModel object. For properties and methods of this object, see GeneralizedLinearMixedModel.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Conditional — Indicator for conditional response
true (default) | false

Indicator for conditional response, specified as the comma-separated pair consisting of 'Conditional' and one of the following.

Value    Description
true     Contributions from both fixed effects and random effects (conditional)
false    Contribution from only fixed effects (marginal)

To obtain fitted marginal response values, fitted computes the conditional mean of the response with the empirical Bayes predictor vector of random effects b set equal to 0. For more information, see “Conditional and Marginal Response” on page 32-2200.

Example: 'Conditional',false


Output Arguments

mufit — Fitted response values
n-by-1 vector

Fitted response values, returned as an n-by-1 vector, where n is the number of observations.

Examples

Plot Observed Versus Fitted Values

Load the sample data.

load mfr

This simulated data is from a manufacturing company that operates 50 factories across the world, with each factory running a batch process to create a finished product. The company wants to decrease the number of defects in each batch, so it developed a new manufacturing process. To test the effectiveness of the new process, the company selected 20 of its factories at random to participate in an experiment: Ten factories implemented the new process, while the other ten continued to run the old process. In each of the 20 factories, the company ran five batches (for a total of 100 batches) and recorded the following data:

• Flag to indicate whether the batch used the new process (newprocess)
• Processing time for each batch, in hours (time)
• Temperature of the batch, in degrees Celsius (temp)
• Categorical variable indicating the supplier (A, B, or C) of the chemical used in the batch (supplier)
• Number of defects in the batch (defects)

The data also includes time_dev and temp_dev, which represent the absolute deviation of time and temperature, respectively, from the process standard of 3 hours at 20 degrees Celsius.

Fit a generalized linear mixed-effects model using newprocess, time_dev, temp_dev, and supplier as fixed-effects predictors. Include a random-effects term for intercept grouped by factory, to account for quality differences that might exist due to factory-specific variations. The response variable defects has a Poisson distribution, and the appropriate link function for this model is log. Use the Laplace fit method to estimate the coefficients. Specify the dummy variable encoding as 'effects', so the dummy variable coefficients sum to 0.

The number of defects can be modeled using a Poisson distribution:

defects_ij ~ Poisson(μ_ij)

This corresponds to the generalized linear mixed-effects model

log(μ_ij) = β0 + β1·newprocess_ij + β2·time_dev_ij + β3·temp_dev_ij + β4·supplier_C_ij + β5·supplier_B_ij + b_i,

where


• defects_ij is the number of defects observed in the batch produced by factory i during batch j.
• μ_ij is the mean number of defects corresponding to factory i (where i = 1, 2, ..., 20) during batch j (where j = 1, 2, ..., 5).
• newprocess_ij, time_dev_ij, and temp_dev_ij are the measurements for each variable that correspond to factory i during batch j. For example, newprocess_ij indicates whether the batch produced by factory i during batch j used the new process.
• supplier_C_ij and supplier_B_ij are dummy variables that use effects (sum-to-zero) coding to indicate whether company C or B, respectively, supplied the process chemicals for the batch produced by factory i during batch j.
• b_i ~ N(0, σ_b²) is a random-effects intercept for each factory i that accounts for factory-specific variation in quality.

glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

Generate the fitted conditional mean values for the model.

mufit = fitted(glme);

Create a scatterplot of the observed values versus fitted values.

figure
scatter(mfr.defects,mufit)
title('Observed versus Fitted Values')
xlabel('Observed Values')
ylabel('Fitted Values')


32

Functions

More About

Conditional and Marginal Response

A conditional response includes contributions from both fixed- and random-effects predictors. A marginal response includes contributions from only the fixed effects.

Suppose the generalized linear mixed-effects model glme has an n-by-p fixed-effects design matrix X and an n-by-q random-effects design matrix Z. Also, suppose the estimated p-by-1 fixed-effects vector is β, and the q-by-1 empirical Bayes predictor vector of random effects is b.

The fitted conditional response corresponds to the 'Conditional',true name-value pair argument, and is defined as

μ_cond = g⁻¹(η_ME),

where η_ME is the linear predictor including the fixed and random effects of the generalized linear mixed-effects model

η_ME = Xβ + Zb + δ,

and δ is the model offset.

The fitted marginal response corresponds to the 'Conditional',false name-value pair argument, and is defined as

μ_mar = g⁻¹(η_FE),

where η_FE is the linear predictor including only the fixed-effects portion of the generalized linear mixed-effects model

η_FE = Xβ + δ.
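The two definitions differ only in whether the random-effects contribution Zb enters the linear predictor. As an illustrative sketch (in Python/NumPy rather than MATLAB, with made-up toy values for X, Z, β, b, and a zero offset δ, assuming a log link so that g⁻¹ is exp):

```python
import numpy as np

# Toy dimensions: n = 4 observations, p = 2 fixed effects, q = 2 group intercepts.
# X, Z, beta, b, and delta are illustrative values, not output from a fitted model.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]])  # fixed-effects design
Z = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])  # random-effects design
beta = np.array([0.5, -0.2])   # estimated fixed-effects vector
b = np.array([0.1, -0.3])      # empirical Bayes predictors of random effects
delta = 0.0                    # model offset

eta_me = X @ beta + Z @ b + delta   # linear predictor with fixed and random effects
eta_fe = X @ beta + delta           # fixed-effects-only linear predictor

mu_cond = np.exp(eta_me)   # fitted conditional response (g^-1 = exp for log link)
mu_mar = np.exp(eta_fe)    # fitted marginal response
```

The only difference between the two fitted responses is the `Z @ b` term, which is exactly what the 'Conditional' name-value pair toggles.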

See Also GeneralizedLinearMixedModel | designMatrix | fitglme | residuals | response


fitted
Class: LinearMixedModel

Fitted responses from a linear mixed-effects model

Syntax

yfit = fitted(lme)
yfit = fitted(lme,Name,Value)

Description

yfit = fitted(lme) returns the fitted conditional response on page 32-2207 from the linear mixed-effects model lme.

yfit = fitted(lme,Name,Value) returns the fitted response from the linear mixed-effects model lme with additional options specified by one or more Name,Value pair arguments. For example, you can specify if you want to compute the fitted marginal response on page 32-2207.

Input Arguments

lme — Linear mixed-effects model
LinearMixedModel object

Linear mixed-effects model, specified as a LinearMixedModel object constructed using fitlme or fitlmematrix.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Conditional — Indicator for conditional response
true (default) | false

Indicator for conditional response, specified as the comma-separated pair consisting of 'Conditional' and either of the following.

    true     Contribution from both fixed effects and random effects (conditional)
    false    Contribution from only fixed effects (marginal)

Example: 'Conditional',false
Data Types: logical


Output Arguments

yfit — Fitted response values
n-by-1 vector

Fitted response values, returned as an n-by-1 vector, where n is the number of observations.

Examples

Compute Fitted Conditional and Marginal Responses

Load the sample data.

load flu

The flu dataset array has a Date variable and 10 variables containing estimated influenza rates (in 9 different regions, estimated from Google® searches, plus a nationwide estimate from the Centers for Disease Control and Prevention, CDC). To fit a linear mixed-effects model, your data must be in a properly formatted dataset array. To fit a linear mixed-effects model with the influenza rates as the responses and region as the predictor variable, combine the nine columns corresponding to the regions into an array. The new dataset array, flu2, must have the response variable, FluRate, the nominal variable, Region, that shows which region each estimate is from, and the grouping variable Date.

flu2 = stack(flu,2:10,'NewDataVarName','FluRate','IndVarName','Region');
flu2.Date = nominal(flu2.Date);

Fit a linear mixed-effects model with fixed effects for region and a random intercept that varies by Date. Region is a categorical variable. You can specify the contrasts for categorical variables using the DummyVarCoding name-value pair argument when fitting the model. When you do not specify the contrasts, fitlme uses the 'reference' contrast by default. Because the model has an intercept, fitlme takes the first region, NE, as the reference and creates eight dummy variables representing the other eight regions. For example, I[MidAtl] is the dummy variable representing the region MidAtl. For details, see “Dummy Variables” on page 2-48.

The corresponding model is

y_im = β0 + β1·I[MidAtl]_i + β2·I[ENCentral]_i + β3·I[WNCentral]_i + β4·I[SAtl]_i + β5·I[ESCentral]_i + β6·I[WSCentral]_i + β7·I[Mtn]_i + β8·I[Pac]_i + b_0m + ε_im,   m = 1, 2, ..., 52,

where y_im is observation i for level m of the grouping variable Date, β_j, j = 0, 1, ..., 8, are the fixed-effects coefficients, with β0 being the coefficient for region NE. b_0m is the random effect for level m of the grouping variable Date, and ε_im is the observation error for observation i. The random effect has the prior distribution b_0m ~ N(0, σ_b²), and the error term has the distribution ε_im ~ N(0, σ²).

lme = fitlme(flu2,'FluRate ~ 1 + Region + (1|Date)')

lme =
Linear mixed-effects model fit by ML


Model information:
    Number of observations             468
    Fixed effects coefficients           9
    Random effects coefficients         52
    Covariance parameters                2

Formula:
    FluRate ~ 1 + Region + (1 | Date)

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    318.71    364.35    -148.36          296.71

Fixed effects coefficients (95% CIs):
    Name                     Estimate     SE          tStat      DF     pValue        Lower        Upper
    {'(Intercept)'     }     1.2233       0.096678    12.654     459    1.085e-31     1.0334       1.4133
    {'Region_MidAtl'   }     0.010192     0.052221    0.19518    459    0.84534       -0.092429    0.11281
    {'Region_ENCentral'}     0.051923     0.052221    0.9943     459    0.3206        -0.050698    0.15454
    {'Region_WNCentral'}     0.23687      0.052221    4.5359     459    7.3324e-06    0.13424      0.33949
    {'Region_SAtl'     }     0.075481     0.052221    1.4454     459    0.14902       -0.02714     0.1781
    {'Region_ESCentral'}     0.33917      0.052221    6.495      459    2.1623e-10    0.23655      0.44179
    {'Region_WSCentral'}     0.069        0.052221    1.3213     459    0.18705       -0.033621    0.17162
    {'Region_Mtn'      }     0.046673     0.052221    0.89377    459    0.37191       -0.055948    0.14929
    {'Region_Pac'      }     -0.16013     0.052221    -3.0665    459    0.0022936     -0.26276     -0.057514

Random effects covariance parameters (95% CIs):
Group: Date (52 Levels)
    Name1              Name2              Type       Estimate    Lower     Upper
    {'(Intercept)'}    {'(Intercept)'}    {'std'}    0.6443      0.5297    0.78368

Group: Error
    Name           Estimate    Lower      Upper
    {'Res Std'}    0.26627     0.24878    0.285

The p-values 7.3324e-06 and 2.1623e-10 respectively show that the fixed effects of the flu rates in regions WNCentral and ESCentral are significantly different relative to the flu rates in region NE.


The confidence limits for the standard deviation of the random-effects term, σ_b, do not include 0 (0.5297, 0.78368), which indicates that the random-effects term is significant. You can also test the significance of the random-effects terms using the compare method.

The conditional fitted response from the model at a given observation includes contributions from fixed and random effects. For example, the estimated best linear unbiased predictor (BLUP) of the flu rate for region WNCentral in week 10/9/2005 is

y_(WNCentral, 10/9/2005) = β0 + β3·I[WNCentral] + b_(10/9/2005) = 1.2233 + 0.23687 − 0.1718 = 1.28837.

This is the fitted conditional response, since it includes contributions to the estimate from both the fixed and random effects. You can compute this value as follows.

beta = fixedEffects(lme);
[~,~,STATS] = randomEffects(lme); % Compute the random-effects statistics (STATS)
STATS.Level = nominal(STATS.Level);
y_hat = beta(1) + beta(4) + STATS.Estimate(STATS.Level=='10/9/2005')

y_hat = 1.2884

In the previous calculation, beta(1) corresponds to the estimate for β0 and beta(4) corresponds to the estimate for β3. You can simply display the fitted value using the fitted method.

F = fitted(lme);
F(flu2.Date == '10/9/2005' & flu2.Region == 'WNCentral')

ans = 1.2884

The estimated marginal response for region WNCentral in week 10/9/2005 is

y_(WNCentral, 10/9/2005)^(marginal) = β0 + β3·I[WNCentral] = 1.2233 + 0.23687 = 1.46017.

Compute the fitted marginal response.

F = fitted(lme,'Conditional',false);
F(flu2.Date == '10/9/2005' & flu2.Region == 'WNCentral')

ans = 1.4602
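Both hand calculations above are plain sums of the displayed estimates; as a quick arithmetic check (written in Python for convenience, using the rounded values from the coefficient display):

```python
beta0 = 1.2233      # intercept estimate (reference region NE)
beta3 = 0.23687     # Region_WNCentral fixed-effects coefficient
b_week = -0.1718    # random-effects estimate for Date = 10/9/2005

conditional = beta0 + beta3 + b_week   # fixed + random contributions
marginal = beta0 + beta3               # fixed-effects contribution only

print(round(conditional, 4))  # 1.2884
print(round(marginal, 4))     # 1.4602
```

These match the values returned by fitted with 'Conditional' set to true and false, respectively.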

Plot Residuals vs. Fitted Values

Load the sample data.

load('weight.mat');

weight contains data from a longitudinal study, where 20 subjects are randomly assigned to 4 exercise programs, and their weight loss is recorded over six 2-week time periods. This is simulated data.


Store the data in a table. Define Subject and Program as categorical variables.

tbl = table(InitialWeight,Program,Subject,Week,y);
tbl.Subject = nominal(tbl.Subject);
tbl.Program = nominal(tbl.Program);

Fit a linear mixed-effects model where the initial weight, type of program, week, and the interaction between the week and type of program are the fixed effects. The intercept and week vary by subject.

lme = fitlme(tbl,'y ~ InitialWeight + Program*Week + (Week|Subject)');

Compute the fitted values and raw residuals.

F = fitted(lme);
R = residuals(lme);

Plot the residuals versus the fitted values.

plot(F,R,'bx')
xlabel('Fitted Values')
ylabel('Residuals')

Now, plot the residuals versus the fitted values, grouped by program.

figure()
gscatter(F,R,Program)


More About

Fitted Conditional and Marginal Response

A conditional response includes contributions from both fixed and random effects, whereas a marginal response includes contributions from only fixed effects.

Suppose the linear mixed-effects model lme has an n-by-p fixed-effects design matrix X and an n-by-q random-effects design matrix Z. Also, suppose the p-by-1 estimated fixed-effects vector is β, and the q-by-1 estimated best linear unbiased predictor (BLUP) vector of random effects is b. The fitted conditional response is

y_Cond = Xβ + Zb,

and the fitted marginal response is

y_Mar = Xβ.

See Also LinearMixedModel | residuals | response


fixedEffects
Class: GeneralizedLinearMixedModel

Estimates of fixed effects and related statistics

Syntax

beta = fixedEffects(glme)
[beta,betanames] = fixedEffects(glme)
[beta,betanames,stats] = fixedEffects(glme)
[ ___ ] = fixedEffects(glme,Name,Value)

Description

beta = fixedEffects(glme) returns the estimated fixed-effects coefficients, beta, of the generalized linear mixed-effects model glme.

[beta,betanames] = fixedEffects(glme) also returns the names of estimated fixed-effects coefficients in betanames. Each name corresponds to a fixed-effects coefficient in beta.

[beta,betanames,stats] = fixedEffects(glme) also returns a table of statistics, stats, related to the estimated fixed-effects coefficients of glme.

[ ___ ] = fixedEffects(glme,Name,Value) returns any of the output arguments in previous syntaxes using additional options specified by one or more Name,Value pair arguments. For example, you can specify the confidence level, or the method for computing the approximate degrees of freedom for the t-statistic.

Input Arguments

glme — Generalized linear mixed-effects model
GeneralizedLinearMixedModel object

Generalized linear mixed-effects model, specified as a GeneralizedLinearMixedModel object. For properties and methods of this object, see GeneralizedLinearMixedModel.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Alpha — Significance level
0.05 (default) | scalar value in the range [0,1]

Significance level, specified as the comma-separated pair consisting of 'Alpha' and a scalar value in the range [0,1]. For a value α, the confidence level is 100 × (1 – α)%. For example, for 99% confidence intervals, you can specify the confidence level as follows.

Example: 'Alpha',0.01


Data Types: single | double

DFMethod — Method for computing approximate degrees of freedom
'residual' (default) | 'none'

Method for computing approximate degrees of freedom, specified as the comma-separated pair consisting of 'DFMethod' and one of the following.

    Value         Description
    'residual'    The degrees of freedom value is assumed to be constant and equal to n – p, where n is the number of observations and p is the number of fixed effects.
    'none'        The degrees of freedom is set to infinity.

Example: 'DFMethod','none'

Output Arguments

beta — Estimated fixed-effects coefficients
vector

Estimated fixed-effects coefficients of the fitted generalized linear mixed-effects model glme, returned as a vector.

betanames — Names of fixed-effects coefficients
table

Names of fixed-effects coefficients in beta, returned as a table.

stats — Fixed-effects estimates and related statistics
dataset array

Fixed-effects estimates and related statistics, returned as a dataset array that has one row for each of the fixed effects and one column for each of the following statistics.

    Column Name    Description
    Name           Name of the fixed-effects coefficient
    Estimate       Estimated coefficient value
    SE             Standard error of the estimate
    tStat          t-statistic for a test that the coefficient is 0
    DF             Estimated degrees of freedom for the t-statistic
    pValue         p-value for the t-statistic
    Lower          Lower limit of a 95% confidence interval for the fixed-effects coefficient
    Upper          Upper limit of a 95% confidence interval for the fixed-effects coefficient

When fitting a model using fitglme and one of the maximum likelihood fit methods ('Laplace' or 'ApproximateLaplace'), if you specify the 'CovarianceMethod' name-value pair argument as 'conditional', then SE does not account for the uncertainty in estimating the covariance parameters. To account for this uncertainty, specify 'CovarianceMethod' as 'JointHessian'.

When fitting a GLME model using fitglme and one of the pseudo likelihood fit methods ('MPL' or 'REMPL'), fixedEffects bases the fixed effects estimates and related statistics on the fitted linear mixed-effects model from the final pseudo likelihood iteration.

Examples

Estimate Fixed-Effects Coefficients

Load the sample data.

load mfr

This simulated data is from a manufacturing company that operates 50 factories across the world, with each factory running a batch process to create a finished product. The company wants to decrease the number of defects in each batch, so it developed a new manufacturing process. To test the effectiveness of the new process, the company selected 20 of its factories at random to participate in an experiment: Ten factories implemented the new process, while the other ten continued to run the old process. In each of the 20 factories, the company ran five batches (for a total of 100 batches) and recorded the following data:

• Flag to indicate whether the batch used the new process (newprocess)
• Processing time for each batch, in hours (time)
• Temperature of the batch, in degrees Celsius (temp)
• Categorical variable indicating the supplier (A, B, or C) of the chemical used in the batch (supplier)
• Number of defects in the batch (defects)

The data also includes time_dev and temp_dev, which represent the absolute deviation of time and temperature, respectively, from the process standard of 3 hours at 20 degrees Celsius.

Fit a generalized linear mixed-effects model using newprocess, time_dev, temp_dev, and supplier as fixed-effects predictors. Include a random-effects term for intercept grouped by factory, to account for quality differences that might exist due to factory-specific variations. The response variable defects has a Poisson distribution, and the appropriate link function for this model is log. Use the Laplace fit method to estimate the coefficients. Specify the dummy variable encoding as 'effects', so the dummy variable coefficients sum to 0.

The number of defects can be modeled using a Poisson distribution

defects_ij ~ Poisson(μ_ij)

This corresponds to the generalized linear mixed-effects model

log(μ_ij) = β0 + β1·newprocess_ij + β2·time_dev_ij + β3·temp_dev_ij + β4·supplier_C_ij + β5·supplier_B_ij + b_i,

where

• defects_ij is the number of defects observed in the batch produced by factory i during batch j.
• μ_ij is the mean number of defects corresponding to factory i (where i = 1, 2, ..., 20) during batch j (where j = 1, 2, ..., 5).
• newprocess_ij, time_dev_ij, and temp_dev_ij are the measurements for each variable that correspond to factory i during batch j. For example, newprocess_ij indicates whether the batch produced by factory i during batch j used the new process.
• supplier_C_ij and supplier_B_ij are dummy variables that use effects (sum-to-zero) coding to indicate whether company C or B, respectively, supplied the process chemicals for the batch produced by factory i during batch j.
• b_i ~ N(0, σ_b²) is a random-effects intercept for each factory i that accounts for factory-specific variation in quality.
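Effects (sum-to-zero) coding, as requested by 'DummyVarCoding','effects', can be illustrated with a small sketch (Python for convenience; the convention shown — the reference level A coded as −1 in every dummy column — is one common implementation, not necessarily fitglme's internal ordering):

```python
# Effects (sum-to-zero) coding for a 3-level categorical factor (supplier A, B, C).
# Level 'A' is coded -1 in every dummy column, so each dummy column sums to
# zero across the three levels; the coefficient for A is implied by the
# constraint rather than estimated directly.
levels = ['B', 'C']   # one dummy column per non-reference level

def effects_code(supplier):
    if supplier == 'A':
        return [-1, -1]
    return [1 if supplier == lvl else 0 for lvl in levels]

rows = [effects_code(s) for s in ['A', 'B', 'C']]
col_sums = [sum(col) for col in zip(*rows)]
print(col_sums)  # [0, 0]
```

The zero column sums are what makes the fitted dummy coefficients sum to 0 across the levels of supplier.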

glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

Compute and display the estimated fixed-effects coefficient values and related statistics.

[beta,betanames,stats] = fixedEffects(glme);
stats

stats =
Fixed effect coefficients: DFMethod = 'residual', Alpha = 0.05

    Name               Estimate     SE          tStat       DF    pValue        Lower        Upper
    {'(Intercept)'}    1.4689       0.15988     9.1875      94    9.8194e-15    1.1515       1.7864
    {'newprocess' }    -0.36766     0.17755     -2.0708     94    0.041122      -0.72019     -0.015134
    {'time_dev'   }    -0.094521    0.82849     -0.11409    94    0.90941       -1.7395      1.5505
    {'temp_dev'   }    -0.28317     0.9617      -0.29444    94    0.76907       -2.1926      1.6263
    {'supplier_C' }    -0.071868    0.078024    -0.9211     94    0.35936       -0.22679     0.083051
    {'supplier_B' }    0.071072     0.07739     0.91836     94    0.36078       -0.082588    0.22473

The returned results indicate, for example, that the estimated coefficient for temp_dev is –0.28317. Its large p-value, 0.76907, indicates that it is not a statistically significant predictor at the 5% significance level. Additionally, the confidence interval boundaries Lower and Upper indicate that the 95% confidence interval for the coefficient for temp_dev is [-2.1926 , 1.6263]. This interval contains 0, which supports the conclusion that temp_dev is not statistically significant at the 5% significance level.
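The Lower and Upper columns are ordinary t-intervals, Estimate ± t(1 − α/2, DF) × SE. A quick cross-check of the temp_dev row (sketched in Python for convenience; the critical value t(0.975, 94) ≈ 1.9855 is taken from a t table):

```python
# Reproduce the 95% confidence interval for temp_dev from its row in the
# stats display: Estimate = -0.28317, SE = 0.9617, DF = 94.
estimate, se = -0.28317, 0.9617
crit = 1.9855   # t(0.975, 94), approximate table value

lower = estimate - crit * se
upper = estimate + crit * se
print(round(lower, 4), round(upper, 4))  # -2.1926 1.6263
```

The recomputed limits agree with the displayed interval [-2.1926, 1.6263] to display precision.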

See Also GeneralizedLinearMixedModel | coefCI | coefTest | fitglme | randomEffects


fixedEffects
Class: LinearMixedModel

Estimates of fixed effects and related statistics

Syntax

beta = fixedEffects(lme)
[beta,betanames] = fixedEffects(lme)
[beta,betanames,stats] = fixedEffects(lme)
[beta,betanames,stats] = fixedEffects(lme,Name,Value)

Description

beta = fixedEffects(lme) returns the estimated fixed-effects coefficients, beta, of the linear mixed-effects model lme.

[beta,betanames] = fixedEffects(lme) also returns the names of estimated fixed-effects coefficients in betanames. Each name corresponds to a fixed-effects coefficient in beta.

[beta,betanames,stats] = fixedEffects(lme) also returns the estimated fixed-effects coefficients of the linear mixed-effects model lme and related statistics in stats.

[beta,betanames,stats] = fixedEffects(lme,Name,Value) also returns the estimated fixed-effects coefficients of the linear mixed-effects model lme and related statistics with additional options specified by one or more Name,Value pair arguments.

Input Arguments

lme — Linear mixed-effects model
LinearMixedModel object

Linear mixed-effects model, specified as a LinearMixedModel object constructed using fitlme or fitlmematrix.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Alpha — Significance level
0.05 (default) | scalar value in the range 0 to 1

Significance level, specified as the comma-separated pair consisting of 'Alpha' and a scalar value in the range 0 to 1. For a value α, the confidence level is 100*(1–α)%. For example, for 99% confidence intervals, you can specify the confidence level as follows.

Example: 'Alpha',0.01


Data Types: single | double

DFMethod — Method for computing approximate degrees of freedom
'residual' (default) | 'satterthwaite' | 'none'

Method for computing approximate degrees of freedom for the t-statistic that tests the fixed-effects coefficients against 0, specified as the comma-separated pair consisting of 'DFMethod' and one of the following.

    Value              Description
    'residual'         Default. The degrees of freedom are assumed to be constant and equal to n – p, where n is the number of observations and p is the number of fixed effects.
    'satterthwaite'    Satterthwaite approximation.
    'none'             All degrees of freedom are set to infinity.

For example, you can specify the Satterthwaite approximation as follows.

Example: 'DFMethod','satterthwaite'

Output Arguments

beta — Fixed-effects coefficients estimates
vector

Fixed-effects coefficients estimates of the fitted linear mixed-effects model lme, returned as a vector.

betanames — Names of fixed-effects coefficients
table

Names of fixed-effects coefficients in beta, returned as a table.

stats — Fixed-effects estimates and related statistics
dataset array

Fixed-effects estimates and related statistics, returned as a dataset array that has one row for each of the fixed effects and one column for each of the following statistics.

    Name        Name of the fixed effect coefficient
    Estimate    Estimated coefficient value
    SE          Standard error of the estimate
    tStat       t-statistic for a test that the coefficient is zero
    DF          Estimated degrees of freedom for the t-statistic
    pValue      p-value for the t-statistic
    Lower       Lower limit of a 95% confidence interval for the fixed-effect coefficient
    Upper       Upper limit of a 95% confidence interval for the fixed-effect coefficient


Examples

Display Fixed-Effects Coefficient Estimates and Names

Load the sample data.

load('weight.mat');

The data set weight contains data from a longitudinal study, where 20 subjects are randomly assigned to 4 exercise programs, and their weight loss is recorded over six 2-week time periods. This is simulated data.

Store the data in a table. Define Subject and Program as categorical variables.

tbl = table(InitialWeight,Program,Subject,Week,y);
tbl.Subject = nominal(tbl.Subject);
tbl.Program = nominal(tbl.Program);

Fit a linear mixed-effects model where the initial weight, type of program, week, and the interaction between week and program are the fixed effects. The intercept and week vary by subject.

lme = fitlme(tbl,'y ~ InitialWeight + Program*Week + (Week|Subject)');

Display the fixed-effects coefficient estimates and corresponding fixed-effects names.

[beta,betanames] = fixedEffects(lme)

beta = 9×1

    0.6610
    0.0032
    0.3608
   -0.0333
    0.1132
    0.1732
    0.0388
    0.0305
    0.0331

betanames = 9×1 table
           Name
    __________________
    {'(Intercept)'   }
    {'InitialWeight' }
    {'Program_B'     }
    {'Program_C'     }
    {'Program_D'     }
    {'Week'          }
    {'Program_B:Week'}
    {'Program_C:Week'}
    {'Program_D:Week'}


Compute Coefficient Estimates and Related Statistics

Load the sample data.

load carbig

Fit a linear mixed-effects model for miles per gallon (MPG), with fixed effects for acceleration and horsepower, and potentially correlated random effects for intercept and acceleration grouped by model year. First, store the data in a table.

tbl = table(Acceleration,Horsepower,Model_Year,MPG);

Fit the model.

lme = fitlme(tbl, 'MPG ~ Acceleration + Horsepower + (Acceleration|Model_Year)');

Compute the fixed-effects coefficients estimates and related statistics.

[~,~,stats] = fixedEffects(lme)

stats =
Fixed effect coefficients: DFMethod = 'Residual', Alpha = 0.05

    Name                 Estimate    SE           tStat      DF     pValue        Lower       Upper
    {'(Intercept)' }     50.133      2.2652       22.132     389    7.7727e-71    45.679      54.586
    {'Acceleration'}     -0.58327    0.13394      -4.3545    389    1.7075e-05    -0.84661    -0.31992
    {'Horsepower'  }     -0.16954    0.0072609    -23.35     389    5.188e-76     -0.18382    -0.15527

The small p-values (under pValue) indicate that all fixed-effects coefficients are significant.

Compute Confidence Intervals with Specified Options

Load the sample data.

load('shift.mat');

The data shows the deviations from the target quality characteristic measured from the products that five operators manufacture during three shifts: morning, evening, and night. This is a randomized block design, where the operators are the blocks. The experiment is designed to study the impact of the time of shift on the performance. The performance measure is the deviation of the quality characteristics from the target value. This is simulated data.

Shift and Operator are nominal variables.

shift.Shift = nominal(shift.Shift);
shift.Operator = nominal(shift.Operator);

Fit a linear mixed-effects model with a random intercept grouped by operator to assess if performance significantly differs according to the time of the shift.


lme = fitlme(shift,'QCDev ~ Shift + (1|Operator)');

Compute the 99% confidence intervals for fixed-effects coefficients, using the residual method to compute the degrees of freedom. This is the default method.

[~,~,stats] = fixedEffects(lme,'alpha',0.01)

stats =
Fixed effect coefficients: DFMethod = 'Residual', Alpha = 0.01

    Name                  Estimate    SE         tStat       DF    pValue       Lower      Upper
    {'(Intercept)'  }     3.1196      0.88681    3.5178      12    0.0042407    0.41081    5.8284
    {'Shift_Morning'}     -0.3868     0.48344    -0.80009    12    0.43921      -1.8635    1.0899
    {'Shift_Night'  }     1.9856      0.48344    4.1072      12    0.0014535    0.5089     3.4623

Compute the 99% confidence intervals for fixed-effects coefficients, using the Satterthwaite approximation to compute the degrees of freedom.

[~,~,stats] = fixedEffects(lme,'DFMethod','satterthwaite','alpha',0.01)

stats =
Fixed effect coefficients: DFMethod = 'Satterthwaite', Alpha = 0.01

    Name                  Estimate    SE         tStat       DF       pValue     Lower       Upper
    {'(Intercept)'  }     3.1196      0.88681    3.5178      6.123    0.01214    -0.14122    6.3804
    {'Shift_Morning'}     -0.3868     0.48344    -0.80009    10       0.44225    -1.919      1.1454
    {'Shift_Night'  }     1.9856      0.48344    4.1072      10       0.00212    0.45343     3.5178

The Satterthwaite approximation usually produces smaller DF values than the residual method. That is why it produces larger p-values (pValue) and larger confidence intervals (see Lower and Upper).
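A cross-check of this effect (in Python for convenience, with the 99% critical values t(0.995, 12) ≈ 3.0545 and t(0.995, 10) ≈ 3.1693 taken from a t table) reproduces the Shift_Morning intervals from both displays above:

```python
# Shift_Morning row: Estimate = -0.3868, SE = 0.48344 in both fits; only the
# degrees of freedom (and hence the critical value) differ between methods.
est, se = -0.3868, 0.48344
crit_residual = 3.0545   # t(0.995, 12), residual method
crit_satter = 3.1693     # t(0.995, 10), Satterthwaite approximation

ci_residual = (est - crit_residual * se, est + crit_residual * se)
ci_satter = (est - crit_satter * se, est + crit_satter * se)

print([round(v, 4) for v in ci_residual])  # [-1.8635, 1.0899]
print([round(v, 4) for v in ci_satter])    # [-1.919, 1.1454]
```

The smaller Satterthwaite DF gives a larger critical value, hence the wider interval.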

See Also LinearMixedModel | coefCI | coefTest | fitlme | randomEffects


fpdf
F probability density function

Syntax

Y = fpdf(X,V1,V2)

Description

Y = fpdf(X,V1,V2) computes the F pdf at each of the values in X using the corresponding numerator degrees of freedom V1 and denominator degrees of freedom V2. X, V1, and V2 can be vectors, matrices, or multidimensional arrays that all have the same size. A scalar input is expanded to a constant array with the same dimensions as the other inputs. V1 and V2 parameters must contain real positive values, and the values in X must lie on the interval [0 Inf].

The probability density function for the F distribution is

y = f(x | ν1, ν2) = [Γ((ν1 + ν2)/2) / (Γ(ν1/2) Γ(ν2/2))] · (ν1/ν2)^(ν1/2) · x^((ν1 − 2)/2) / [1 + (ν1/ν2)·x]^((ν1 + ν2)/2)
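A direct numerical transcription of this density (in Python, not part of the toolbox) is a convenient sanity check; for ν1 = ν2 = 2 the density reduces to 1/(1 + x)²:

```python
from math import gamma

def f_pdf(x, v1, v2):
    # Direct transcription of the F density formula above.
    const = gamma((v1 + v2) / 2) / (gamma(v1 / 2) * gamma(v2 / 2))
    return (const * (v1 / v2) ** (v1 / 2) * x ** ((v1 - 2) / 2)
            / (1 + (v1 / v2) * x) ** ((v1 + v2) / 2))

print(round(f_pdf(1, 2, 2), 4))  # 0.25
print(round(f_pdf(2, 2, 2), 4))  # 0.1111
```

These values match fpdf(1:6,2,2) in the Examples section.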

Examples

y = fpdf(1:6,2,2)
y =
    0.2500    0.1111    0.0625    0.0400    0.0278    0.0204

z = fpdf(3,5:10,5:10)
z =
    0.0689    0.0659    0.0620    0.0577    0.0532    0.0487

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

See Also
fcdf | finv | frnd | fstat | pdf

Topics
“F Distribution” on page B-45

Introduced before R2006a


fracfact
Fractional factorial design

Syntax

X = fracfact(gen)
[X,conf] = fracfact(gen)
[X,conf] = fracfact(gen,Name,Value)

Description

X = fracfact(gen) creates the two-level fractional factorial design defined by the generator gen.

[X,conf] = fracfact(gen) returns a cell array of character vectors containing the confounding pattern for the design.

[X,conf] = fracfact(gen,Name,Value) creates a fractional factorial design with additional options specified by one or more Name,Value pair arguments.

Input Arguments

gen

Either a string array or cell array of character vectors where each element contains one “word,” or a character array or string scalar consisting of “words” separated by spaces. “Words” consist of case-sensitive letters or groups of letters, where 'a' represents value 1, 'b' represents value 2, ..., 'A' represents value 27, ..., 'Z' represents value 52. Each word defines how the corresponding factor’s levels are defined as products of generators from a 2^K full-factorial design. K is the number of letters of the alphabet in gen.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

FactorNames

String array or cell array specifying the name for each factor.

Default: {'X1','X2',...}

MaxInt

Positive integer setting the maximum level of interaction to include in the confounding output.

Default: 2


Output Arguments

X

The two-level fractional factorial design. X is a matrix of size N-by-P, where

• N = 2^K, where K is the number of letters of the alphabet in gen.
• P is the number of words in gen.

Because X is a two-level design, the components of X are ±1. For the meaning of X, see “Fractional Factorial Designs” on page 27-5.

conf

Cell array of character vectors containing the confounding pattern for the design.

Examples

Generate a fractional factorial design for four variables, where the fourth variable is the product of the first three:

x = fracfact('a b c abc')

x =
    -1    -1    -1    -1
    -1    -1     1     1
    -1     1    -1     1
    -1     1     1    -1
     1    -1    -1     1
     1    -1     1    -1
     1     1    -1    -1
     1     1     1     1
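The mapping from generator words to design columns can be sketched outside MATLAB. This simplified Python re-implementation (lowercase generator letters only, illustrative rather than the toolbox code) builds the full factorial over the base letters, with the first letter varying slowest, and multiplies base columns together for multi-letter words:

```python
from itertools import product

def fracfact(gen):
    """Two-level fractional factorial from generator words (lowercase only)."""
    words = gen.split()
    letters = sorted(set(''.join(words)))   # base (full-factorial) factors
    runs = []
    for combo in product([-1, 1], repeat=len(letters)):
        level = dict(zip(letters, combo))   # -1/+1 setting of each base factor
        row = [1] * len(words)
        for j, word in enumerate(words):
            for ch in word:                 # a word is a product of base columns
                row[j] *= level[ch]
        runs.append(row)
    return runs

X = fracfact('a b c abc')
print(X[1])  # [-1, -1, 1, 1]
```

With 'a b c abc' this reproduces the 8-run design shown above, including the fourth column as the elementwise product of the first three.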

Find generators for a six-factor design that uses four factors and achieves resolution IV using fracfactgen. Use the result to specify the design:

generators = fracfactgen('a b c d e f',4, ... % 4 factors
    4)                                        % resolution 4

generators = 
    'a'    'b'    'c'    'd'    'bcd'    'acd'

x = fracfact(generators)

x =
    -1    -1    -1    -1    -1    -1
    -1    -1    -1     1     1     1
    -1    -1     1    -1     1     1
    -1    -1     1     1    -1    -1
    -1     1    -1    -1     1    -1
    -1     1    -1     1    -1     1
    -1     1     1    -1    -1     1
    -1     1     1     1     1    -1
     1    -1    -1    -1    -1     1
     1    -1    -1     1     1    -1
     1    -1     1    -1     1    -1
     1    -1     1     1    -1     1
     1     1    -1    -1     1     1
     1     1    -1     1    -1    -1
     1     1     1    -1    -1    -1
     1     1     1     1     1     1

References

[1] Box, G. E. P., W. G. Hunter, and J. S. Hunter. Statistics for Experimenters. Hoboken, NJ: Wiley-Interscience, 1978.

See Also
ff2n | fracfactgen | fullfact | hadamard

Topics
“Fractional Factorial Designs” on page 27-5

Introduced before R2006a


fracfactgen
Fractional factorial design generators

Syntax

generators = fracfactgen(terms)
generators = fracfactgen(terms,k)
generators = fracfactgen(terms,k,R)
generators = fracfactgen(terms,k,R,basic)

Description

generators = fracfactgen(terms) uses the Franklin-Bailey algorithm to find generators for the smallest two-level fractional-factorial design for estimating linear model terms specified by terms. terms is a character vector or string scalar consisting of words formed from the 52 case-sensitive letters a-z and A-Z, separated by spaces. Use 'a'-'z' for the first 26 factors and, if necessary, 'A'-'Z' for the remaining factors. For example, terms = 'a b c ab ac'. Single-letter words indicate main effects to be estimated; multiple-letter words indicate interactions. Alternatively, terms is an m-by-n matrix of 0s and 1s where m is the number of model terms to be estimated and n is the number of factors. For example, if terms contains rows [0 1 0 0] and [1 0 0 1], then the factor b and the interaction between factors a and d are included in the model. generators is a cell array of character vectors with one generator per cell. Pass generators to fracfact to produce the fractional-factorial design and corresponding confounding pattern.

generators = fracfactgen(terms,k) returns generators for a two-level fractional-factorial design with 2^k runs, if possible. If k is [], fracfactgen finds the smallest design.

generators = fracfactgen(terms,k,R) finds a design with resolution R, if possible. The default resolution is 3. A design of resolution R is one in which no n-factor interaction is confounded with any other effect containing less than R – n factors. Thus a resolution III design does not confound main effects with one another but may confound them with two-way interactions, while a resolution IV design does not confound either main effects or two-way interactions but may confound two-way interactions with each other. If fracfactgen is unable to find a design at the requested resolution, it tries to find a lower-resolution design sufficient to calibrate the model. If it is successful, it returns the generators for the lower-resolution design along with a warning. If it fails, it returns an error.

generators = fracfactgen(terms,k,R,basic) also accepts a vector basic specifying the indices of factors that are to be treated as basic. These factors receive full-factorial treatments in the design. The default includes factors that are part of the highest-order interaction in terms.
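The character-vector and matrix forms of terms describe the same models. A short sketch (the model terms are illustrative):

```matlab
% The model terms a, b, c, ab, ac expressed both ways.
% In the matrix form, the columns correspond to factors a, b, c.
termsChar = 'a b c ab ac';
termsMat  = [1 0 0
             0 1 0
             0 0 1
             1 1 0
             1 0 1];
g1 = fracfactgen(termsChar);   % cell array of generators
g2 = fracfactgen(termsMat);    % same model requirements, same generators
```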

Examples

Suppose you wish to determine the effects of four two-level factors, for which there may be two-way interactions. A full-factorial design would require 2^4 = 16 runs. The fracfactgen function finds generators for a resolution IV (separating main effects) fractional-factorial design that requires only 2^3 = 8 runs:


generators = fracfactgen('a b c d',3,4)

generators = 

    'a'    'b'    'c'    'abc'

The more economical design and the corresponding confounding pattern are returned by fracfact:

[dfF,confounding] = fracfact(generators)

dfF =

    -1    -1    -1    -1
    -1    -1     1     1
    -1     1    -1     1
    -1     1     1    -1
     1    -1    -1     1
     1    -1     1    -1
     1     1    -1    -1
     1     1     1     1

confounding = 

    'Term'     'Generator'    'Confounding'
    'X1'       'a'            'X1'
    'X2'       'b'            'X2'
    'X3'       'c'            'X3'
    'X4'       'abc'          'X4'
    'X1*X2'    'ab'           'X1*X2 + X3*X4'
    'X1*X3'    'ac'           'X1*X3 + X2*X4'
    'X1*X4'    'bc'           'X1*X4 + X2*X3'
    'X2*X3'    'bc'           'X1*X4 + X2*X3'
    'X2*X4'    'ac'           'X1*X3 + X2*X4'
    'X3*X4'    'ab'           'X1*X2 + X3*X4'

The confounding pattern shows, for example, that the two-way interaction between X1 and X2 is confounded by the two-way interaction between X3 and X4.

References

[1] Box, G. E. P., W. G. Hunter, and J. S. Hunter. Statistics for Experimenters. Hoboken, NJ: Wiley-Interscience, 1978.

See Also
fracfact | hadamard

Topics
“Fractional Factorial Designs” on page 27-5

Introduced in R2006a



friedman Friedman’s test

Syntax

p = friedman(x,reps)
p = friedman(x,reps,displayopt)
[p,tbl] = friedman( ___ )
[p,tbl,stats] = friedman( ___ )

Description

p = friedman(x,reps) returns the p-value for the nonparametric Friedman's test to compare column effects in a two-way layout. friedman tests the null hypothesis that the column effects are all the same against the alternative that they are not all the same.

p = friedman(x,reps,displayopt) enables the ANOVA table display when displayopt is 'on' (default) and suppresses the display when displayopt is 'off'.

[p,tbl] = friedman( ___ ) returns the ANOVA table (including column and row labels) in cell array tbl.

[p,tbl,stats] = friedman( ___ ) also returns a structure stats that you can use to perform a follow-up multiple comparison test.

Examples

Test For Column Effects Using Friedman's Test

This example shows how to test for column effects in a two-way layout using Friedman's test.

Load the sample data.

load popcorn
popcorn

popcorn = 6×3

    5.5000    4.5000    3.5000
    5.5000    4.5000    4.0000
    6.0000    4.0000    3.0000
    6.5000    5.0000    4.0000
    7.0000    5.5000    5.0000
    7.0000    5.0000    4.5000

This data comes from a study of popcorn brands and popper type (Hogg 1987). The columns of the matrix popcorn are brands (Gourmet, National, and Generic). The rows are popper type (Oil and Air). The study popped a batch of each brand three times with each popper. The values are the yield in cups of popped popcorn.


Use Friedman's test to determine whether the popcorn brand affects the yield of popcorn.

p = friedman(popcorn,3)

p = 0.0010

The small value of p = 0.001 indicates that the popcorn brand affects the yield of popcorn.

Input Arguments

x — Sample data
matrix

Sample data for the hypothesis test, specified as a matrix. The columns of x represent changes in a factor A. The rows represent changes in a blocking factor B. If there is more than one observation for each combination of factors, input reps indicates the number of replicates in each “cell,” which must be constant.

Data Types: single | double

reps — Number of replicates per cell
1 (default) | positive integer value

Number of replicates per cell, specified as a positive integer value.

Data Types: single | double

displayopt — ANOVA table display option
'on' (default) | 'off'

ANOVA table display option, specified as 'on' or 'off'. If displayopt is 'on', then friedman displays a figure showing an ANOVA table, which divides the variability of the ranks into two or three parts:

• The variability due to the differences among the column effects
• The variability due to the interaction between rows and columns (if reps is greater than its default value of 1)
• The remaining variability not explained by any systematic source

The ANOVA table has six columns:


• The first shows the source of the variability.
• The second shows the Sum of Squares (SS) due to each source.
• The third shows the degrees of freedom (df) associated with each source.
• The fourth shows the Mean Squares (MS), which is the ratio SS/df.
• The fifth shows Friedman's chi-square statistic.
• The sixth shows the p value for the chi-square statistic.

You can copy a text version of the ANOVA table to the clipboard by selecting Copy Text from the Edit menu.

Output Arguments

p — p-value
scalar value in the range [0,1]

p-value of the test, returned as a scalar value in the range [0,1]. p is the probability of observing a test statistic as extreme as, or more extreme than, the observed value under the null hypothesis. Small values of p cast doubt on the validity of the null hypothesis.

tbl — ANOVA table
cell array

ANOVA table, including column and row labels, returned as a cell array. The ANOVA table has six columns:

• The first shows the source of the variability.
• The second shows the Sum of Squares (SS) due to each source.
• The third shows the degrees of freedom (df) associated with each source.
• The fourth shows the Mean Squares (MS), which is the ratio SS/df.
• The fifth shows Friedman's chi-square statistic.
• The sixth shows the p value for the chi-square statistic.

You can copy a text version of the ANOVA table to the clipboard by selecting Copy Text from the Edit menu.

stats — Test data
structure

Test data, returned as a structure. friedman evaluates the hypothesis that the column effects are all the same against the alternative that they are not all the same. However, sometimes it is preferable to perform a test to determine which pairs of column effects are significantly different, and which are not. You can use the multcompare function to perform such tests by supplying stats as the input value.
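A minimal sketch of such a follow-up test, using the popcorn data from the earlier example:

```matlab
% Suppress the ANOVA figure, keep the stats structure, and then
% compare pairs of column (brand) effects with multcompare.
load popcorn
[p,tbl,stats] = friedman(popcorn,3,'off');
c = multcompare(stats);   % each row: a pair of columns, estimate, CI, p-value
```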

More About

Friedman’s Test

Friedman's test is similar to classical balanced two-way ANOVA, but it tests only for column effects after adjusting for possible row effects. It does not test for row effects or interaction effects.


Friedman's test is appropriate when columns represent treatments that are under study, and rows represent nuisance effects (blocks) that need to be taken into account but are not of any interest.

The different columns of X represent changes in a factor A. The different rows represent changes in a blocking factor B. If there is more than one observation for each combination of factors, input reps indicates the number of replicates in each “cell,” which must be constant.

The matrix below illustrates the format for a set-up where column factor A has three levels, row factor B has two levels, and there are two replicates (reps=2). The subscripts indicate row, column, and replicate, respectively.

    x_111    x_121    x_131
    x_112    x_122    x_132
    x_211    x_221    x_231
    x_212    x_222    x_232

Friedman's test assumes a model of the form

    x_ijk = μ + α_i + β_j + ε_ijk

where μ is an overall location parameter, α_i represents the column effect, β_j represents the row effect, and ε_ijk represents the error. This test ranks the data within each level of B, and tests for a difference across levels of A. The p that friedman returns is the p value for the null hypothesis that α_i = 0. If the p value is near zero, this casts doubt on the null hypothesis. A sufficiently small p value suggests that at least one column-sample median is significantly different than the others; i.e., there is a main effect due to factor A.

The choice of a critical p value to determine whether a result is “statistically significant” is left to the researcher. It is common to declare a result significant if the p value is less than 0.05 or 0.01.

Friedman's test makes the following assumptions about the data in X:

• All data come from populations having the same continuous distribution, apart from possibly different locations due to column and row effects.
• All observations are mutually independent.

The classical two-way ANOVA replaces the first assumption with the stronger assumption that data come from normal distributions.

References

[1] Hogg, R. V., and J. Ledolter. Engineering Statistics. New York: Macmillan, 1987.

[2] Hollander, M., and D. A. Wolfe. Nonparametric Statistical Methods. Hoboken, NJ: John Wiley & Sons, Inc., 1999.

See Also
anova2 | kruskalwallis | multcompare

Introduced before R2006a


frnd F random numbers

Syntax

R = frnd(V1,V2)
R = frnd(V1,V2,m,n,...)
R = frnd(V1,V2,[m,n,...])

Description

R = frnd(V1,V2) generates random numbers from the F distribution with numerator degrees of freedom V1 and denominator degrees of freedom V2. V1 and V2 can be vectors, matrices, or multidimensional arrays that all have the same size. A scalar input for V1 or V2 is expanded to a constant array with the same dimensions as the other input. The V1 and V2 parameters must contain real positive values.

R = frnd(V1,V2,m,n,...) or R = frnd(V1,V2,[m,n,...]) generates an m-by-n-by-... array containing random numbers from the F distribution with parameters V1 and V2. V1 and V2 can each be scalars or arrays of the same size as R.
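A short sketch of the scalar-expansion behavior described above (the parameter values are illustrative):

```matlab
% V1 is a scalar, so it is expanded to the size of V2;
% R has the same size as V2, with R(i,j) ~ F(5,V2(i,j)).
V2 = [6 8; 10 12];
R = frnd(5,V2);   % 2-by-2 array of F random numbers
```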

Examples

n1 = frnd(1:6,1:6)

n1 =
    0.0022    0.3121    3.0528    0.3189    0.2715    0.9539

n2 = frnd(2,2,[2 3])

n2 =
    0.3186    0.9727    3.0268
    0.2052  148.5816    0.2191

n3 = frnd([1 2 3;4 5 6],1,2,3)

n3 =
    0.6233    0.2322   31.5458
    2.5848    0.2121    4.4955

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

The generated code can return a different sequence of numbers than MATLAB if either of the following is true:

• The output is nonscalar.
• An input parameter is invalid for the distribution.


For more information on code generation, see “Introduction to Code Generation” on page 31-2 and “General Code Generation Workflow” on page 31-5.

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also
fcdf | finv | fpdf | fstat | random

Topics
“F Distribution” on page B-45

Introduced before R2006a



fscchi2 Univariate feature ranking for classification using chi-square tests

Syntax

idx = fscchi2(Tbl,ResponseVarName)
idx = fscchi2(Tbl,formula)
idx = fscchi2(Tbl,Y)
idx = fscchi2(X,Y)
idx = fscchi2( ___ ,Name,Value)
[idx,scores] = fscchi2( ___ )

Description

idx = fscchi2(Tbl,ResponseVarName) ranks features (predictors) using chi-square tests on page 32-2238. The table Tbl contains predictor variables and a response variable, and ResponseVarName is the name of the response variable in Tbl. The function returns idx, which contains the indices of predictors ordered by predictor importance, meaning idx(1) is the index of the most important predictor. You can use idx to select important predictors for classification problems.

idx = fscchi2(Tbl,formula) specifies a response variable and predictor variables to consider among the variables in Tbl by using formula.

idx = fscchi2(Tbl,Y) ranks predictors in Tbl using the response variable Y.

idx = fscchi2(X,Y) ranks predictors in X using the response variable Y.

idx = fscchi2( ___ ,Name,Value) specifies additional options using one or more name-value pair arguments in addition to any of the input argument combinations in the previous syntaxes. For example, you can specify prior probabilities and observation weights.

[idx,scores] = fscchi2( ___ ) also returns the predictor scores scores. A large score value indicates that the corresponding predictor is important.

Examples

Rank Predictors in Matrix

Rank predictors in a numeric matrix and create a bar plot of predictor importance scores.

Load the sample data.

load ionosphere

ionosphere contains predictor variables (X) and a response variable (Y). Rank the predictors using chi-square tests.


[idx,scores] = fscchi2(X,Y);

The values in scores are the negative logs of the p-values. If a p-value is smaller than eps(0), then the corresponding score value is Inf. Before creating a bar plot, determine whether scores includes Inf values.

find(isinf(scores))

ans = 

  1x0 empty double row vector

scores does not include Inf values. If scores includes Inf values, you can replace Inf by a large numeric number before creating a bar plot for visualization purposes. For details, see “Rank Predictors in Table” on page 32-2231.

Create a bar plot of the predictor importance scores.

bar(scores(idx))
xlabel('Predictor rank')
ylabel('Predictor importance score')

Select the top five most important predictors. Find the columns of these predictors in X.

idx(1:5)

ans = 1×5

     5     7     3     8     6

The fifth column of X is the most important predictor of Y.

Rank Predictors in Table

Rank predictors in a table and create a bar plot of predictor importance scores.

If your data is in a table and fscchi2 ranks a subset of the variables in the table, then the function indexes the variables using only the subset. Therefore, a good practice is to move the predictors that you do not want to rank to the end of the table. Move the response variable and observation weight vector as well. Then, the indexes of the output arguments are consistent with the indexes of the table.

Load the census1994 data set.

load census1994

The table adultdata in census1994 contains demographic data from the US Census Bureau to predict whether an individual makes over $50,000 per year. Display the first three rows of the table.

head(adultdata,3)

ans=3×15 table

    age       workClass          fnlwgt       education    education_num       marital_status
    ___    ________________    __________    _________    _____________    __________________

    39     State-gov                77516    Bachelors         13          Never-married
    50     Self-emp-not-inc         83311    Bachelors         13          Married-civ-spouse
    38     Private             2.1565e+05    HS-grad            9          Divorced

In the table adultdata, the third column fnlwgt is the weight of the samples, and the last column salary is the response variable. Move fnlwgt to the left of salary by using the movevars function.

adultdata = movevars(adultdata,'fnlwgt','before','salary');
head(adultdata,3)

ans=3×15 table

    age       workClass        education    education_num       marital_status         occupation
    ___    ________________    _________    _____________    __________________    ______________

    39     State-gov           Bachelors         13          Never-married         Adm-clerical
    50     Self-emp-not-inc    Bachelors         13          Married-civ-spouse    Exec-manageria
    38     Private             HS-grad            9          Divorced              Handlers-clean

Rank the predictors in adultdata. Specify the column salary as a response variable, and specify the column fnlwgt as observation weights.

[idx,scores] = fscchi2(adultdata,'salary','Weights','fnlwgt');

The values in scores are the negative logs of the p-values. If a p-value is smaller than eps(0), then the corresponding score value is Inf. Before creating a bar plot, determine whether scores includes Inf values.


idxInf = find(isinf(scores))

idxInf = 1×8

     1     3     4     5     6     7    10    12

scores includes eight Inf values. Create a bar plot of predictor importance scores. Use the predictor names for the x-axis tick labels.

figure
bar(scores(idx))
xlabel('Predictor rank')
ylabel('Predictor importance score')
xticklabels(strrep(adultdata.Properties.VariableNames(idx),'_','\_'))
xtickangle(45)

The bar function does not plot any bars for the Inf values. For the Inf values, plot bars that have the same length as the largest finite score.

hold on
bar(scores(idx(length(idxInf)+1))*ones(length(idxInf),1))
legend('Finite Scores','Inf Scores')
hold off

The bar graph displays finite scores and Inf scores using different colors.



Input Arguments

Tbl — Sample data
table

Sample data, specified as a table. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.

Each row of Tbl corresponds to one observation, and each column corresponds to one predictor variable. Optionally, Tbl can contain additional columns for a response variable and observation weights. A response variable can be a categorical, character, or string array, logical or numeric vector, or cell array of character vectors. If the response variable is a character array, then each element of the response variable must correspond to one row of the array.

• If Tbl contains the response variable, and you want to use all remaining variables in Tbl as predictors, then specify the response variable by using ResponseVarName. If Tbl also contains the observation weights, then you can specify the weights by using Weights.
• If Tbl contains the response variable, and you want to use only a subset of the remaining variables in Tbl as predictors, then specify the subset of variables by using formula.
• If Tbl does not contain the response variable, then specify a response variable by using Y. The response variable and Tbl must have the same number of rows.

If fscchi2 uses a subset of variables in Tbl as predictors, then the function indexes the predictors using only the subset. The values in the 'CategoricalPredictors' name-value pair argument and the output argument idx do not count the predictors that the function does not rank.

fscchi2 considers NaN, '' (empty character vector), "" (empty string), <missing>, and <undefined> values in Tbl for a response variable to be missing values. fscchi2 does not use observations with missing values for a response variable.

Data Types: table

ResponseVarName — Response variable name
character vector or string scalar containing name of variable in Tbl

Response variable name, specified as a character vector or string scalar containing the name of a variable in Tbl.
For example, if a response variable is the column Y of Tbl (Tbl.Y), then specify ResponseVarName as 'Y'.

Data Types: char | string

formula — Explanatory model of response variable and subset of predictor variables
character vector | string scalar

Explanatory model of the response variable and a subset of the predictor variables, specified as a character vector or string scalar in the form 'Y ~ X1 + X2 + X3'. In this form, Y represents the response variable, and X1, X2, and X3 represent the predictor variables.

To specify a subset of variables in Tbl as predictors, use a formula. If you specify a formula, then fscchi2 does not rank any variables in Tbl that do not appear in formula.
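The ResponseVarName, formula, and Y syntaxes specify the same response in different ways; the formula form also restricts which predictors are ranked. A sketch, assuming a hypothetical table Tbl whose last variable is a response named Y and which contains predictor variables x1 and x2:

```matlab
% Three ways to point fscchi2 at the response (variable names are
% illustrative, not from a shipped data set):
idx1 = fscchi2(Tbl,'Y');               % response by name; ranks all other variables
idx2 = fscchi2(Tbl,'Y ~ x1 + x2');     % formula; ranks only x1 and x2
idx3 = fscchi2(Tbl(:,1:end-1),Tbl.Y);  % response supplied as a separate vector
```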


The variable names in the formula must be both variable names in Tbl (Tbl.Properties.VariableNames) and valid MATLAB identifiers. For details, see “Tips” on page 32-2237.

Data Types: char | string

Y — Response variable
numeric vector | categorical vector | logical vector | character array | string array | cell array of character vectors

Response variable, specified as a numeric, categorical, or logical vector, a character or string array, or a cell array of character vectors. Each row of Y represents the labels of the corresponding row of X.

fscchi2 considers NaN, '' (empty character vector), "" (empty string), <missing>, and <undefined> values in Y to be missing values. fscchi2 does not use observations with missing values for Y.

Data Types: single | double | categorical | logical | char | string | cell

X — Predictor data
numeric matrix

Predictor data, specified as a numeric matrix. Each row of X corresponds to one observation, and each column corresponds to one predictor variable.

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'NumBins',20,'UseMissing',true sets the number of bins as 20 and specifies to use missing values in predictors for ranking.

CategoricalPredictors — List of categorical predictors
vector of positive integers | logical vector | character matrix | string array | cell array of character vectors | 'all'

List of categorical predictors, specified as the comma-separated pair consisting of 'CategoricalPredictors' and one of the values in this table.

Vector of positive integers: Each entry in the vector is an index value corresponding to the column of the predictor data (X or Tbl) that contains a categorical variable.

Logical vector: A true entry means that the corresponding column of predictor data (X or Tbl) is a categorical variable.

Character matrix: Each row of the matrix is the name of a predictor variable. The names must match the names in Tbl. Pad the names with extra blanks so each row of the character matrix has the same length.

String array or cell array of character vectors: Each element in the array is the name of a predictor variable. The names must match the names in Tbl.

'all': All predictors are categorical.

By default, if the predictor data is in a table (Tbl), fscchi2 assumes that a variable is categorical if it is a logical vector, unordered categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix (X), fscchi2 assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the 'CategoricalPredictors' name-value pair argument.

If fscchi2 uses a subset of variables in Tbl as predictors, then the function indexes the predictors using only the subset. The 'CategoricalPredictors' values do not count the predictors that the function does not rank.

Example: 'CategoricalPredictors','all'
Data Types: single | double | logical | char | string | cell

ClassNames — Names of classes to use for ranking
categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors

Names of the classes to use for ranking, specified as the comma-separated pair consisting of 'ClassNames' and a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors. ClassNames must have the same data type as Y or the response variable in Tbl. If ClassNames is a character array, then each element must correspond to one row of the array.

Use 'ClassNames' to:

• Specify the order of the Prior dimensions that corresponds to the class order.
• Select a subset of classes for ranking. For example, suppose that the set of all distinct class names in Y is {'a','b','c'}. To rank predictors using observations from classes 'a' and 'c' only, specify 'ClassNames',{'a','c'}.

The default value for 'ClassNames' is the set of all distinct class names in Y or the response variable in Tbl. The default 'ClassNames' value has mathematical ordering if the response variable is ordinal. Otherwise, the default value has alphabetical ordering.

Example: 'ClassNames',{'b','g'}
Data Types: categorical | char | string | logical | single | double | cell

NumBins — Number of bins for binning continuous predictors
10 (default) | positive integer scalar

Number of bins for binning continuous predictors, specified as the comma-separated pair consisting of 'NumBins' and a positive integer scalar.

Example: 'NumBins',50
Data Types: single | double


Prior — Prior probabilities
'empirical' (default) | 'uniform' | vector of scalar values | structure

Prior probabilities for each class, specified as the comma-separated pair consisting of 'Prior' and one of the following:

• Character vector or string scalar.
  • 'empirical' determines class probabilities from class frequencies in the response variable in Y or Tbl. If you pass observation weights, fscchi2 uses the weights to compute the class probabilities.
  • 'uniform' sets all class probabilities to be equal.
• Vector (one scalar value for each class). To specify the class order for the corresponding elements of 'Prior', also specify the ClassNames name-value pair argument.
• Structure S with two fields:
  • S.ClassNames contains the class names as a variable of the same type as the response variable in Y or Tbl.
  • S.ClassProbs contains a vector of corresponding probabilities.

If you set values for both 'Weights' and 'Prior', fscchi2 normalizes the weights in each class to add up to the value of the prior probability of the respective class.

Example: 'Prior','uniform'
Data Types: char | string | single | double | struct

UseMissing — Indicator for whether to use or discard missing values in predictors
false (default) | true

Indicator for whether to use or discard missing values in predictors, specified as the comma-separated pair consisting of 'UseMissing' and either true to use or false to discard missing values in predictors for ranking.

fscchi2 considers NaN, '' (empty character vector), "" (empty string), <missing>, and <undefined> values to be missing values.

If you specify 'UseMissing',true, then fscchi2 uses missing values for ranking. For a categorical variable, fscchi2 treats missing values as an extra category. For a continuous variable, fscchi2 places NaN values in a separate bin for binning.

If you specify 'UseMissing',false, then fscchi2 does not use missing values for ranking. Because fscchi2 computes importance scores individually for each predictor, the function does not discard an entire row when values in the row are partially missing. For each variable, fscchi2 uses all values that are not missing.

Example: 'UseMissing',true
Data Types: logical

Weights — Observation weights
ones(size(X,1),1) (default) | vector of scalar values | name of variable in Tbl

Observation weights, specified as the comma-separated pair consisting of 'Weights' and a vector of scalar values or the name of a variable in Tbl. The function weights the observations in each row of X


or Tbl with the corresponding value in Weights. The size of Weights must equal the number of rows in X or Tbl.

If you specify the input data as a table Tbl, then Weights can be the name of a variable in Tbl that contains a numeric vector. In this case, you must specify Weights as a character vector or string scalar. For example, if the weight vector is the column W of Tbl (Tbl.W), then specify 'Weights','W'.

fscchi2 normalizes the weights in each class to add up to the value of the prior probability of the respective class.

Data Types: single | double | char | string

Output Arguments

idx — Indices of predictors ordered by predictor importance
numeric vector

Indices of predictors in X or Tbl ordered by predictor importance, returned as a 1-by-r numeric vector, where r is the number of ranked predictors.

If fscchi2 uses a subset of variables in Tbl as predictors, then the function indexes the predictors using only the subset. For example, suppose Tbl includes 10 columns and you specify the last five columns of Tbl as the predictor variables by using formula. If idx(3) is 5, then the third most important predictor is the 10th column in Tbl, which is the fifth predictor in the subset.

scores — Predictor scores
numeric vector

Predictor scores, returned as a 1-by-r numeric vector, where r is the number of ranked predictors. A large score value indicates that the corresponding predictor is important.

• If you use X to specify the predictors or use all the variables in Tbl as predictors, then the values in scores have the same order as the predictors in X or Tbl.
• If you specify a subset of variables in Tbl as predictors, then the values in scores have the same order as the subset. For example, suppose Tbl includes 10 columns and you specify the last five columns of Tbl as the predictor variables by using formula. Then, scores(3) contains the score value of the 8th column in Tbl, which is the third predictor in the subset.

Tips

• If you specify the response variable and predictor variables by using the input argument formula, then the variable names in the formula must be both variable names in Tbl (Tbl.Properties.VariableNames) and valid MATLAB identifiers.

  You can verify the variable names in Tbl by using the isvarname function. The following code returns logical 1 (true) for each variable that has a valid variable name.

  cellfun(@isvarname,Tbl.Properties.VariableNames)



  If the variable names in Tbl are not valid, then convert them by using the matlab.lang.makeValidName function.

  Tbl.Properties.VariableNames = matlab.lang.makeValidName(Tbl.Properties.VariableNames);

Algorithms

Univariate Feature Ranking Using Chi-Square Tests

• fscchi2 examines whether each predictor variable is independent of a response variable by using individual chi-square tests. A small p-value of the test statistic indicates that the corresponding predictor variable is dependent on the response variable and, therefore, is an important feature.
• The output scores is –log(p). Therefore, a large score value indicates that the corresponding predictor is important. If a p-value is smaller than eps(0), then the output is Inf.
• fscchi2 examines a continuous variable after binning, or discretizing, the variable. You can specify the number of bins using the 'NumBins' name-value pair argument.
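Because scores is –log(p), you can recover the p-values of the individual tests wherever the scores are finite. A small sketch:

```matlab
% Recover the chi-square test p-values from the returned scores.
load ionosphere
[idx,scores] = fscchi2(X,Y);
p = exp(-scores);   % p-values; finite scores correspond to p > eps(0),
                    % Inf scores to p-values below eps(0)
```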

See Also
fscmrmr | fscnca | relieff | sequentialfs

Topics
“Introduction to Feature Selection” on page 15-49

Introduced in R2020a



fscmrmr Rank features for classification using minimum redundancy maximum relevance (MRMR) algorithm

Syntax

idx = fscmrmr(Tbl,ResponseVarName)
idx = fscmrmr(Tbl,formula)
idx = fscmrmr(Tbl,Y)
idx = fscmrmr(X,Y)
idx = fscmrmr( ___ ,Name,Value)
[idx,scores] = fscmrmr( ___ )

Description

idx = fscmrmr(Tbl,ResponseVarName) ranks features (predictors) using the MRMR algorithm on page 32-2249. The table Tbl contains predictor variables and a response variable, and ResponseVarName is the name of the response variable in Tbl. The function returns idx, which contains the indices of predictors ordered by predictor importance. You can use idx to select important predictors for classification problems.

idx = fscmrmr(Tbl,formula) specifies a response variable and predictor variables to consider among the variables in Tbl by using formula.

idx = fscmrmr(Tbl,Y) ranks predictors in Tbl using the response variable Y.

idx = fscmrmr(X,Y) ranks predictors in X using the response variable Y.

idx = fscmrmr( ___ ,Name,Value) specifies additional options using one or more name-value pair arguments in addition to any of the input argument combinations in the previous syntaxes. For example, you can specify prior probabilities and observation weights.

[idx,scores] = fscmrmr( ___ ) also returns the predictor scores scores. A large score value indicates that the corresponding predictor is important.

Examples

Rank Predictors by Importance

Load the sample data.

load ionosphere

Rank the predictors based on importance.

[idx,scores] = fscmrmr(X,Y);

Create a bar plot of the predictor importance scores.


bar(scores(idx))
xlabel('Predictor rank')
ylabel('Predictor importance score')

The drop in score between the first and second most important predictors is large, while the drops after the sixth predictor are relatively small. A drop in the importance score represents the confidence of feature selection. Therefore, the large drop implies that the software is confident of selecting the most important predictor. The small drops indicate that the differences in predictor importance are not significant.

Select the top five most important predictors. Find the columns of these predictors in X.

idx(1:5)

ans = 1×5

     5     4     1     7    24

The fifth column of X is the most important predictor of Y.


Select Features and Compare Accuracies of Two Classification Models Find important predictors by using fscmrmr. Then compare the accuracies of the full classification model (which uses all the predictors) and a reduced model that uses the five most important predictors by using testckfold. Load the census1994 data set. load census1994

The table adultdata in census1994 contains demographic data from the US Census Bureau to predict whether an individual makes over $50,000 per year. Display the first three rows of the table.

head(adultdata,3)

ans=3×15 table

    age       workClass          fnlwgt      education    education_num     marital_status     ...
    ___    ________________    __________    _________    _____________    __________________

    39     State-gov                77516    Bachelors         13          Never-married
    50     Self-emp-not-inc         83311    Bachelors         13          Married-civ-spouse
    38     Private             2.1565e+05    HS-grad            9          Divorced

The output arguments of fscmrmr include only the variables ranked by the function. Before passing a table to the function, move the variables that you do not want to rank, including the response variable and weight, to the end of the table so that the order of the output arguments is consistent with the order of the table.

In the table adultdata, the third column fnlwgt is the weight of the samples, and the last column salary is the response variable. Move fnlwgt to the left of salary by using the movevars function.

adultdata = movevars(adultdata,'fnlwgt','before','salary');
head(adultdata,3)

ans=3×15 table

    age       workClass         education    education_num     marital_status        occupation      ...
    ___    ________________    _________    _____________    __________________    ______________

    39     State-gov           Bachelors         13          Never-married         Adm-clerical
    50     Self-emp-not-inc    Bachelors         13          Married-civ-spouse    Exec-manageria
    38     Private             HS-grad            9          Divorced              Handlers-clean

Rank the predictors in adultdata. Specify the column salary as the response variable.

[idx,scores] = fscmrmr(adultdata,'salary','Weights','fnlwgt');

Create a bar plot of predictor importance scores. Use the predictor names for the x-axis tick labels.

bar(scores(idx))
xlabel('Predictor rank')
ylabel('Predictor importance score')
xticklabels(strrep(adultdata.Properties.VariableNames(idx),'_','\_'))
xtickangle(45)


The five most important predictors are relationship, capital_loss, capital_gain, education, and hours_per_week.

Compare the accuracy of a classification tree trained with all predictors to the accuracy of one trained with the five most important predictors.

Create a classification tree template using the default options.

C = templateTree;

Define the table tbl1 to contain all predictors and the table tbl2 to contain the five most important predictors.

tbl1 = adultdata(:,adultdata.Properties.VariableNames(idx(1:13)));
tbl2 = adultdata(:,adultdata.Properties.VariableNames(idx(1:5)));

Pass the classification tree template and the two tables to the testckfold function. The function compares the accuracies of the two models by repeated cross-validation. Specify 'Alternative','greater' to test the null hypothesis that the model with all predictors is, at most, as accurate as the model with the five predictors. The 'greater' option is available when 'Test' is '5x2t' (5-by-2 paired t test) or '10x10t' (10-by-10 repeated cross-validation t test).

[h,p] = testckfold(C,C,tbl1,tbl2,adultdata.salary,'Weights',adultdata.fnlwgt, ...
    'Alternative','greater')

h = logical
   0


p = 0.9966

h equals 0 and the p-value is almost 1, indicating failure to reject the null hypothesis. Using the model with the five predictors does not result in loss of accuracy compared to the model with all the predictors.

Now train a classification tree using the selected predictors.

mdl = fitctree(adultdata,'salary ~ relationship + capital_loss + capital_gain + education + hours_per_week', ...
    'Weights',adultdata.fnlwgt)

mdl =
  ClassificationTree
           PredictorNames: {1x5 cell}
             ResponseName: 'salary'
    CategoricalPredictors: [1 2]
               ClassNames: [<=50K    >50K]
           ScoreTransform: 'none'
          NumObservations: 32561

  Properties, Methods

Input Arguments

Tbl — Sample data
table

Sample data, specified as a table. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.

Each row of Tbl corresponds to one observation, and each column corresponds to one predictor variable. Optionally, Tbl can contain additional columns for a response variable and observation weights. A response variable can be a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors. If the response variable is a character array, then each element of the response variable must correspond to one row of the array.

• If Tbl contains the response variable, and you want to use all remaining variables in Tbl as predictors, then specify the response variable by using ResponseVarName. If Tbl also contains the observation weights, then you can specify the weights by using Weights.

• If Tbl contains the response variable, and you want to use only a subset of the remaining variables in Tbl as predictors, then specify the subset of variables by using formula.

• If Tbl does not contain the response variable, then specify a response variable by using Y. The response variable and Tbl must have the same number of rows.

If fscmrmr uses a subset of variables in Tbl as predictors, then the function indexes the predictors using only the subset. The values in the 'CategoricalPredictors' name-value pair argument and the output argument idx do not count the predictors that the function does not rank.


fscmrmr considers NaN, '' (empty character vector), "" (empty string), <missing>, and <undefined> values in Tbl for a response variable to be missing values. fscmrmr does not use observations with missing values for a response variable.

Data Types: table

ResponseVarName — Response variable name
character vector or string scalar containing name of variable in Tbl

Response variable name, specified as a character vector or string scalar containing the name of a variable in Tbl. For example, if a response variable is the column Y of Tbl (Tbl.Y), then specify ResponseVarName as 'Y'.

Data Types: char | string

formula — Explanatory model of response variable and subset of predictor variables
character vector | string scalar

Explanatory model of the response variable and a subset of the predictor variables, specified as a character vector or string scalar in the form 'Y ~ X1 + X2 + X3'. In this form, Y represents the response variable, and X1, X2, and X3 represent the predictor variables.

To specify a subset of variables in Tbl as predictors, use a formula. If you specify a formula, then fscmrmr does not rank any variables in Tbl that do not appear in formula.

The variable names in the formula must be both variable names in Tbl (Tbl.Properties.VariableNames) and valid MATLAB identifiers. For details, see “Tips” on page 32-2248.

Data Types: char | string

Y — Response variable
numeric vector | categorical vector | logical vector | character array | string array | cell array of character vectors

Response variable, specified as a numeric, categorical, or logical vector, a character or string array, or a cell array of character vectors. Each row of Y represents the labels of the corresponding row of X.

fscmrmr considers NaN, '' (empty character vector), "" (empty string), <missing>, and <undefined> values in Y to be missing values. fscmrmr does not use observations with missing values for Y.

Data Types: single | double | categorical | logical | char | string | cell

X — Predictor data
numeric matrix

Predictor data, specified as a numeric matrix. Each row of X corresponds to one observation, and each column corresponds to one predictor variable.

Data Types: single | double


Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'CategoricalPredictors',[1 2],'Verbose',2 specifies the first two predictor variables as categorical variables and specifies the verbosity level as 2.

CategoricalPredictors — List of categorical predictors
vector of positive integers | logical vector | character matrix | string array | cell array of character vectors | 'all'

List of categorical predictors, specified as the comma-separated pair consisting of 'CategoricalPredictors' and one of the following values:

• Vector of positive integers — Each entry in the vector is an index value corresponding to the column of the predictor data (X or Tbl) that contains a categorical variable.

• Logical vector — A true entry means that the corresponding column of predictor data (X or Tbl) is a categorical variable.

• Character matrix — Each row of the matrix is the name of a predictor variable. The names must match the names in Tbl. Pad the names with extra blanks so each row of the character matrix has the same length.

• String array or cell array of character vectors — Each element in the array is the name of a predictor variable. The names must match the names in Tbl.

• 'all' — All predictors are categorical.

By default, if the predictor data is in a table (Tbl), fscmrmr assumes that a variable is categorical if it is a logical vector, unordered categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix (X), fscmrmr assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the 'CategoricalPredictors' name-value pair argument.

If fscmrmr uses a subset of variables in Tbl as predictors, then the function indexes the predictors using only the subset. The 'CategoricalPredictors' values do not count the predictors that the function does not rank.

Example: 'CategoricalPredictors','all'
Data Types: single | double | logical | char | string | cell

ClassNames — Names of classes to use for ranking
categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors

Names of the classes to use for ranking, specified as the comma-separated pair consisting of 'ClassNames' and a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors. ClassNames must have the same data type as Y or the response variable in Tbl. If ClassNames is a character array, then each element must correspond to one row of the array.


Use 'ClassNames' to:

• Specify the order of the Prior dimensions that corresponds to the class order.

• Select a subset of classes for ranking. For example, suppose that the set of all distinct class names in Y is {'a','b','c'}. To rank predictors using observations from classes 'a' and 'c' only, specify 'ClassNames',{'a','c'}.

The default value for 'ClassNames' is the set of all distinct class names in Y or the response variable in Tbl. The default 'ClassNames' value has mathematical ordering if the response variable is ordinal. Otherwise, the default value has alphabetical ordering.

Example: 'ClassNames',{'b','g'}
Data Types: categorical | char | string | logical | single | double | cell

Prior — Prior probabilities
'empirical' (default) | 'uniform' | vector of scalar values | structure

Prior probabilities for each class, specified as the comma-separated pair consisting of 'Prior' and one of the following:

• Character vector or string scalar:

  • 'empirical' determines class probabilities from class frequencies in the response variable in Y or Tbl. If you pass observation weights, fscmrmr uses the weights to compute the class probabilities.

  • 'uniform' sets all class probabilities to be equal.

• Vector (one scalar value for each class). To specify the class order for the corresponding elements of 'Prior', also specify the ClassNames name-value pair argument.

• Structure S with two fields:

  • S.ClassNames contains the class names as a variable of the same type as the response variable in Y or Tbl.

  • S.ClassProbs contains a vector of corresponding probabilities.

If you set values for both 'Weights' and 'Prior', fscmrmr normalizes the weights in each class to add up to the value of the prior probability of the respective class.

Example: 'Prior','uniform'
Data Types: char | string | single | double | struct

UseMissing — Indicator for whether to use or discard missing values in predictors
false (default) | true

Indicator for whether to use or discard missing values in predictors, specified as the comma-separated pair consisting of 'UseMissing' and either true to use or false to discard missing values in predictors for ranking.

fscmrmr considers NaN, '' (empty character vector), "" (empty string), <missing>, and <undefined> values to be missing values.

If you specify 'UseMissing',true, then fscmrmr uses missing values for ranking. For a categorical variable, fscmrmr treats missing values as an extra category. For a continuous variable, fscmrmr places NaN values in a separate bin for binning.


If you specify 'UseMissing',false, then fscmrmr does not use missing values for ranking. Because fscmrmr computes mutual information for each pair of variables, the function does not discard an entire row when values in the row are partially missing. fscmrmr uses all pair values that do not include missing values.

Example: 'UseMissing',true
Data Types: logical

Verbose — Verbosity level
0 (default) | nonnegative integer

Verbosity level, specified as the comma-separated pair consisting of 'Verbose' and a nonnegative integer. The value of Verbose controls the amount of diagnostic information that the software displays in the Command Window.

• 0 — fscmrmr does not display any diagnostic information.

• 1 — fscmrmr displays the elapsed times for computing “Mutual Information” on page 32-2248 and ranking predictors.

• ≥ 2 — fscmrmr displays the elapsed times and more messages related to computing mutual information. The amount of information increases as you increase the 'Verbose' value.

Example: 'Verbose',1
Data Types: single | double

Weights — Observation weights
ones(size(X,1),1) (default) | vector of scalar values | name of variable in Tbl

Observation weights, specified as the comma-separated pair consisting of 'Weights' and a vector of scalar values or the name of a variable in Tbl. The function weights the observations in each row of X or Tbl with the corresponding value in Weights. The size of Weights must equal the number of rows in X or Tbl.

If you specify the input data as a table Tbl, then Weights can be the name of a variable in Tbl that contains a numeric vector. In this case, you must specify Weights as a character vector or string scalar. For example, if the weight vector is the column W of Tbl (Tbl.W), then specify 'Weights','W'.

fscmrmr normalizes the weights in each class to add up to the value of the prior probability of the respective class.

Data Types: single | double | char | string
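The class-wise weight normalization described above can be sketched in a few lines. This is an illustrative Python sketch, not the toolbox implementation; the function name and the priors dictionary are hypothetical. Within each class, the observation weights are rescaled so they sum to that class's prior probability.

```python
from collections import defaultdict

def normalize_weights(weights, labels, priors):
    """Rescale observation weights so that, within each class,
    they sum to that class's prior probability (illustrative sketch)."""
    class_totals = defaultdict(float)
    for w, y in zip(weights, labels):
        class_totals[y] += w
    # Each weight is scaled by prior(class) / total_weight(class).
    return [w * priors[y] / class_totals[y] for w, y in zip(weights, labels)]

# Two classes with uniform priors (0.5 each): after normalization,
# the weights in class 'a' sum to 0.5, as do the weights in class 'b'.
w = normalize_weights([1.0, 3.0, 2.0, 2.0], ['a', 'a', 'b', 'b'],
                      {'a': 0.5, 'b': 0.5})
print(w)  # [0.125, 0.375, 0.25, 0.25]
```

Note that the relative weights within a class are preserved; only the per-class totals are forced to match the priors.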

Output Arguments

idx — Indices of predictors ordered by predictor importance
numeric vector

Indices of predictors in X or Tbl ordered by predictor importance, returned as a 1-by-r numeric vector, where r is the number of ranked predictors.

If fscmrmr uses a subset of variables in Tbl as predictors, then the function indexes the predictors using only the subset. For example, suppose Tbl includes 10 columns and you specify the last five columns of Tbl as the predictor variables by using formula. If idx(3) is 5, then the third most important predictor is the 10th column in Tbl, which is the fifth predictor in the subset.

scores — Predictor scores
numeric vector

Predictor scores, returned as a 1-by-r numeric vector, where r is the number of ranked predictors.

A large score value indicates that the corresponding predictor is important. Also, a drop in the feature importance score represents the confidence of feature selection. For example, if the software is confident of selecting a feature x, then the score value of the next most important feature is much smaller than the score value of x.

• If you use X to specify the predictors or use all the variables in Tbl as predictors, then the values in scores have the same order as the predictors in X or Tbl.

• If you specify a subset of variables in Tbl as predictors, then the values in scores have the same order as the subset. For example, suppose Tbl includes 10 columns and you specify the last five columns of Tbl as the predictor variables by using formula. Then, scores(3) contains the score value of the 8th column in Tbl, which is the third predictor in the subset.
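The subset indexing convention can be made concrete with a short sketch (illustrative Python with hypothetical column names x1 through x10; this is not toolbox code):

```python
# Tbl has 10 columns; a formula selects the last five as predictors.
tbl_columns = [f"x{i}" for i in range(1, 11)]
predictors = tbl_columns[5:]              # ranked subset: x6 .. x10

idx = [4, 2, 5, 1, 3]                     # 1-based ranks into the *subset*
ranked = [predictors[i - 1] for i in idx]

# idx[2] == 5 (idx(3) in MATLAB), so the third most important predictor
# is predictors[4] == "x10": the 10th column of Tbl, which is the
# fifth predictor in the subset.
print(ranked[2])  # x10
```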

More About

Mutual Information

The mutual information between two variables measures how much uncertainty of one variable can be reduced by knowing the other variable.

The mutual information I of the discrete random variables X and Z is defined as

I(X,Z) = \sum_{i,j} P(X = x_i, Z = z_j) \log \frac{P(X = x_i, Z = z_j)}{P(X = x_i)\, P(Z = z_j)}.

If X and Z are independent, then I equals 0. If X and Z are the same random variable, then I equals the entropy of X.

The fscmrmr function uses this definition to compute the mutual information values for both categorical (discrete) and continuous variables. fscmrmr discretizes a continuous variable into 256 bins or the number of unique values in the variable if it is less than 256. The function finds optimal bivariate bins for each pair of variables using the adaptive algorithm [2].

Tips

• If you specify the response variable and predictor variables by using the input argument formula, then the variable names in the formula must be both variable names in Tbl (Tbl.Properties.VariableNames) and valid MATLAB identifiers. You can verify the variable names in Tbl by using the isvarname function. The following code returns logical 1 (true) for each variable that has a valid variable name.

  cellfun(@isvarname,Tbl.Properties.VariableNames)

If the variable names in Tbl are not valid, then convert them by using the matlab.lang.makeValidName function. Tbl.Properties.VariableNames = matlab.lang.makeValidName(Tbl.Properties.VariableNames);

Algorithms

Minimum Redundancy Maximum Relevance (MRMR) Algorithm

The MRMR algorithm [1] finds an optimal set of features that is mutually and maximally dissimilar and can represent the response variable effectively. The algorithm minimizes the redundancy of a feature set while maximizing the relevance of the feature set to the response variable. The algorithm quantifies the redundancy and relevance using the mutual information of variables—pairwise mutual information of features and mutual information of a feature and the response. You can use this algorithm for classification problems.

The goal of the MRMR algorithm is to find an optimal set S of features that maximizes V_S, the relevance of S with respect to a response variable y, and minimizes W_S, the redundancy of S, where V_S and W_S are defined with mutual information on page 32-2248 I:

V_S = \frac{1}{|S|} \sum_{x \in S} I(x,y),

W_S = \frac{1}{|S|^2} \sum_{x,z \in S} I(x,z).

|S| is the number of features in S.

Finding an optimal set S requires considering all 2^{|Ω|} combinations, where Ω is the entire feature set. Instead, the MRMR algorithm ranks features through the forward addition scheme, which requires O(|Ω|·|S|) computations, by using the mutual information quotient (MIQ) value

MIQ_x = \frac{V_x}{W_x},

where V_x and W_x are the relevance and redundancy of a feature, respectively:

V_x = I(x,y),

W_x = \frac{1}{|S|} \sum_{z \in S} I(x,z).

The fscmrmr function ranks all features in Ω and returns idx (the indices of features ordered by feature importance) using the MRMR algorithm. Therefore, the computation cost becomes O(|Ω|²). The function quantifies the importance of a feature using a heuristic algorithm and returns score. A large score value indicates that the corresponding predictor is important. Also, a drop in the feature importance score represents the confidence of feature selection. For example, if the software is confident of selecting a feature x, then the score value of the next most important feature is much smaller than the score value of x. You can use the outputs to find an optimal set S for a given number of features.

fscmrmr ranks features as follows:

1  Select the feature with the largest relevance, max_{x ∈ Ω} V_x. Add the selected feature to an empty set S.


2  Find the features with nonzero relevance and zero redundancy in the complement of S, Sᶜ.

   • If Sᶜ does not include a feature with nonzero relevance and zero redundancy, go to step 4.

   • Otherwise, select the feature with the largest relevance, max_{x ∈ Sᶜ, W_x = 0} V_x. Add the selected feature to the set S.

3  Repeat Step 2 until the redundancy is not zero for all features in Sᶜ.

4  Select the feature that has the largest MIQ value with nonzero relevance and nonzero redundancy in Sᶜ, and add the selected feature to the set S:

   max_{x ∈ Sᶜ} MIQ_x = max_{x ∈ Sᶜ} \frac{I(x,y)}{\frac{1}{|S|} \sum_{z \in S} I(x,z)}.

5  Repeat Step 4 until the relevance is zero for all features in Sᶜ.

6  Add the features with zero relevance to S in random order.

The software can skip any step if it cannot find a feature that satisfies the conditions described in the step.
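The ranking loop above can be condensed into a short sketch. This illustrative Python assumes the mutual information values are precomputed, and for brevity it folds the zero-relevance and zero-redundancy special cases of steps 2, 3, and 6 into a single greedy MIQ rule; it is not the toolbox implementation.

```python
def mrmr_rank(relevance, redundancy):
    """Greedy MRMR ranking via the mutual information quotient (MIQ).
    relevance[x] = I(x, y); redundancy[x][z] = I(x, z).
    Simplified sketch: seed with the most relevant feature, then
    repeatedly add the feature maximizing relevance / mean redundancy
    against the already-selected set."""
    n = len(relevance)
    order = [max(range(n), key=lambda x: relevance[x])]   # step 1
    remaining = set(range(n)) - set(order)
    while remaining:
        def miq(x):
            # mean pairwise redundancy of x against the selected set
            w = sum(redundancy[x][z] for z in order) / len(order)
            return relevance[x] / w if w > 0 else float('inf')
        best = max(remaining, key=miq)
        order.append(best)
        remaining.remove(best)
    return order

# Feature 2 is almost as relevant as feature 0 but highly redundant
# with it, so the less relevant (but non-redundant) feature 1 ranks second.
print(mrmr_rank([0.9, 0.5, 0.8],
                [[1.0, 0.1, 0.7],
                 [0.1, 1.0, 0.1],
                 [0.7, 0.1, 1.0]]))   # [0, 1, 2]
```

This illustrates the key MRMR behavior: a feature with high relevance can still be ranked low if the information it carries is already covered by features selected earlier.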

Compatibility Considerations Specify 'UseMissing',true to use missing values in predictors for ranking Behavior changed in R2020a Starting in R2020a, you can specify whether to use or discard missing values in predictors for ranking by using the 'UseMissing' name-value pair argument. The default value of 'UseMissing' is false because most classification training functions in Statistics and Machine Learning Toolbox do not use missing values for training. In R2019b, fscmrmr used missing values in predictors by default. To update your code, specify 'UseMissing',true.

References

[1] Ding, C., and H. Peng. "Minimum redundancy feature selection from microarray gene expression data." Journal of Bioinformatics and Computational Biology. Vol. 3, Number 2, 2005, pp. 185–205.

[2] Darbellay, G. A., and I. Vajda. "Estimation of the information by an adaptive partitioning of the observation space." IEEE Transactions on Information Theory. Vol. 45, Number 4, 1999, pp. 1315–1321.

See Also
fscnca | fsulaplacian | relieff | sequentialfs

Topics
“Introduction to Feature Selection” on page 15-49
“Sequential Feature Selection” on page 15-61

Introduced in R2019b

fscnca

Feature selection using neighborhood component analysis for classification

Syntax

mdl = fscnca(X,Y)
mdl = fscnca(X,Y,Name,Value)

Description

mdl = fscnca(X,Y) performs feature selection for classification using the predictors in X and responses in Y. fscnca learns the feature weights by using a diagonal adaptation of neighborhood component analysis (NCA) with regularization.

mdl = fscnca(X,Y,Name,Value) performs feature selection for classification with additional options specified by one or more name-value pair arguments.

Examples

Detect Relevant Features in Data Using NCA for Classification

Generate toy data where the response variable depends on the 3rd, 9th, and 15th predictors.

rng(0,'twister'); % For reproducibility
N = 100;
X = rand(N,20);
y = -ones(N,1);
y(X(:,3).*X(:,9)./X(:,15) < 0.4) = 1;

Fit the neighborhood component analysis model for classification.

mdl = fscnca(X,y,'Solver','sgd','Verbose',1);

 o Tuning initial learning rate: NumTuningIterations = 20, TuningSubsetSize = 100

|===============================================|
|    TUNING    | TUNING SUBSET |    LEARNING    |
|     ITER     |   FUN VALUE   |      RATE      |
|===============================================|
|            1 | -3.755936e-01 |   2.000000e-01 |
|            2 | -3.950971e-01 |   4.000000e-01 |
|            3 | -4.311848e-01 |   8.000000e-01 |
|            4 | -4.903195e-01 |   1.600000e+00 |
|            5 | -5.630190e-01 |   3.200000e+00 |
|            6 | -6.166993e-01 |   6.400000e+00 |
|            7 | -6.255669e-01 |   1.280000e+01 |
|            8 | -6.255669e-01 |   1.280000e+01 |
|            9 | -6.255669e-01 |   1.280000e+01 |
|           10 | -6.255669e-01 |   1.280000e+01 |
|           11 | -6.255669e-01 |   1.280000e+01 |
|           12 | -6.255669e-01 |   1.280000e+01 |
|           13 | -6.255669e-01 |   1.280000e+01 |
|           14 | -6.279210e-01 |   2.560000e+01 |
|           15 | -6.279210e-01 |   2.560000e+01 |
|           16 | -6.279210e-01 |   2.560000e+01 |
|           17 | -6.279210e-01 |   2.560000e+01 |
|           18 | -6.279210e-01 |   2.560000e+01 |
|           19 | -6.279210e-01 |   2.560000e+01 |
|           20 | -6.279210e-01 |   2.560000e+01 |

 o Solver = SGD, MiniBatchSize = 10, PassLimit = 5

|==========================================================================================|
|   PASS   |   ITER   | AVG MINIBATCH | AVG MINIBATCH |   NORM STEP   |    LEARNING    |
|          |          |   FUN VALUE   |   NORM GRAD   |               |      RATE      |
|==========================================================================================|
|        0 |        9 | -5.658450e-01 |  4.492407e-02 |  9.290605e-01 |   2.560000e+01 |
|        1 |       19 | -6.131382e-01 |  4.923625e-02 |  7.421541e-01 |   1.280000e+01 |
|        2 |       29 | -6.225056e-01 |  3.738784e-02 |  3.277588e-01 |   8.533333e+00 |
|        3 |       39 | -6.233366e-01 |  4.947901e-02 |  5.431133e-01 |   6.400000e+00 |
|        4 |       49 | -6.238576e-01 |  3.445763e-02 |  2.946188e-01 |   5.120000e+00 |

         Two norm of the final step = 2.946e-01
Relative two norm of the final step = 6.588e-02, TolX = 1.000e-06
EXIT: Iteration or pass limit reached.

Plot the selected features. The weights of the irrelevant features should be close to zero.

figure()
plot(mdl.FeatureWeights,'ro')
grid on
xlabel('Feature index')
ylabel('Feature weight')

fscnca correctly detects the relevant features.

Identify Relevant Features for Classification

Load sample data.

load ovariancancer;
whos

  Name      Size                 Bytes  Class     Attributes

  grp       216x1                26784  cell
  obs       216x4000           3456000  single

This example uses the high-resolution ovarian cancer data set that was generated using the WCX2 protein array. The data is from the FDA-NCI Clinical Proteomics Program Databank. After some preprocessing steps, the data set has two variables: obs and grp. The obs variable consists of 216 observations with 4000 features. Each element in grp defines the group to which the corresponding row of obs belongs.

Divide data into training and test sets

Use cvpartition to divide data into a training set of size 160 and a test set of size 56. Both the training set and the test set have roughly the same group proportions as in grp.


rng(1); % For reproducibility
cvp = cvpartition(grp,'holdout',56)

cvp =
Hold-out cross validation partition
   NumObservations: 216
       NumTestSets: 1
         TrainSize: 160
          TestSize: 56

Xtrain = obs(cvp.training,:);
ytrain = grp(cvp.training,:);
Xtest  = obs(cvp.test,:);
ytest  = grp(cvp.test,:);

Determine if feature selection is necessary

Compute generalization error without fitting.

nca = fscnca(Xtrain,ytrain,'FitMethod','none');
L = loss(nca,Xtest,ytest)

L = 0.0893

This option computes the generalization error of the neighborhood component analysis (NCA) feature selection model using the initial feature weights (in this case the default feature weights) provided in fscnca.

Fit NCA without regularization parameter (Lambda = 0)

nca = fscnca(Xtrain,ytrain,'FitMethod','exact','Lambda',0,...
    'Solver','sgd','Standardize',true);
L = loss(nca,Xtest,ytest)

L = 0.0714

The improvement on the loss value suggests that feature selection is a good idea. Tuning the λ value usually improves the results.

Tune the regularization parameter for NCA using five-fold cross-validation

Tuning λ means finding the λ value that produces the minimum classification loss. To tune λ using cross-validation:

1. Partition the training data into five folds and extract the number of validation (test) sets. For each fold, cvpartition assigns four-fifths of the data as a training set, and one-fifth of the data as a test set.

cvp = cvpartition(ytrain,'kfold',5);
numvalidsets = cvp.NumTestSets;

Assign λ values and create an array to store the loss function values.

n = length(ytrain);
lambdavals = linspace(0,20,20)/n;
lossvals = zeros(length(lambdavals),numvalidsets);

2. Train the NCA model for each λ value, using the training set in each fold.

3. Compute the classification loss for the corresponding test set in the fold using the NCA model. Record the loss value.

4. Repeat this process for all folds and all λ values.

for i = 1:length(lambdavals)
    for k = 1:numvalidsets
        X = Xtrain(cvp.training(k),:);
        y = ytrain(cvp.training(k),:);
        Xvalid = Xtrain(cvp.test(k),:);
        yvalid = ytrain(cvp.test(k),:);
        nca = fscnca(X,y,'FitMethod','exact', ...
            'Solver','sgd','Lambda',lambdavals(i), ...
            'IterationLimit',30,'GradientTolerance',1e-4, ...
            'Standardize',true);
        lossvals(i,k) = loss(nca,Xvalid,yvalid,'LossFunction','classiferror');
    end
end

Compute the average loss obtained from the folds for each λ value.

meanloss = mean(lossvals,2);

Plot the average loss values versus the λ values.

figure()
plot(lambdavals,meanloss,'ro-')
xlabel('Lambda')
ylabel('Loss (MSE)')
grid on


Find the best lambda value that corresponds to the minimum average loss.

[~,idx] = min(meanloss) % Find the index

idx = 2

bestlambda = lambdavals(idx) % Find the best lambda value

bestlambda = 0.0066

bestloss = meanloss(idx)

bestloss = 0.0250

Fit the nca model on all data using best λ and plot the feature weights

Use the solver sgd and standardize the predictor values.

nca = fscnca(Xtrain,ytrain,'FitMethod','exact','Solver','sgd',...
    'Lambda',bestlambda,'Standardize',true,'Verbose',1);

 o Tuning initial learning rate: NumTuningIterations = 20, TuningSubsetSize = 100

|===============================================|
|    TUNING    | TUNING SUBSET |    LEARNING    |
|     ITER     |   FUN VALUE   |      RATE      |
|===============================================|
|            1 |  2.403497e+01 |   2.000000e-01 |
|            2 |  2.275050e+01 |   4.000000e-01 |


|            3 |  2.036845e+01 |   8.000000e-01 |
|            4 |  1.627647e+01 |   1.600000e+00 |
|            5 |  1.023512e+01 |   3.200000e+00 |
|            6 |  3.864283e+00 |   6.400000e+00 |
|            7 |  4.743816e-01 |   1.280000e+01 |
|            8 | -7.260138e-01 |   2.560000e+01 |
|            9 | -7.260138e-01 |   2.560000e+01 |
|           10 | -7.260138e-01 |   2.560000e+01 |
|           11 | -7.260138e-01 |   2.560000e+01 |
|           12 | -7.260138e-01 |   2.560000e+01 |
|           13 | -7.260138e-01 |   2.560000e+01 |
|           14 | -7.260138e-01 |   2.560000e+01 |
|           15 | -7.260138e-01 |   2.560000e+01 |
|           16 | -7.260138e-01 |   2.560000e+01 |
|           17 | -7.260138e-01 |   2.560000e+01 |
|           18 | -7.260138e-01 |   2.560000e+01 |
|           19 | -7.260138e-01 |   2.560000e+01 |
|           20 | -7.260138e-01 |   2.560000e+01 |
 o Solver = SGD, MiniBatchSize = 10, PassLimit = 5

|==========================================================================================|
|   PASS   |   ITER   | AVG MINIBATCH | AVG MINIBATCH |   NORM STEP   |    LEARNING    |
|          |          |   FUN VALUE   |   NORM GRAD   |               |      RATE      |
|==========================================================================================|
|        0 |        9 |  4.016078e+00 |  2.835465e-02 |  5.395984e+00 |   2.560000e+01 |
|        1 |       19 | -6.726156e-01 |  6.111354e-02 |  5.021138e-01 |   1.280000e+01 |
|        1 |       29 | -8.316555e-01 |  4.024185e-02 |  1.196030e+00 |   1.280000e+01 |
|        2 |       39 | -8.838656e-01 |  2.333418e-02 |  1.225839e-01 |   8.533333e+00 |
|        3 |       49 | -8.669035e-01 |  3.413150e-02 |  3.421881e-01 |   6.400000e+00 |
|        3 |       59 | -8.906935e-01 |  1.946293e-02 |  2.232510e-01 |   6.400000e+00 |
|        4 |       69 | -8.778630e-01 |  3.561283e-02 |  3.290643e-01 |   5.120000e+00 |
|        4 |       79 | -8.857136e-01 |  2.516633e-02 |  3.902977e-01 |   5.120000e+00 |

         Two norm of the final step = 3.903e-01
Relative two norm of the final step = 6.171e-03, TolX = 1.000e-06
EXIT: Iteration or pass limit reached.

Plot the feature weights.

figure()
plot(nca.FeatureWeights,'ro')
xlabel('Feature index')
ylabel('Feature weight')
grid on


Select features using the feature weights and a relative threshold.

tol = 0.02;
selidx = find(nca.FeatureWeights > tol*max(1,max(nca.FeatureWeights)))

selidx = 72×1

   565
   611
   654
   681
   737
   743
   744
   750
   754
   839
     ⋮
Compute the classification loss using the test set.

L = loss(nca,Xtest,ytest)

L = 0.0179

Classify observations using the selected features

Extract the features with feature weights greater than 0 from the training data.


features = Xtrain(:,selidx);

Apply a support vector machine classifier using the selected features to the reduced training set. svmMdl = fitcsvm(features,ytrain);

Evaluate the accuracy of the trained classifier on the test data, which has not been used for selecting features.

L = loss(svmMdl,Xtest(:,selidx),ytest)

L = single
    0

Input Arguments

X — Predictor variable values
n-by-p matrix

Predictor variable values, specified as an n-by-p matrix, where n is the number of observations and p is the number of predictor variables.

Data Types: single | double

Y — Class labels
categorical vector | logical vector | numeric vector | string array | cell array of character vectors of length n | character matrix with n rows

Class labels, specified as a categorical vector, logical vector, numeric vector, string array, cell array of character vectors of length n, or character matrix with n rows, where n is the number of observations. Element i or row i of Y is the class label corresponding to row i of X (observation i).

Data Types: single | double | logical | char | string | cell | categorical

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Solver','sgd','Weights',W,'Lambda',0.0003 specifies the solver as stochastic gradient descent, the observation weights as the values in the vector W, and sets the regularization parameter to 0.0003.

Fitting Options

FitMethod — Method for fitting the model
'exact' (default) | 'none' | 'average'

Method for fitting the model, specified as the comma-separated pair consisting of 'FitMethod' and one of the following:

• 'exact' — Performs fitting using all of the data.
• 'none' — No fitting. Use this option to evaluate the generalization error of the NCA model using the initial feature weights supplied in the call to fscnca.
• 'average' — Divides the data into partitions (subsets), fits each partition using the exact method, and returns the average of the feature weights. You can specify the number of partitions using the NumPartitions name-value pair argument.

Example: 'FitMethod','none'

NumPartitions — Number of partitions
max(2,min(10,n)) (default) | integer between 2 and n

Number of partitions to split the data into for the 'FitMethod','average' option, specified as the comma-separated pair consisting of 'NumPartitions' and an integer value between 2 and n, where n is the number of observations.

Example: 'NumPartitions',15
Data Types: double | single

Lambda — Regularization parameter
1/n (default) | nonnegative scalar

Regularization parameter to prevent overfitting, specified as the comma-separated pair consisting of 'Lambda' and a nonnegative scalar. As the number of observations n increases, the chance of overfitting decreases and the required amount of regularization also decreases. See “Identify Relevant Features for Classification” on page 32-2253 and “Tune Regularization Parameter to Detect Features Using NCA for Classification” to learn how to tune the regularization parameter.

Example: 'Lambda',0.002
Data Types: double | single

LengthScale — Width of the kernel
1 (default) | positive real scalar

Width of the kernel, specified as the comma-separated pair consisting of 'LengthScale' and a positive real scalar.

A length scale value of 1 is sensible when all predictors are on the same scale. If the predictors in X are of very different magnitudes, then consider standardizing the predictor values using 'Standardize',true and setting 'LengthScale',1.

Example: 'LengthScale',1.5
Data Types: double | single

InitialFeatureWeights — Initial feature weights
ones(p,1) (default) | p-by-1 vector of real positive scalars

Initial feature weights, specified as the comma-separated pair consisting of 'InitialFeatureWeights' and a p-by-1 vector of real positive scalars, where p is the number of predictors in the training data.

The regularized objective function for optimizing feature weights is nonconvex. As a result, using different initial feature weights can give different results. Setting all initial feature weights to 1 generally works well, but in some cases, random initialization using rand(p,1) can give better quality solutions.

fscnca

Data Types: double | single

Weights — Observation weights
n-by-1 vector of 1s (default) | n-by-1 vector of real positive scalars

Observation weights, specified as the comma-separated pair consisting of 'Weights' and an n-by-1 vector of real positive scalars. Use observation weights to specify higher importance of some observations compared to others. The default weights assign equal importance to all observations.

Data Types: double | single

Prior — Prior probabilities for each class
'empirical' (default) | 'uniform' | structure

Prior probabilities for each class, specified as the comma-separated pair consisting of 'Prior' and one of the following:

• 'empirical' — fscnca obtains the prior class probabilities from class frequencies.
• 'uniform' — fscnca sets all class probabilities equal.
• Structure with two fields:
  • ClassProbs — Vector of class probabilities. If these are numeric values with a total greater than 1, fscnca normalizes them to add up to 1.
  • ClassNames — Class names corresponding to the class probabilities in ClassProbs.

Example: 'Prior','uniform'

Standardize — Indicator for standardizing predictor data
false (default) | true

Indicator for standardizing the predictor data, specified as the comma-separated pair consisting of 'Standardize' and either false or true. For more information, see “Impact of Standardization” on page 15-102.

Example: 'Standardize',true
Data Types: logical

Verbose — Verbosity level indicator
0 (default) | 1 | >1

Verbosity level indicator for the convergence summary display, specified as the comma-separated pair consisting of 'Verbose' and one of the following:

• 0 — No convergence summary
• 1 — Convergence summary, including the norm of the gradient and objective function values
• >1 — More convergence information, depending on the fitting algorithm

When using the 'minibatch-lbfgs' solver and verbosity level > 1, the convergence information includes the iteration log from intermediate mini-batch LBFGS fits.

Example: 'Verbose',1
Data Types: double | single
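The structure form of 'Prior' can be sketched as follows. The class names and probability values here are hypothetical illustrations, and X and y are assumed to be in the workspace:

```matlab
% Hypothetical two-class prior; fscnca normalizes ClassProbs to sum to 1.
prior.ClassProbs = [3 1];                     % normalized to [0.75 0.25]
prior.ClassNames = {'benign','malignant'};    % must match the labels in y
nca = fscnca(X,y,'Prior',prior);
```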


Solver — Solver type
'lbfgs' | 'sgd' | 'minibatch-lbfgs'

Solver type for estimating feature weights, specified as the comma-separated pair consisting of 'Solver' and one of the following:

• 'lbfgs' — Limited memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm
• 'sgd' — Stochastic gradient descent (SGD) algorithm
• 'minibatch-lbfgs' — Stochastic gradient descent with the LBFGS algorithm applied to mini-batches

The default is 'lbfgs' for n ≤ 1000, and 'sgd' for n > 1000.

Example: 'Solver','minibatch-lbfgs'

LossFunction — Loss function
'classiferror' (default) | function handle

Loss function, specified as the comma-separated pair consisting of 'LossFunction' and one of the following.

• 'classiferror' — Misclassification error:

  l(y_i, y_j) = \begin{cases} 1 & \text{if } y_i \neq y_j, \\ 0 & \text{otherwise.} \end{cases}

• @lossfun — Custom loss function handle. A loss function has this form.

  function L = lossfun(Yu,Yv)
  % calculation of loss
  ...

  Yu is a u-by-1 vector and Yv is a v-by-1 vector. L is a u-by-v matrix of loss values such that L(i,j) is the loss value for Yu(i) and Yv(j).

The objective function for minimization includes the loss function l(y_i, y_j) as follows:

  f(w) = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1,\, j \neq i}^{n} p_{ij}\, l(y_i, y_j) + \lambda \sum_{r=1}^{p} w_r^2,

where w is the feature weight vector, n is the number of observations, and p is the number of predictor variables. p_{ij} is the probability that x_j is the reference point for x_i. For details, see “NCA Feature Selection for Classification” on page 15-99.

Example: 'LossFunction',@lossfun

CacheSize — Memory size
1000MB (default) | integer

Memory size, in MB, to use for objective function and gradient computation, specified as the comma-separated pair consisting of 'CacheSize' and an integer.

Example: 'CacheSize',1500MB
Data Types: double | single
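As a sketch of the @lossfun form described above, the following function implements the 0-1 misclassification loss with a made-up doubled penalty when the true class is 1. The penalty scheme is purely an illustration, not part of fscnca, and numeric class labels are assumed:

```matlab
function L = mylossfun(Yu,Yv)
% L(i,j) is the loss between Yu(i) and Yv(j), so L is u-by-v.
L = double(Yu ~= Yv');            % 0-1 loss via implicit expansion (R2016b and later)
L(Yu == 1,:) = 2*L(Yu == 1,:);    % hypothetical extra penalty when the true class is 1
end
```

Save the function on the MATLAB path, then pass the handle when fitting, for example nca = fscnca(X,y,'LossFunction',@mylossfun).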


LBFGS Options

HessianHistorySize — Size of history buffer for Hessian approximation
15 (default) | positive integer

Size of the history buffer for the Hessian approximation for the 'lbfgs' solver, specified as the comma-separated pair consisting of 'HessianHistorySize' and a positive integer. At each iteration, the function uses the most recent HessianHistorySize iterations to build an approximation to the inverse Hessian.

Example: 'HessianHistorySize',20
Data Types: double | single

InitialStepSize — Initial step size
'auto' (default) | positive real scalar

Initial step size for the 'lbfgs' solver, specified as the comma-separated pair consisting of 'InitialStepSize' and a positive real scalar. By default, the function determines the initial step size automatically.

Data Types: double | single

LineSearchMethod — Line search method
'weakwolfe' (default) | 'strongwolfe' | 'backtracking'

Line search method, specified as the comma-separated pair consisting of 'LineSearchMethod' and one of the following:

• 'weakwolfe' — Weak Wolfe line search
• 'strongwolfe' — Strong Wolfe line search
• 'backtracking' — Backtracking line search

Example: 'LineSearchMethod','backtracking'

MaxLineSearchIterations — Maximum number of line search iterations
20 (default) | positive integer

Maximum number of line search iterations, specified as the comma-separated pair consisting of 'MaxLineSearchIterations' and a positive integer.

Example: 'MaxLineSearchIterations',25
Data Types: double | single

GradientTolerance — Relative convergence tolerance
1e-6 (default) | positive real scalar

Relative convergence tolerance on the gradient norm for the 'lbfgs' solver, specified as the comma-separated pair consisting of 'GradientTolerance' and a positive real scalar.

Example: 'GradientTolerance',0.000002
Data Types: double | single


SGD Options

InitialLearningRate — Initial learning rate for 'sgd' solver
'auto' (default) | positive real scalar

Initial learning rate for the 'sgd' solver, specified as the comma-separated pair consisting of 'InitialLearningRate' and a positive real scalar.

When using solver type 'sgd', the learning rate decays over iterations starting with the value specified for 'InitialLearningRate'. The default 'auto' means that the initial learning rate is determined using experiments on small subsets of data. Use the NumTuningIterations name-value pair argument to specify the number of iterations for automatically tuning the initial learning rate. Use the TuningSubsetSize name-value pair argument to specify the number of observations to use for automatically tuning the initial learning rate.

For solver type 'minibatch-lbfgs', you can set 'InitialLearningRate' to a very high value. In this case, the function applies LBFGS to each mini-batch separately with initial feature weights from the previous mini-batch.

To make sure the chosen initial learning rate decreases the objective value with each iteration, plot the Iteration versus the Objective values saved in the mdl.FitInfo property. You can use the refit method with 'InitialFeatureWeights' equal to mdl.FeatureWeights to start from the current solution and run additional iterations.

Example: 'InitialLearningRate',0.9
Data Types: double | single

MiniBatchSize — Number of observations to use in each batch for the 'sgd' solver
min(10,n) (default) | positive integer value from 1 to n

Number of observations to use in each batch for the 'sgd' solver, specified as the comma-separated pair consisting of 'MiniBatchSize' and a positive integer from 1 to n.

Example: 'MiniBatchSize',25
Data Types: double | single

PassLimit — Maximum number of passes for solver 'sgd'
5 (default) | positive integer

Maximum number of passes through all n observations for solver 'sgd', specified as the comma-separated pair consisting of 'PassLimit' and a positive integer. Each pass through all of the data is called an epoch.

Example: 'PassLimit',10
Data Types: double | single

NumPrint — Frequency of batches for displaying convergence summary
10 (default) | positive integer value

Frequency of batches for displaying the convergence summary for the 'sgd' solver, specified as the comma-separated pair consisting of 'NumPrint' and a positive integer. This argument applies when the 'Verbose' value is greater than 0. NumPrint mini-batches are processed for each line of the convergence summary that is displayed on the command line.

Example: 'NumPrint',5
Data Types: double | single

NumTuningIterations — Number of tuning iterations
20 (default) | positive integer

Number of tuning iterations for the 'sgd' solver, specified as the comma-separated pair consisting of 'NumTuningIterations' and a positive integer. This option is valid only for 'InitialLearningRate','auto'.

Example: 'NumTuningIterations',15
Data Types: double | single

TuningSubsetSize — Number of observations to use for tuning initial learning rate
min(100,n) (default) | positive integer value from 1 to n

Number of observations to use for tuning the initial learning rate, specified as the comma-separated pair consisting of 'TuningSubsetSize' and a positive integer value from 1 to n. This option is valid only for 'InitialLearningRate','auto'.

Example: 'TuningSubsetSize',25
Data Types: double | single

SGD or LBFGS Options

IterationLimit — Maximum number of iterations
positive integer

Maximum number of iterations, specified as the comma-separated pair consisting of 'IterationLimit' and a positive integer. The default is 10000 for SGD and 1000 for LBFGS and mini-batch LBFGS.

Each pass through a batch is an iteration. Each pass through all of the data is an epoch. If the data is divided into k mini-batches, then every epoch is equivalent to k iterations.

Example: 'IterationLimit',250
Data Types: double | single

StepTolerance — Convergence tolerance on the step size
1e-6 (default) | positive real scalar

Convergence tolerance on the step size, specified as the comma-separated pair consisting of 'StepTolerance' and a positive real scalar. The 'lbfgs' solver uses an absolute step tolerance, and the 'sgd' solver uses a relative step tolerance.

Example: 'StepTolerance',0.000005
Data Types: double | single

Mini-Batch LBFGS Options

MiniBatchLBFGSIterations — Maximum number of iterations per mini-batch LBFGS step
10 (default) | positive integer


Maximum number of iterations per mini-batch LBFGS step, specified as the comma-separated pair consisting of 'MiniBatchLBFGSIterations' and a positive integer.

Example: 'MiniBatchLBFGSIterations',15

The mini-batch LBFGS algorithm is a combination of the SGD and LBFGS methods. Therefore, all of the name-value pair arguments that apply to the SGD and LBFGS solvers also apply to the mini-batch LBFGS algorithm.

Data Types: double | single
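Because the mini-batch LBFGS solver accepts both option groups, a single call can mix SGD and LBFGS name-value pairs. The values below are arbitrary illustrations, and X and y are assumed to be in the workspace:

```matlab
mdl = fscnca(X,y,'Solver','minibatch-lbfgs', ...
    'MiniBatchSize',50,'PassLimit',10, ...    % SGD options
    'MiniBatchLBFGSIterations',15, ...        % mini-batch LBFGS option
    'HessianHistorySize',20);                 % LBFGS option
```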

Output Arguments

mdl — Neighborhood component analysis model for classification
FeatureSelectionNCAClassification object

Neighborhood component analysis model for classification, returned as a FeatureSelectionNCAClassification object.

See Also
FeatureSelectionNCAClassification | loss | predict | refit

Topics
“Tune Regularization Parameter to Detect Features Using NCA for Classification”
“Neighborhood Component Analysis (NCA) Feature Selection” on page 15-99
“Introduction to Feature Selection” on page 15-49

Introduced in R2016b



fsrnca
Feature selection using neighborhood component analysis for regression

Syntax
mdl = fsrnca(X,Y)
mdl = fsrnca(X,Y,Name,Value)

Description
mdl = fsrnca(X,Y) performs feature selection for regression using the predictors in X and responses in Y. fsrnca learns the feature weights by a diagonal adaptation of neighborhood component analysis (NCA) with regularization.

mdl = fsrnca(X,Y,Name,Value) performs feature selection for regression with additional options specified by one or more name-value pair arguments.

Examples

Detect Relevant Features in Data Using NCA for Regression

Generate toy data where the response variable depends on the 3rd, 9th, and 15th predictors.

rng(0,'twister'); % For reproducibility
N = 100;
X = rand(N,20);
y = 1 + X(:,3)*5 + sin(X(:,9)./X(:,15) + 0.25*randn(N,1));

Fit the neighborhood component analysis model for regression.

mdl = fsrnca(X,y,'Verbose',1,'Lambda',0.5/N);

o Solver = LBFGS, HessianHistorySize = 15, LineSearchMethod = weakwolfe

|====================================================================================================|
|  ITER  |   FUN VALUE   |  NORM GRAD  |  NORM STEP  |  CURV  |    GAMMA    |    ALPHA    |  ACC  |
|====================================================================================================|
|      0 |  1.636932e+00 |   3.688e-01 |   0.000e+00 |        |   1.627e+00 |   0.000e+00 |   Y   |
|      1 |  8.304833e-01 |   1.083e-01 |   2.449e+00 |   OK   |   9.194e+00 |   4.000e+00 |   Y   |
|      2 |  7.548105e-01 |   1.341e-02 |   1.164e+00 |   OK   |   1.095e+01 |   1.000e+00 |   Y   |
|      3 |  7.346997e-01 |   9.752e-03 |   6.383e-01 |   OK   |   2.979e+01 |   1.000e+00 |   Y   |
|      4 |  7.053407e-01 |   1.605e-02 |   1.712e+00 |   OK   |   5.809e+01 |   1.000e+00 |   Y   |
|      5 |  6.970502e-01 |   9.106e-03 |   8.818e-01 |   OK   |   6.223e+01 |   1.000e+00 |   Y   |
|      6 |  6.952347e-01 |   5.522e-03 |   6.382e-01 |   OK   |   3.280e+01 |   1.000e+00 |   Y   |
|      7 |  6.946302e-01 |   9.102e-04 |   1.952e-01 |   OK   |   3.380e+01 |   1.000e+00 |   Y   |
|      8 |  6.945037e-01 |   6.557e-04 |   9.942e-02 |   OK   |   8.490e+01 |   1.000e+00 |   Y   |
|      9 |  6.943908e-01 |   1.997e-04 |   1.756e-01 |   OK   |   1.124e+02 |   1.000e+00 |   Y   |
|     10 |  6.943785e-01 |   3.478e-04 |   7.755e-02 |   OK   |   7.621e+01 |   1.000e+00 |   Y   |
|     11 |  6.943728e-01 |   1.428e-04 |   3.416e-02 |   OK   |   3.649e+01 |   1.000e+00 |   Y   |
|     12 |  6.943711e-01 |   1.128e-04 |   1.231e-02 |   OK   |   6.092e+01 |   1.000e+00 |   Y   |
|     13 |  6.943688e-01 |   1.066e-04 |   2.326e-02 |   OK   |   9.319e+01 |   1.000e+00 |   Y   |
|     14 |  6.943655e-01 |   9.324e-05 |   4.399e-02 |   OK   |   1.810e+02 |   1.000e+00 |   Y   |
|     15 |  6.943603e-01 |   1.206e-04 |   8.823e-02 |   OK   |   4.609e+02 |   1.000e+00 |   Y   |
|     16 |  6.943582e-01 |   1.701e-04 |   6.669e-02 |   OK   |   8.425e+01 |   5.000e-01 |   Y   |
|     17 |  6.943552e-01 |   5.160e-05 |   6.473e-02 |   OK   |   8.832e+01 |   1.000e+00 |   Y   |
|     18 |  6.943546e-01 |   2.477e-05 |   1.215e-02 |   OK   |   7.925e+01 |   1.000e+00 |   Y   |
|     19 |  6.943546e-01 |   1.077e-05 |   6.086e-03 |   OK   |   1.378e+02 |   1.000e+00 |   Y   |

|====================================================================================================|
|  ITER  |   FUN VALUE   |  NORM GRAD  |  NORM STEP  |  CURV  |    GAMMA    |    ALPHA    |  ACC  |
|====================================================================================================|
|     20 |  6.943545e-01 |   2.260e-05 |   4.071e-03 |   OK   |   5.856e+01 |   1.000e+00 |   Y   |
|     21 |  6.943545e-01 |   4.250e-06 |   1.109e-03 |   OK   |   2.964e+01 |   1.000e+00 |   Y   |
|     22 |  6.943545e-01 |   1.916e-06 |   8.356e-04 |   OK   |   8.649e+01 |   1.000e+00 |   Y   |
|     23 |  6.943545e-01 |   1.083e-06 |   5.270e-04 |   OK   |   1.168e+02 |   1.000e+00 |   Y   |
|     24 |  6.943545e-01 |   1.791e-06 |   2.673e-04 |   OK   |   4.016e+01 |   1.000e+00 |   Y   |
|     25 |  6.943545e-01 |   2.596e-07 |   1.111e-04 |   OK   |   3.154e+01 |   1.000e+00 |   Y   |

Infinity norm of the final gradient = 2.596e-07
Two norm of the final step = 1.111e-04, TolX = 1.000e-06
Relative infinity norm of the final gradient = 2.596e-07, TolFun = 1.000e-06

EXIT: Local minimum found.

Plot the selected features. The weights of the irrelevant features should be close to zero.

figure()
plot(mdl.FeatureWeights,'ro')
grid on
xlabel('Feature index')
ylabel('Feature weight')


fsrnca correctly detects the relevant predictors for this response.

Tune Regularization Parameter in NCA for Regression

Load the sample data.

load robotarm.mat

The robotarm (pumadyn32nm) dataset is created using a robot arm simulator with 7168 training observations and 1024 test observations, each with 32 features [1][2]. This is a preprocessed version of the original data set. The data are preprocessed by subtracting off a linear regression fit, followed by normalization of all features to unit variance.

Perform neighborhood component analysis (NCA) feature selection for regression with the default λ (regularization parameter) value.

nca = fsrnca(Xtrain,ytrain,'FitMethod','exact', ...
    'Solver','lbfgs');

Plot the feature weights.

figure
plot(nca.FeatureWeights,'ro')
xlabel('Feature index')
ylabel('Feature weight')
grid on

More than half of the feature weights are nonzero. Compute the loss using the test set as a measure of the performance by using the selected features.

L = loss(nca,Xtest,ytest)

L = 0.0837

Try improving the performance. Tune the regularization parameter λ for feature selection using five-fold cross-validation. Tuning λ means finding the λ value that produces the minimum regression loss. To tune λ using cross-validation:

1. Partition the data into five folds. For each fold, cvpartition assigns 4/5 of the data as a training set and 1/5 of the data as a test set.

rng(1) % For reproducibility
n = length(ytrain);
cvp = cvpartition(length(ytrain),'kfold',5);
numvalidsets = cvp.NumTestSets;

Assign the λ values for the search. Multiplying response values by a constant increases the loss function term by a factor of the constant. Therefore, including the std(ytrain) factor in the λ values balances the default loss function ('mad', mean absolute deviation) term and the regularization term in the objective function. In this example, the std(ytrain) factor is one because the loaded sample data is a preprocessed version of the original data set.

lambdavals = linspace(0,50,20)*std(ytrain)/n;

Create an array to store the loss values.

lossvals = zeros(length(lambdavals),numvalidsets);

2. Train the NCA model for each λ value, using the training set in each fold.

3. Compute the regression loss for the corresponding test set in the fold using the NCA model. Record the loss value.

4. Repeat this for each λ value and each fold.

for i = 1:length(lambdavals)
    for k = 1:numvalidsets
        X = Xtrain(cvp.training(k),:);
        y = ytrain(cvp.training(k),:);
        Xvalid = Xtrain(cvp.test(k),:);
        yvalid = ytrain(cvp.test(k),:);
        nca = fsrnca(X,y,'FitMethod','exact', ...
            'Solver','minibatch-lbfgs','Lambda',lambdavals(i), ...
            'GradientTolerance',1e-4,'IterationLimit',30);
        lossvals(i,k) = loss(nca,Xvalid,yvalid,'LossFunction','mse');
    end
end

Compute the average loss obtained from the folds for each λ value.

meanloss = mean(lossvals,2);

Plot the mean loss versus the λ values.

figure
plot(lambdavals,meanloss,'ro-')
xlabel('Lambda')
ylabel('Loss (MSE)')
grid on


Find the λ value that gives the minimum loss value.

[~,idx] = min(meanloss)

idx = 17

bestlambda = lambdavals(idx)

bestlambda = 0.0059

bestloss = meanloss(idx)

bestloss = 0.0590

Fit the NCA feature selection model for regression using the best λ value.

nca = fsrnca(Xtrain,ytrain,'FitMethod','exact', ...
    'Solver','lbfgs','Lambda',bestlambda);

Plot the selected features.

figure
plot(nca.FeatureWeights,'ro')
xlabel('Feature Index')
ylabel('Feature Weight')
grid on


Most of the feature weights are zero. fsrnca identifies the four most relevant features.

Compute the loss for the test set.

L = loss(nca,Xtest,ytest)

L = 0.0571

Tuning the regularization parameter, λ, eliminated more of the irrelevant features and improved the performance.

Compare NCA and ARD Feature Selection

This example uses the Abalone data [3][4] from the UCI Machine Learning Repository [5]. Download the data and save it in your current folder with the name 'abalone.data'.

Store the data into a table. Display the first seven rows.

tbl = readtable('abalone.data','Filetype','text','ReadVariableNames',false);
tbl.Properties.VariableNames = {'Sex','Length','Diameter','Height', ...
    'WWeight','SWeight','VWeight','ShWeight','NoShellRings'};
tbl(1:7,:)

ans=7×9 table
      Sex      Length    Diameter    Height    WWeight    SWeight    VWeight    ShWeight    NoShellRings
    _______    ______    ________    ______    _______    _______    _______    ________    ____________

    {'M'}      0.455      0.365      0.095      0.514     0.2245      0.101       0.15           15
    {'M'}       0.35      0.265       0.09     0.2255     0.0995     0.0485       0.07            7
    {'F'}       0.53       0.42      0.135      0.677     0.2565     0.1415       0.21            9
    {'M'}       0.44      0.365      0.125      0.516     0.2155      0.114      0.155           10
    {'I'}       0.33      0.255       0.08      0.205     0.0895     0.0395      0.055            7
    {'I'}      0.425        0.3      0.095     0.3515      0.141     0.0775       0.12            8
    {'F'}       0.53      0.415       0.15     0.7775      0.237     0.1415       0.33           20

The dataset has 4177 observations. The goal is to predict the age of abalone from eight physical measurements. The last variable, the number of shell rings, shows the age of the abalone. The first predictor is a categorical variable.

Prepare the predictor and response variables for fsrnca. The last column of tbl contains the number of shell rings, which is the response variable. The first predictor variable, sex, is categorical. You must create dummy variables.

y = table2array(tbl(:,end));
X(:,1:3) = dummyvar(categorical(tbl.Sex));
X = [X,table2array(tbl(:,2:end-1))];

Use four-fold cross-validation to tune the regularization parameter in the NCA model. First partition the data into four folds.

rng('default') % For reproducibility
n = length(y);
cvp = cvpartition(n,'kfold',4);
numtestsets = cvp.NumTestSets;

cvpartition divides the data into four partitions (folds). In each fold, about three-fourths of the data is assigned as a training set and one-fourth is assigned as a test set.

Generate a variety of λ (regularization parameter) values for fitting the model to determine the best λ value. Create a vector to collect the loss values from each fit.

lambdavals = linspace(0,25,20)*std(y)/n;
lossvals = zeros(length(lambdavals),numtestsets);

The rows of lossvals correspond to the λ values and the columns correspond to the folds.

Fit the NCA model for regression using fsrnca to the data from each fold using each λ value. Compute the loss for each model using the test data from each fold.

for i = 1:length(lambdavals)
    for k = 1:numtestsets
        Xtrain = X(cvp.training(k),:);
        ytrain = y(cvp.training(k),:);
        Xtest = X(cvp.test(k),:);
        ytest = y(cvp.test(k),:);
        nca = fsrnca(Xtrain,ytrain,'FitMethod','exact', ...
            'Solver','lbfgs','Lambda',lambdavals(i),'Standardize',true);
        lossvals(i,k) = loss(nca,Xtest,ytest,'LossFunction','mse');
    end
end


Compute the average loss for the folds, that is, compute the mean in the second dimension of lossvals.

meanloss = mean(lossvals,2);

Plot the λ values versus the mean loss from the four folds.

figure
plot(lambdavals,meanloss,'ro-')
xlabel('Lambda')
ylabel('Loss (MSE)')
grid on

Find the λ value that minimizes the mean loss.

[~,idx] = min(meanloss);
bestlambda = lambdavals(idx)

bestlambda = 0.0071

Compute the best loss value.

bestloss = meanloss(idx)

bestloss = 4.7799

Fit the NCA model on all of the data using the best λ value.


nca = fsrnca(X,y,'FitMethod','exact','Solver','lbfgs', ...
    'Verbose',1,'Lambda',bestlambda,'Standardize',true);

o Solver = LBFGS, HessianHistorySize = 15, LineSearchMethod = weakwolfe

|====================================================================================================|
|  ITER  |   FUN VALUE   |  NORM GRAD  |  NORM STEP  |  CURV  |    GAMMA    |    ALPHA    |  ACC  |
|====================================================================================================|
|      0 |  2.469168e+00 |   1.266e-01 |   0.000e+00 |        |   4.741e+00 |   0.000e+00 |   Y   |
|      1 |  2.375166e+00 |   8.265e-02 |   7.268e-01 |   OK   |   1.054e+01 |   1.000e+00 |   Y   |
|      2 |  2.293528e+00 |   2.067e-02 |   2.034e+00 |   OK   |   1.569e+01 |   1.000e+00 |   Y   |
|      3 |  2.286703e+00 |   1.031e-02 |   3.158e-01 |   OK   |   2.213e+01 |   1.000e+00 |   Y   |
|      4 |  2.279928e+00 |   2.023e-02 |   9.374e-01 |   OK   |   1.953e+01 |   1.000e+00 |   Y   |
|      5 |  2.276258e+00 |   6.884e-03 |   2.497e-01 |   OK   |   1.439e+01 |   1.000e+00 |   Y   |
|      6 |  2.274358e+00 |   1.792e-03 |   4.010e-01 |   OK   |   3.109e+01 |   1.000e+00 |   Y   |
|      7 |  2.274105e+00 |   2.412e-03 |   2.399e-01 |   OK   |   3.557e+01 |   1.000e+00 |   Y   |
|      8 |  2.274073e+00 |   1.459e-03 |   7.684e-02 |   OK   |   1.356e+01 |   1.000e+00 |   Y   |
|      9 |  2.274050e+00 |   3.733e-04 |   3.797e-02 |   OK   |   1.725e+01 |   1.000e+00 |   Y   |
|     10 |  2.274043e+00 |   2.750e-04 |   1.379e-02 |   OK   |   2.445e+01 |   1.000e+00 |   Y   |
|     11 |  2.274027e+00 |   2.682e-04 |   5.701e-02 |   OK   |   7.386e+01 |   1.000e+00 |   Y   |
|     12 |  2.274020e+00 |   1.712e-04 |   4.107e-02 |   OK   |   9.461e+01 |   1.000e+00 |   Y   |
|     13 |  2.274014e+00 |   2.633e-04 |   6.720e-02 |   OK   |   7.469e+01 |   1.000e+00 |   Y   |
|     14 |  2.274012e+00 |   9.818e-05 |   2.263e-02 |   OK   |   3.275e+01 |   1.000e+00 |   Y   |
|     15 |  2.274012e+00 |   4.220e-05 |   6.188e-03 |   OK   |   2.799e+01 |   1.000e+00 |   Y   |
|     16 |  2.274012e+00 |   2.859e-05 |   4.979e-03 |   OK   |   6.628e+01 |   1.000e+00 |   Y   |
|     17 |  2.274011e+00 |   1.582e-05 |   6.767e-03 |   OK   |   1.439e+02 |   1.000e+00 |   Y   |
|     18 |  2.274011e+00 |   7.623e-06 |   4.311e-03 |   OK   |   1.211e+02 |   1.000e+00 |   Y   |
|     19 |  2.274011e+00 |   3.038e-06 |   2.528e-04 |   OK   |   1.798e+01 |   5.000e-01 |   Y   |

|====================================================================================================|
|  ITER  |   FUN VALUE   |  NORM GRAD  |  NORM STEP  |  CURV  |    GAMMA    |    ALPHA    |  ACC  |
|====================================================================================================|
|     20 |  2.274011e+00 |   6.710e-07 |   2.325e-04 |   OK   |   2.721e+01 |   1.000e+00 |   Y   |

Infinity norm of the final gradient = 6.710e-07
Two norm of the final step = 2.325e-04, TolX = 1.000e-06
Relative infinity norm of the final gradient = 6.710e-07, TolFun = 1.000e-06

EXIT: Local minimum found.

Plot the selected features.

figure
plot(nca.FeatureWeights,'ro')
xlabel('Feature Index')
ylabel('Feature Weight')
grid on


The irrelevant features have zero weights. According to this figure, features 1, 3, and 9 are not selected.

Fit a Gaussian process regression (GPR) model using the subset of regressors method for parameter estimation and the fully independent conditional method for prediction. Use the ARD squared exponential kernel function, which assigns an individual weight to each predictor. Standardize the predictors.

gprMdl = fitrgp(tbl,'NoShellRings','KernelFunction','ardsquaredexponential', ...
    'FitMethod','sr','PredictMethod','fic','Standardize',true)

gprMdl =
  RegressionGP
           PredictorNames: {'Sex'  'Length'  'Diameter'  'Height'  'WWeight'  'SWeight'  'VWeight'  'ShWeight'}
             ResponseName: 'NoShellRings'
    CategoricalPredictors: 1
        ResponseTransform: 'none'
          NumObservations: 4177
           KernelFunction: 'ARDSquaredExponential'
        KernelInformation: [1×1 struct]
            BasisFunction: 'Constant'
                     Beta: 11.4959
                    Sigma: 2.0282
        PredictorLocation: [10×1 double]
           PredictorScale: [10×1 double]
                    Alpha: [1000×1 double]
         ActiveSetVectors: [1000×10 double]
            PredictMethod: 'FIC'
            ActiveSetSize: 1000
                FitMethod: 'SR'
          ActiveSetMethod: 'Random'
        IsActiveSetVector: [4177×1 logical]
            LogLikelihood: -9.0019e+03
         ActiveSetHistory: [1×1 struct]
           BCDInformation: []

  Properties, Methods

Compute the regression loss on the training data (resubstitution loss) for the trained model.

L = resubLoss(gprMdl)

L = 4.0306

The smallest cross-validated loss using fsrnca is comparable to the loss obtained using a GPR model with an ARD kernel.
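To see where the agreement comes from, you can inspect the learned ARD length scales. This is a sketch that assumes the usual layout of KernelParameters for an ARD kernel, with the per-predictor length scales followed by the signal standard deviation:

```matlab
params = gprMdl.KernelInformation.KernelParameters;
sigmaL = params(1:end-1);   % one length scale per predictor (assumed layout)
% A large length scale means the predictor barely moves the response, so the
% inverse length scale serves as a rough relevance measure to compare
% against nca.FeatureWeights.
relevance = 1./sigmaL;
```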

Input Arguments

X — Predictor variable values
n-by-p matrix

Predictor variable values, specified as an n-by-p matrix, where n is the number of observations and p is the number of predictor variables.

Data Types: single | double

Y — Response values
numeric real vector of length n

Response values, specified as a numeric real vector of length n, where n is the number of observations.

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Solver','sgd','Weights',W,'Lambda',0.0003 specifies the solver as stochastic gradient descent, the observation weights as the values in the vector W, and sets the regularization parameter to 0.0003.

Fitting Options

FitMethod — Method for fitting the model
'exact' (default) | 'none' | 'average'

Method for fitting the model, specified as the comma-separated pair consisting of 'FitMethod' and one of the following:

• 'exact' — Performs fitting using all of the data.
• 'none' — No fitting. Use this option to evaluate the generalization error of the NCA model using the initial feature weights supplied in the call to fsrnca.
• 'average' — Divides the data into partitions (subsets), fits each partition using the exact method, and returns the average of the feature weights. You can specify the number of partitions using the NumPartitions name-value pair argument.

Example: 'FitMethod','none'

NumPartitions — Number of partitions
max(2,min(10,n)) (default) | integer between 2 and n

Number of partitions to split the data into for the 'FitMethod','average' option, specified as the comma-separated pair consisting of 'NumPartitions' and an integer value between 2 and n, where n is the number of observations.

Example: 'NumPartitions',15
Data Types: double | single

Lambda — Regularization parameter
1/n (default) | nonnegative scalar

Regularization parameter to prevent overfitting, specified as the comma-separated pair consisting of 'Lambda' and a nonnegative scalar. As the number of observations n increases, the chance of overfitting decreases and the required amount of regularization also decreases. See “Tune Regularization Parameter in NCA for Regression” on page 32-2269 to learn how to tune the regularization parameter.

Example: 'Lambda',0.002
Data Types: double | single

LengthScale — Width of the kernel
1 (default) | positive real scalar

Width of the kernel, specified as the comma-separated pair consisting of 'LengthScale' and a positive real scalar.

A length scale value of 1 is sensible when all predictors are on the same scale. If the predictors in X are of very different magnitudes, then consider standardizing the predictor values using 'Standardize',true and setting 'LengthScale',1.
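A sketch of the 'average' fit method with an explicit partition count, where X and y are assumed to be in the workspace:

```matlab
% Feature weights are the average over five exact fits, one per partition.
mdl = fsrnca(X,y,'FitMethod','average','NumPartitions',5);
```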
Example: 'LengthScale',1.5
Data Types: double | single

InitialFeatureWeights — Initial feature weights
ones(p,1) (default) | p-by-1 vector of real positive scalars

Initial feature weights, specified as the comma-separated pair consisting of 'InitialFeatureWeights' and a p-by-1 vector of real positive scalars, where p is the number of predictors in the training data.

The regularized objective function for optimizing feature weights is nonconvex. As a result, using different initial feature weights can give different results. Setting all initial feature weights to 1 generally works well, but in some cases, random initialization using rand(p,1) can give better quality solutions.

Data Types: double | single

Weights — Observation weights
n-by-1 vector of 1s (default) | n-by-1 vector of real positive scalars

Observation weights, specified as the comma-separated pair consisting of 'ObservationWeights' and an n-by-1 vector of real positive scalars. Use observation weights to specify higher importance of some observations compared to others. The default weights assign equal importance to all observations.

Data Types: double | single

Standardize — Indicator for standardizing predictor data
false (default) | true

Indicator for standardizing the predictor data, specified as the comma-separated pair consisting of 'Standardize' and either false or true. For more information, see "Impact of Standardization" on page 15-102.

Example: 'Standardize',true
Data Types: logical

Verbose — Verbosity level indicator
0 (default) | 1 | >1

Verbosity level indicator for the convergence summary display, specified as the comma-separated pair consisting of 'Verbose' and one of the following:

• 0 — No convergence summary
• 1 — Convergence summary, including the norm of the gradient and objective function values
• >1 — More convergence information, depending on the fitting algorithm

When using the 'minibatch-lbfgs' solver with a verbosity level greater than 1, the convergence information includes the iteration log from intermediate mini-batch LBFGS fits.

Example: 'Verbose',1
Data Types: double | single

Solver — Solver type
'lbfgs' | 'sgd' | 'minibatch-lbfgs'

Solver type for estimating feature weights, specified as the comma-separated pair consisting of 'Solver' and one of the following:

• 'lbfgs' — Limited memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm
• 'sgd' — Stochastic gradient descent (SGD) algorithm
• 'minibatch-lbfgs' — Stochastic gradient descent with the LBFGS algorithm applied to mini-batches

The default is 'lbfgs' for n ≤ 1000, and 'sgd' for n > 1000.
Example: 'solver','minibatch-lbfgs'
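The documented default depends only on the number of observations n. As a quick illustration, here is a Python sketch of that selection rule (the function name is hypothetical; this is not MATLAB code or part of the toolbox):

```python
def default_solver(n):
    """Mirror the documented default: 'lbfgs' for n <= 1000, 'sgd' for n > 1000."""
    return 'lbfgs' if n <= 1000 else 'sgd'

print(default_solver(500))   # lbfgs
print(default_solver(5000))  # sgd
```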


LossFunction — Loss function
'mad' (default) | 'mse' | 'epsiloninsensitive' | function handle

Loss function, specified as the comma-separated pair consisting of 'LossFunction' and one of the following:

• 'mad' — Mean absolute deviation

  l(y_i, y_j) = |y_i − y_j|.

• 'mse' — Mean squared error

  l(y_i, y_j) = (y_i − y_j)².

• 'epsiloninsensitive' — ε-insensitive loss function

  l(y_i, y_j) = max(0, |y_i − y_j| − ε).

  This loss function is more robust to outliers than mean squared error or mean absolute deviation.

• @lossfun — Custom loss function handle. A loss function has this form.

  function L = lossfun(Yu,Yv)
  % calculation of loss
  ...

  Yu is a u-by-1 vector and Yv is a v-by-1 vector. L is a u-by-v matrix of loss values such that L(i,j) is the loss value for Yu(i) and Yv(j).

The objective function for minimization includes the loss function l(y_i, y_j) as follows:

  f(w) = (1/n) ∑_{i=1}^{n} ∑_{j=1, j≠i}^{n} p_ij l(y_i, y_j) + λ ∑_{r=1}^{p} w_r²,

where w is the feature weight vector, n is the number of observations, and p is the number of predictor variables. p_ij is the probability that x_j is the reference point for x_i. For details, see "NCA Feature Selection for Regression" on page 15-101.

Example: 'LossFunction',@lossfun

Epsilon — Epsilon value
iqr(Y)/13.49 (default) | nonnegative real scalar

Epsilon value for the 'LossFunction','epsiloninsensitive' option, specified as the comma-separated pair consisting of 'Epsilon' and a nonnegative real scalar. The default value is an estimate of the sample standard deviation using the interquartile range of the response variable.

Example: 'Epsilon',0.1
Data Types: double | single

CacheSize — Memory size
1000MB (default) | integer

Memory size, in MB, to use for objective function and gradient computation, specified as the comma-separated pair consisting of 'CacheSize' and an integer.

Example: 'CacheSize',1500MB
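The three built-in losses and the regularized objective above can be sketched directly from the formulas. The following Python snippet is an illustration of the math only (function names are hypothetical and the p_ij and loss matrices are supplied precomputed); it is not the toolbox implementation:

```python
def loss_mad(yi, yj):
    # 'mad': mean absolute deviation, l(yi, yj) = |yi - yj|
    return abs(yi - yj)

def loss_mse(yi, yj):
    # 'mse': mean squared error, l(yi, yj) = (yi - yj)^2
    return (yi - yj) ** 2

def loss_eps_insensitive(yi, yj, eps):
    # 'epsiloninsensitive': l(yi, yj) = max(0, |yi - yj| - eps)
    return max(0.0, abs(yi - yj) - eps)

def nca_objective(P, L, w, lam):
    """f(w) = (1/n) sum_i sum_{j != i} p_ij * l(y_i, y_j) + lambda * sum_r w_r^2.

    P[i][j] is the reference-point probability p_ij, L[i][j] the loss l(y_i, y_j).
    """
    n = len(P)
    data_term = sum(P[i][j] * L[i][j]
                    for i in range(n) for j in range(n) if j != i) / n
    return data_term + lam * sum(wr ** 2 for wr in w)

print(loss_mad(2.0, 5.0))                   # 3.0
print(loss_mse(2.0, 5.0))                   # 9.0
print(loss_eps_insensitive(2.0, 5.0, 1.0))  # 2.0
```

Note how the ε-insensitive loss ignores residuals smaller than ε, which is what makes it less sensitive to outliers than squared error.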


Data Types: double | single

LBFGS Options

HessianHistorySize — Size of history buffer for Hessian approximation
15 (default) | positive integer

Size of the history buffer for Hessian approximation for the 'lbfgs' solver, specified as the comma-separated pair consisting of 'HessianHistorySize' and a positive integer. At each iteration, the function uses the most recent HessianHistorySize iterations to build an approximation to the inverse Hessian.

Example: 'HessianHistorySize',20
Data Types: double | single

InitialStepSize — Initial step size
'auto' (default) | positive real scalar

Initial step size for the 'lbfgs' solver, specified as the comma-separated pair consisting of 'InitialStepSize' and a positive real scalar. By default, the function determines the initial step size automatically.

Data Types: double | single

LineSearchMethod — Line search method
'weakwolfe' (default) | 'strongwolfe' | 'backtracking'

Line search method, specified as the comma-separated pair consisting of 'LineSearchMethod' and one of the following:

• 'weakwolfe' — Weak Wolfe line search
• 'strongwolfe' — Strong Wolfe line search
• 'backtracking' — Backtracking line search

Example: 'LineSearchMethod','backtracking'

MaxLineSearchIterations — Maximum number of line search iterations
20 (default) | positive integer

Maximum number of line search iterations, specified as the comma-separated pair consisting of 'MaxLineSearchIterations' and a positive integer.

Example: 'MaxLineSearchIterations',25
Data Types: double | single

GradientTolerance — Relative convergence tolerance
1e-6 (default) | positive real scalar

Relative convergence tolerance on the gradient norm for the 'lbfgs' solver, specified as the comma-separated pair consisting of 'GradientTolerance' and a positive real scalar.

Example: 'GradientTolerance',0.000002
Data Types: double | single


SGD Options

InitialLearningRate — Initial learning rate for 'sgd' solver
'auto' (default) | positive real scalar

Initial learning rate for the 'sgd' solver, specified as the comma-separated pair consisting of 'InitialLearningRate' and a positive real scalar. When using solver type 'sgd', the learning rate decays over iterations starting with the value specified for 'InitialLearningRate'.

The default 'auto' means that the initial learning rate is determined using experiments on small subsets of data. Use the NumTuningIterations name-value pair argument to specify the number of iterations for automatically tuning the initial learning rate. Use the TuningSubsetSize name-value pair argument to specify the number of observations to use for automatically tuning the initial learning rate.

For solver type 'minibatch-lbfgs', you can set 'InitialLearningRate' to a very high value. In this case, the function applies LBFGS to each mini-batch separately with initial feature weights from the previous mini-batch.

To make sure the chosen initial learning rate decreases the objective value with each iteration, plot the Iteration versus the Objective values saved in the mdl.FitInfo property. You can use the refit method with 'InitialFeatureWeights' equal to mdl.FeatureWeights to start from the current solution and run additional iterations.

Example: 'InitialLearningRate',0.9
Data Types: double | single

MiniBatchSize — Number of observations to use in each batch for the 'sgd' solver
min(10,n) (default) | positive integer value from 1 to n

Number of observations to use in each batch for the 'sgd' solver, specified as the comma-separated pair consisting of 'MiniBatchSize' and a positive integer from 1 to n.

Example: 'MiniBatchSize',25
Data Types: double | single

PassLimit — Maximum number of passes for solver 'sgd'
5 (default) | positive integer

Maximum number of passes through all n observations for solver 'sgd', specified as the comma-separated pair consisting of 'PassLimit' and a positive integer. Each pass through all of the data is called an epoch.

Example: 'PassLimit',10
Data Types: double | single

NumPrint — Frequency of batches for displaying convergence summary
10 (default) | positive integer value

Frequency of batches for displaying the convergence summary for the 'sgd' solver, specified as the comma-separated pair consisting of 'NumPrint' and a positive integer. This argument applies when the 'Verbose' value is greater than 0. NumPrint mini-batches are processed for each line of the convergence summary that is displayed on the command line.

Example: 'NumPrint',5
Data Types: double | single

NumTuningIterations — Number of tuning iterations
20 (default) | positive integer

Number of tuning iterations for the 'sgd' solver, specified as the comma-separated pair consisting of 'NumTuningIterations' and a positive integer. This option is valid only for 'InitialLearningRate','auto'.

Example: 'NumTuningIterations',15
Data Types: double | single

TuningSubsetSize — Number of observations to use for tuning initial learning rate
min(100,n) (default) | positive integer value from 1 to n

Number of observations to use for tuning the initial learning rate, specified as the comma-separated pair consisting of 'TuningSubsetSize' and a positive integer value from 1 to n. This option is valid only for 'InitialLearningRate','auto'.

Example: 'TuningSubsetSize',25
Data Types: double | single

SGD or LBFGS Options

IterationLimit — Maximum number of iterations
positive integer

Maximum number of iterations, specified as the comma-separated pair consisting of 'IterationLimit' and a positive integer. The default is 10000 for SGD and 1000 for LBFGS and mini-batch LBFGS.

Each pass through a batch is an iteration. Each pass through all of the data is an epoch. If the data is divided into k mini-batches, then every epoch is equivalent to k iterations.

Example: 'IterationLimit',250
Data Types: double | single

StepTolerance — Convergence tolerance on the step size
1e-6 (default) | positive real scalar

Convergence tolerance on the step size, specified as the comma-separated pair consisting of 'StepTolerance' and a positive real scalar. The 'lbfgs' solver uses an absolute step tolerance, and the 'sgd' solver uses a relative step tolerance.

Example: 'StepTolerance',0.000005
Data Types: double | single

Mini-batch LBFGS Options

MiniBatchLBFGSIterations — Maximum number of iterations per mini-batch LBFGS step
10 (default) | positive integer

Maximum number of iterations per mini-batch LBFGS step, specified as the comma-separated pair consisting of 'MiniBatchLBFGSIterations' and a positive integer.

Example: 'MiniBatchLBFGSIterations',15

The mini-batch LBFGS algorithm is a combination of the SGD and LBFGS methods. Therefore, all of the name-value pair arguments that apply to the SGD and LBFGS solvers also apply to the mini-batch LBFGS algorithm.

Data Types: double | single

Output Arguments

mdl — Neighborhood component analysis model for regression
FeatureSelectionNCARegression object

Neighborhood component analysis model for regression, returned as a FeatureSelectionNCARegression object.

References

[1] Rasmussen, C. E., R. M. Neal, G. E. Hinton, D. van Camp, M. Revow, Z. Ghahramani, R. Kustra, and R. Tibshirani. The DELVE Manual, 1996, http://mlg.eng.cam.ac.uk/pub/pdf/RasNeaHinetal96.pdf.

[2] University of Toronto, Computer Science Department. Delve Datasets. http://www.cs.toronto.edu/~delve/data/datasets.html.

[3] Warwick J. N., T. L. Sellers, S. R. Talbot, A. J. Cawthorn, and W. B. Ford. "The Population Biology of Abalone (Haliotis species) in Tasmania. I. Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait." Sea Fisheries Division, Technical Report No. 48 (ISSN 1034-3288), 1994.

[4] Waugh, S. "Extending and Benchmarking Cascade-Correlation." PhD Thesis. Computer Science Department, University of Tasmania, 1995.

[5] Lichman, M. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science, 2013. http://archive.ics.uci.edu/ml.

See Also
FeatureSelectionNCARegression | loss | predict | refit

Topics
"Robust Feature Selection Using NCA for Regression" on page 15-85
"Neighborhood Component Analysis (NCA) Feature Selection" on page 15-99
"Introduction to Feature Selection" on page 15-49

Introduced in R2016b


fsrftest

Univariate feature ranking for regression using F-tests

Syntax

idx = fsrftest(Tbl,ResponseVarName)
idx = fsrftest(Tbl,formula)
idx = fsrftest(Tbl,Y)
idx = fsrftest(X,Y)
idx = fsrftest( ___ ,Name,Value)
[idx,scores] = fsrftest( ___ )

Description

idx = fsrftest(Tbl,ResponseVarName) ranks features (predictors) using F-tests on page 32-2294. The table Tbl contains predictor variables and a response variable, and ResponseVarName is the name of the response variable in Tbl. The function returns idx, which contains the indices of predictors ordered by predictor importance, meaning idx(1) is the index of the most important predictor. You can use idx to select important predictors for regression problems.

idx = fsrftest(Tbl,formula) specifies a response variable and predictor variables to consider among the variables in Tbl by using formula.

idx = fsrftest(Tbl,Y) ranks predictors in Tbl using the response variable Y.

idx = fsrftest(X,Y) ranks predictors in X using the response variable Y.

idx = fsrftest( ___ ,Name,Value) specifies additional options using one or more name-value pair arguments in addition to any of the input argument combinations in the previous syntaxes. For example, you can specify categorical predictors and observation weights.

[idx,scores] = fsrftest( ___ ) also returns the predictor scores scores. A large score value indicates that the corresponding predictor is important.

Examples

Rank Predictors in Matrix

Rank predictors in a numeric matrix and create a bar plot of predictor importance scores.

Load the sample data.

load robotarm.mat

The robotarm data set contains 7168 training observations (Xtrain and ytrain) and 1024 test observations (Xtest and ytest) with 32 features [1][2]. Rank the predictors using the training observations.


[idx,scores] = fsrftest(Xtrain,ytrain);

The values in scores are the negative logs of the p-values. If a p-value is smaller than eps(0), then the corresponding score value is Inf. Before creating a bar plot, determine whether scores includes Inf values.

find(isinf(scores))

ans =

  1x0 empty double row vector

scores does not include Inf values. If scores includes Inf values, you can replace Inf by a large numeric number before creating a bar plot for visualization purposes. For details, see "Rank Predictors in Table" on page 32-2288.

Create a bar plot of the predictor importance scores.

bar(scores(idx))
xlabel('Predictor rank')
ylabel('Predictor importance score')

Select the top five most important predictors. Find the columns of these predictors in Xtrain.

idx(1:5)

ans = 1×5

    30    24    10     4     5

The 30th column of Xtrain is the most important predictor of ytrain.

Rank Predictors in Table

Rank predictors in a table and create a bar plot of predictor importance scores.

If your data is in a table and fsrftest ranks a subset of the variables in the table, then the function indexes the variables using only the subset. Therefore, a good practice is to move the predictors that you do not want to rank to the end of the table. Move the response variable and observation weight vector as well. Then, the indexes of the output arguments are consistent with the indexes of the table. You can move variables in a table using the movevars function.

This example uses the Abalone data [3][4] from the UCI Machine Learning Repository [5]. Download the data and save it in your current folder with the name 'abalone.data'. Store the data in a table.

tbl = readtable('abalone.data','Filetype','text','ReadVariableNames',false);
tbl.Properties.VariableNames = {'Sex','Length','Diameter','Height', ...
    'WWeight','SWeight','VWeight','ShWeight','NoShellRings'};

Preview the first few rows of the table.

head(tbl)

ans=8×9 table
     Sex     Length    Diameter    Height    WWeight    SWeight    VWeight    ShWeight    NoShellRings
    _____    ______    ________    ______    _______    _______    _______    ________    ____________

    {'M'}    0.455      0.365      0.095     0.514      0.2245     0.101       0.15            15
    {'M'}     0.35      0.265       0.09     0.2255     0.0995     0.0485      0.07             7
    {'F'}     0.53       0.42      0.135     0.677      0.2565     0.1415      0.21             9
    {'M'}     0.44      0.365      0.125     0.516      0.2155     0.114       0.155           10
    {'I'}     0.33      0.255       0.08     0.205      0.0895     0.0395      0.055            7
    {'I'}    0.425        0.3      0.095     0.3515     0.141      0.0775      0.12             8
    {'F'}     0.53      0.415       0.15     0.7775     0.237      0.1415      0.33            20
    {'F'}    0.545      0.425      0.125     0.768      0.294      0.1495      0.26            16

The last variable in the table is a response variable. Rank the predictors in tbl. Specify the last column NoShellRings as a response variable.

[idx,scores] = fsrftest(tbl,'NoShellRings')

idx = 1×8

     3     4     5     7     8     2     6     1

scores = 1×8

  447.6891  736.9619       Inf       Inf       Inf  604.6692       Inf       Inf

The values in scores are the negative logs of the p-values. If a p-value is smaller than eps(0), then the corresponding score value is Inf. Before creating a bar plot, determine whether scores includes Inf values.

idxInf = find(isinf(scores))

idxInf = 1×5

     3     4     5     7     8

scores includes five Inf values. Create a bar plot of predictor importance scores. Use the predictor names for the x-axis tick labels.

bar(scores(idx))
xlabel('Predictor rank')
ylabel('Predictor importance score')
xticklabels(strrep(tbl.Properties.VariableNames(idx),'_','\_'))
xtickangle(45)

The bar function does not plot any bars for the Inf values. For the Inf values, plot bars that have the same length as the largest finite score.

hold on
bar(scores(idx(length(idxInf)+1))*ones(length(idxInf),1))
legend('Finite Scores','Inf Scores')
hold off


The bar graph displays finite scores and Inf scores using different colors.

Input Arguments

Tbl — Sample data
table

Sample data, specified as a table. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.

Each row of Tbl corresponds to one observation, and each column corresponds to one predictor variable. Optionally, Tbl can contain additional columns for a response variable and observation weights.

A response variable can be a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors. If the response variable is a character array, then each element of the response variable must correspond to one row of the array.

• If Tbl contains the response variable, and you want to use all remaining variables in Tbl as predictors, then specify the response variable by using ResponseVarName. If Tbl also contains the observation weights, then you can specify the weights by using Weights.

• If Tbl contains the response variable, and you want to use only a subset of the remaining variables in Tbl as predictors, then specify the subset of variables by using formula.


• If Tbl does not contain the response variable, then specify a response variable by using Y. The response variable and Tbl must have the same number of rows.

If fsrftest uses a subset of variables in Tbl as predictors, then the function indexes the predictors using only the subset. The values in the 'CategoricalPredictors' name-value pair argument and the output argument idx do not count the predictors that the function does not rank.

fsrftest considers NaN, '' (empty character vector), "" (empty string), <missing>, and <undefined> values in Tbl for a response variable to be missing values. fsrftest does not use observations with missing values for a response variable.

Data Types: table

ResponseVarName — Response variable name
character vector or string scalar containing name of variable in Tbl

Response variable name, specified as a character vector or string scalar containing the name of a variable in Tbl. For example, if a response variable is the column Y of Tbl (Tbl.Y), then specify ResponseVarName as 'Y'.

Data Types: char | string

formula — Explanatory model of response variable and subset of predictor variables
character vector | string scalar

Explanatory model of the response variable and a subset of the predictor variables, specified as a character vector or string scalar in the form 'Y ~ X1 + X2 + X3'. In this form, Y represents the response variable, and X1, X2, and X3 represent the predictor variables. To specify a subset of variables in Tbl as predictors, use a formula. If you specify a formula, then fsrftest does not rank any variables in Tbl that do not appear in formula.

The variable names in the formula must be both variable names in Tbl (Tbl.Properties.VariableNames) and valid MATLAB identifiers. For details, see "Tips" on page 32-2294.

Data Types: char | string

Y — Response variable
numeric vector | categorical vector | logical vector | character array | string array | cell array of character vectors

Response variable, specified as a numeric, categorical, or logical vector, a character or string array, or a cell array of character vectors. Each row of Y represents the labels of the corresponding row of X.

fsrftest considers NaN, '' (empty character vector), "" (empty string), <missing>, and <undefined> values in Y to be missing values. fsrftest does not use observations with missing values for Y.

Data Types: single | double | categorical | logical | char | string | cell

X — Predictor data
numeric matrix


Predictor data, specified as a numeric matrix. Each row of X corresponds to one observation, and each column corresponds to one predictor variable.

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'NumBins',20,'UseMissing',true sets the number of bins as 20 and specifies to use missing values in predictors for ranking.

CategoricalPredictors — List of categorical predictors
vector of positive integers | logical vector | character matrix | string array | cell array of character vectors | 'all'

List of categorical predictors, specified as the comma-separated pair consisting of 'CategoricalPredictors' and one of the following values:

• Vector of positive integers — Each entry in the vector is an index value corresponding to the column of the predictor data (X or Tbl) that contains a categorical variable.

• Logical vector — A true entry means that the corresponding column of predictor data (X or Tbl) is a categorical variable.

• Character matrix — Each row of the matrix is the name of a predictor variable. The names must match the names in Tbl. Pad the names with extra blanks so each row of the character matrix has the same length.

• String array or cell array of character vectors — Each element in the array is the name of a predictor variable. The names must match the names in Tbl.

• 'all' — All predictors are categorical.

By default, if the predictor data is in a table (Tbl), fsrftest assumes that a variable is categorical if it is a logical vector, unordered categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix (X), fsrftest assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the 'CategoricalPredictors' name-value pair argument.

If fsrftest uses a subset of variables in Tbl as predictors, then the function indexes the predictors using only the subset. The 'CategoricalPredictors' values do not count the predictors that the function does not rank.

Example: 'CategoricalPredictors','all'
Data Types: single | double | logical | char | string | cell

NumBins — Number of bins for binning continuous predictors
10 (default) | positive integer scalar

Number of bins for binning continuous predictors, specified as the comma-separated pair consisting of 'NumBins' and a positive integer scalar.


Example: 'NumBins',50
Data Types: single | double

UseMissing — Indicator for whether to use or discard missing values in predictors
false (default) | true

Indicator for whether to use or discard missing values in predictors, specified as the comma-separated pair consisting of 'UseMissing' and either true to use or false to discard missing values in predictors for ranking.

fsrftest considers NaN, '' (empty character vector), "" (empty string), <missing>, and <undefined> values to be missing values.

If you specify 'UseMissing',true, then fsrftest uses missing values for ranking. For a categorical variable, fsrftest treats missing values as an extra category. For a continuous variable, fsrftest places NaN values in a separate bin for binning.

If you specify 'UseMissing',false, then fsrftest does not use missing values for ranking. Because fsrftest computes importance scores individually for each predictor, the function does not discard an entire row when values in the row are partially missing. For each variable, fsrftest uses all values that are not missing.

Example: 'UseMissing',true
Data Types: logical

Weights — Observation weights
ones(size(X,1),1) (default) | vector of scalar values | name of variable in Tbl

Observation weights, specified as the comma-separated pair consisting of 'Weights' and a vector of scalar values or the name of a variable in Tbl. The function weights the observations in each row of X or Tbl with the corresponding value in Weights. The size of Weights must equal the number of rows in X or Tbl.

If you specify the input data as a table Tbl, then Weights can be the name of a variable in Tbl that contains a numeric vector. In this case, you must specify Weights as a character vector or string scalar. For example, if the weight vector is the column W of Tbl (Tbl.W), then specify 'Weights','W'.

fsrftest normalizes the weights to add up to one.

Data Types: single | double | char | string

Output Arguments

idx — Indices of predictors ordered by predictor importance
numeric vector

Indices of predictors in X or Tbl ordered by predictor importance, returned as a 1-by-r numeric vector, where r is the number of ranked predictors.

If fsrftest uses a subset of variables in Tbl as predictors, then the function indexes the predictors using only the subset. For example, suppose Tbl includes 10 columns and you specify the last five columns of Tbl as the predictor variables by using formula. If idx(3) is 5, then the third most important predictor is the 10th column in Tbl, which is the fifth predictor in the subset.


scores — Predictor scores
numeric vector

Predictor scores, returned as a 1-by-r numeric vector, where r is the number of ranked predictors. A large score value indicates that the corresponding predictor is important.

• If you use X to specify the predictors or use all the variables in Tbl as predictors, then the values in scores have the same order as the predictors in X or Tbl.

• If you specify a subset of variables in Tbl as predictors, then the values in scores have the same order as the subset. For example, suppose Tbl includes 10 columns and you specify the last five columns of Tbl as the predictor variables by using formula. Then, scores(3) contains the score value of the 8th column in Tbl, which is the third predictor in the subset.

Tips

• If you specify the response variable and predictor variables by using the input argument formula, then the variable names in the formula must be both variable names in Tbl (Tbl.Properties.VariableNames) and valid MATLAB identifiers.

You can verify the variable names in Tbl by using the isvarname function. The following code returns logical 1 (true) for each variable that has a valid variable name.

cellfun(@isvarname,Tbl.Properties.VariableNames)

If the variable names in Tbl are not valid, then convert them by using the matlab.lang.makeValidName function.

Tbl.Properties.VariableNames = matlab.lang.makeValidName(Tbl.Properties.VariableNames);

Algorithms

Univariate Feature Ranking Using F-Tests

• fsrftest examines the importance of each predictor individually using an F-test. Each F-test tests the hypothesis that the response values grouped by predictor variable values are drawn from populations with the same mean against the alternative hypothesis that the population means are not all the same. A small p-value of the test statistic indicates that the corresponding predictor is important.

• The output scores is –log(p). Therefore, a large score value indicates that the corresponding predictor is important. If a p-value is smaller than eps(0), then the output is Inf.

• fsrftest examines a continuous variable after binning, or discretizing, the variable. You can specify the number of bins using the 'NumBins' name-value pair argument.
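The F-test described above compares between-group and within-group variability of the response. As a minimal sketch of that statistic (Python, pure standard library; the function name is hypothetical, and this computes only the F statistic for already-grouped responses, not the binning or the –log(p) score that fsrftest reports):

```python
def f_statistic(groups):
    """One-way ANOVA F statistic: between-group mean square over within-group mean square.

    groups is a list of lists of response values, one inner list per
    predictor-value group (for a continuous predictor, per bin).
    """
    k = len(groups)                              # number of groups
    N = sum(len(g) for g in groups)              # total observations
    grand = sum(sum(g) for g in groups) / N      # grand mean
    # Between-group sum of squares, df = k - 1
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares, df = N - k
    ss_within = sum(sum((y - sum(g) / len(g)) ** 2 for y in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (N - k))

print(f_statistic([[1, 2, 3], [4, 5, 6]]))  # 13.5
```

A large F value yields a small p-value, and hence a large –log(p) importance score.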

References

[1] Rasmussen, C. E., R. M. Neal, G. E. Hinton, D. van Camp, M. Revow, Z. Ghahramani, R. Kustra, and R. Tibshirani. The DELVE Manual, 1996.

[2] University of Toronto, Computer Science Department. Delve Datasets.


[3] Nash, Warwick J., ed. The Population Biology of Abalone (Haliotis Species) in Tasmania. 1: Blacklip Abalone (H. Rubra) from the North Coast and the Islands of Bass Strait. Technical Report/Department of Sea Fisheries, Tasmania 48. Taroona: Marine Research Laboratories, 1994.

[4] Waugh, S. "Extending and Benchmarking Cascade-Correlation." PhD Thesis. Computer Science Department, University of Tasmania, 1995.

[5] Lichman, M. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science, 2013. http://archive.ics.uci.edu/ml.

See Also
fsrnca | relieff | sequentialfs

Topics
"Introduction to Feature Selection" on page 15-49

Introduced in R2020a


fstat

F mean and variance

Syntax

[M,V] = fstat(V1,V2)

Description

[M,V] = fstat(V1,V2) returns the mean and variance of the F distribution with numerator degrees of freedom V1 and denominator degrees of freedom V2.

V1 and V2 can be vectors, matrices, or multidimensional arrays that all have the same size, which is also the size of M and V. A scalar input for V1 or V2 is expanded to a constant array with the same dimensions as the other input. V1 and V2 must contain real positive values.

The mean of the F distribution for values of ν₂ greater than 2 is

  ν₂ / (ν₂ − 2)

The variance of the F distribution for values of ν₂ greater than 4 is

  2ν₂²(ν₁ + ν₂ − 2) / (ν₁(ν₂ − 2)²(ν₂ − 4))

The mean of the F distribution is undefined if ν₂ is less than 3. The variance is undefined for ν₂ less than 5.
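The two formulas above can be evaluated directly; the following Python sketch (the function name mirrors the MATLAB one, but this is an illustration, not the toolbox implementation) returns NaN where the moments are undefined:

```python
import math

def fstat(v1, v2):
    """Mean and variance of the F distribution; NaN where undefined."""
    # Mean nu2/(nu2 - 2) exists only for nu2 > 2
    m = v2 / (v2 - 2) if v2 > 2 else math.nan
    # Variance 2*nu2^2*(nu1 + nu2 - 2) / (nu1*(nu2 - 2)^2*(nu2 - 4)) exists only for nu2 > 4
    if v2 > 4:
        v = 2 * v2**2 * (v1 + v2 - 2) / (v1 * (v2 - 2)**2 * (v2 - 4))
    else:
        v = math.nan
    return m, v

for d in range(1, 6):
    print(fstat(d, d))
```

For v1 = v2 = 5 this gives mean 5/3 ≈ 1.6667 and variance 400/45 ≈ 8.8889, matching the fstat example below.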

Examples

fstat returns NaN when the mean and variance are undefined.

[m,v] = fstat(1:5,1:5)

m =
       NaN       NaN    3.0000    2.0000    1.6667
v =
       NaN       NaN       NaN       NaN    8.8889

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see "Run MATLAB Functions on a GPU" (Parallel Computing Toolbox).


See Also
fcdf | finv | fpdf | frnd

Topics
"F Distribution" on page B-45

Introduced before R2006a


fsulaplacian

Rank features for unsupervised learning using Laplacian scores

Syntax

idx = fsulaplacian(X)
idx = fsulaplacian(X,Name,Value)
[idx,scores] = fsulaplacian( ___ )

Description

idx = fsulaplacian(X) ranks features (variables) in X using the Laplacian scores on page 32-2304. The function returns idx, which contains the indices of features ordered by feature importance. You can use idx to select important features for unsupervised learning.

idx = fsulaplacian(X,Name,Value) specifies additional options using one or more name-value pair arguments. For example, you can specify 'NumNeighbors',10 to create a similarity graph on page 32-2303 using 10 nearest neighbors.

[idx,scores] = fsulaplacian( ___ ) also returns the feature scores scores, using any of the input argument combinations in the previous syntaxes. A large score value indicates that the corresponding feature is important.

Examples Rank Features by Importance Load the sample data. load ionosphere

Rank the features based on importance. [idx,scores] = fsulaplacian(X);

Create a bar plot of the feature importance scores. bar(scores(idx)) xlabel('Feature rank') ylabel('Feature importance score')


Select the top five most important features. Find the columns of these features in X.
idx(1:5)
ans = 1×5
    15    13    17    21    19

The 15th column of X is the most important feature.

Rank Features Using Specified Similarity Matrix Compute a similarity matrix from Fisher's iris data set and rank the features using the similarity matrix. Load Fisher's iris data set. load fisheriris

Find the distance between each pair of observations in meas by using the pdist and squareform functions with the default Euclidean distance metric. D = pdist(meas); Z = squareform(D);


Construct the similarity matrix and confirm that it is symmetric.
S = exp(-Z.^2);
issymmetric(S)
ans = logical
   1

Rank the features.
idx = fsulaplacian(meas,'Similarity',S)
idx = 1×4
     3     4     1     2

Ranking using the similarity matrix S is the same as ranking by specifying 'NumNeighbors' as size(meas,1).
idx2 = fsulaplacian(meas,'NumNeighbors',size(meas,1))
idx2 = 1×4
     3     4     1     2

Input Arguments X — Input data numeric matrix Input data, specified as an n-by-p numeric matrix. The rows of X correspond to observations (or points), and the columns correspond to features. The software treats NaNs in X as missing data and ignores any row of X containing at least one NaN. Data Types: single | double Name-Value Pair Arguments Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN. Example: 'NumNeighbors',10,'KernelScale','auto' specifies the number of nearest neighbors as 10 and the kernel scale factor as 'auto'. Similarity — Similarity matrix [] (empty matrix) (default) | symmetric matrix Similarity matrix, specified as the comma-separated pair consisting of 'Similarity' and an n-by-n symmetric matrix, where n is the number of observations. The similarity matrix (or adjacency matrix) represents the input data by modeling local neighborhood relationships among the data points. The values in a similarity matrix represent the edges (or connections) between nodes (data points) that


are connected in a similarity graph on page 32-2303. For more information, see “Similarity Matrix” on page 32-2304. If you specify the 'Similarity' value, then you cannot specify any other name-value pair argument. If you do not specify the 'Similarity' value, then the software computes a similarity matrix using the options specified by the other name-value pair arguments. Data Types: single | double

Distance — Distance metric
character vector | string scalar | function handle
Distance metric, specified as the comma-separated pair consisting of 'Distance' and one of the following values.

• 'euclidean' — Euclidean distance (default)
• 'seuclidean' — Standardized Euclidean distance. Each coordinate difference between observations is scaled by dividing by the corresponding element of the standard deviation computed from X. Use the Scale name-value pair argument to specify a different scaling factor.
• 'mahalanobis' — Mahalanobis distance using the sample covariance of X, C = nancov(X). Use the Cov name-value pair argument to specify a different covariance matrix.
• 'cityblock' — City block distance
• 'minkowski' — Minkowski distance. The default exponent is 2. Use the P name-value pair argument to specify a different exponent, where P is a positive scalar value.
• 'chebychev' — Chebychev distance (maximum coordinate difference)
• 'cosine' — One minus the cosine of the included angle between observations (treated as vectors)
• 'correlation' — One minus the sample correlation between observations (treated as sequences of values)
• 'hamming' — Hamming distance, which is the percentage of coordinates that differ
• 'jaccard' — One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ
• 'spearman' — One minus the sample Spearman's rank correlation between observations (treated as sequences of values)
• @distfun — Custom distance function handle. A distance function has the form

function D2 = distfun(ZI,ZJ)
% calculation of distance
...

where • ZI is a 1-by-n vector containing a single observation. • ZJ is an m2-by-n matrix containing multiple observations. distfun must accept a matrix ZJ with an arbitrary number of observations. • D2 is an m2-by-1 vector of distances, and D2(k) is the distance between observations ZI and ZJ(k,:). If your data is not sparse, you can generally compute distance more quickly by using a built-in distance instead of a function handle. For more information, see “Distance Metrics” on page 18-11. When you use the 'seuclidean', 'minkowski', or 'mahalanobis' distance metric, you can specify the additional name-value pair argument 'Scale', 'P', or 'Cov', respectively, to control the distance metrics. Example: 'Distance','minkowski','P',3 specifies to use the Minkowski distance metric with an exponent of 3. P — Exponent for Minkowski distance metric 2 (default) | positive scalar Exponent for the Minkowski distance metric, specified as the comma-separated pair consisting of 'P' and a positive scalar. This argument is valid only if 'Distance' is 'minkowski'. Example: 'P',3 Data Types: single | double Cov — Covariance matrix for Mahalanobis distance metric nancov(X) (default) | positive definite matrix Covariance matrix for the Mahalanobis distance metric, specified as the comma-separated pair consisting of 'Cov' and a positive definite matrix. This argument is valid only if 'Distance' is 'mahalanobis'. Example: 'Cov',eye(4) Data Types: single | double Scale — Scaling factors for standardized Euclidean distance metric nanstd(X) (default) | numeric vector of nonnegative values Scaling factors for the standardized Euclidean distance metric, specified as the comma-separated pair consisting of 'Scale' and a numeric vector of nonnegative values.


Scale has length p (the number of columns in X), because each dimension (column) of X has a corresponding value in Scale. For each dimension of X, fsulaplacian uses the corresponding value in Scale to standardize the difference between observations. This argument is valid only if 'Distance' is 'seuclidean'. Data Types: single | double NumNeighbors — Number of nearest neighbors log(size(X,1)) (default) | positive integer Number of nearest neighbors used to construct the similarity graph, specified as the comma-separated pair consisting of 'NumNeighbors' and a positive integer. Example: 'NumNeighbors',10 Data Types: single | double KernelScale — Scale factor 1 (default) | 'auto' | positive scalar Scale factor for the kernel, specified as the comma-separated pair consisting of 'KernelScale' and 'auto' or a positive scalar. The software uses the scale factor to transform distances to similarity measures. For more information, see “Similarity Graph” on page 32-2303. • The 'auto' option is supported only for the 'euclidean' and 'seuclidean' distance metrics. • If you specify 'auto', then the software selects an appropriate scale factor using a heuristic procedure. This heuristic procedure uses subsampling, so estimates can vary from one call to another. To reproduce results, set a random number seed using rng before calling fsulaplacian. Example: 'KernelScale','auto'

Output Arguments idx — Indices of features ordered by feature importance numeric vector Indices of the features in X ordered by feature importance, returned as a numeric vector. For example, if idx(3) is 5, then the third most important feature is the fifth column in X. scores — Feature scores numeric vector Feature scores, returned as a numeric vector. A large score value in scores indicates that the corresponding feature is important. The values in scores have the same order as the features in X.

More About
Similarity Graph
A similarity graph models the local neighborhood relationships between data points in X as an undirected graph. The nodes in the graph represent data points, and the edges, which are directionless, represent the connections between the data points.


If the pairwise distance Disti,j between any two nodes i and j is positive (or larger than a certain threshold), then the similarity graph connects the two nodes using an edge [2]. The edge between the two nodes is weighted by the pairwise similarity Si,j = exp(−(Disti,j/σ)²), where σ is a specified kernel scale value. fsulaplacian constructs a similarity graph using the nearest neighbor method. The function connects points in X that are nearest neighbors. Use 'NumNeighbors' to specify the number of nearest neighbors.

Similarity Matrix
A similarity matrix is a matrix representation of a similarity graph on page 32-2303. The n-by-n matrix S = (Si,j), i,j = 1,…,n, contains pairwise similarity values between connected nodes in the similarity graph. The similarity matrix of a graph is also called an adjacency matrix. The similarity matrix is symmetric because the edges of the similarity graph are directionless. A value of Si,j = 0 means that nodes i and j of the similarity graph are not connected.

Degree Matrix
A degree matrix Dg is an n-by-n diagonal matrix obtained by summing the rows of the similarity matrix on page 32-2304 S. That is, the ith diagonal element of Dg is

    Dg(i,i) = Σ_{j=1}^{n} Si,j.

Laplacian Matrix
A Laplacian matrix, which is one way of representing a similarity graph on page 32-2303, is defined as the difference between the degree matrix on page 32-2304 Dg and the similarity matrix on page 32-2304 S:

    L = Dg − S.

Algorithms
Laplacian Score
The fsulaplacian function ranks features using Laplacian scores [1] obtained from a nearest neighbor similarity graph on page 32-2303. fsulaplacian computes the values in scores as follows:

1. For each data point in X, define a local neighborhood using the nearest neighbor method, and find pairwise distances Disti,j for all points i and j in the neighborhood.

2. Convert the distances to the similarity matrix on page 32-2304 S using the kernel transformation Si,j = exp(−(Disti,j/σ)²), where σ is the scale factor for the kernel as specified by the 'KernelScale' name-value pair argument.

3. Center each feature by removing its mean:

       x̃r = xr − ((xrᵀ Dg 1)/(1ᵀ Dg 1)) 1,

   where xr is the rth feature, Dg is the degree matrix on page 32-2304, and 1 = [1, ⋯, 1]ᵀ.

4. Compute the score sr for each feature:

       sr = (x̃rᵀ S x̃r)/(x̃rᵀ Dg x̃r).

Note that [1] defines the Laplacian score as

    Lr = (x̃rᵀ L x̃r)/(x̃rᵀ Dg x̃r) = 1 − (x̃rᵀ S x̃r)/(x̃rᵀ Dg x̃r),

where L is the Laplacian matrix on page 32-2304, defined as the difference between Dg and S. The fsulaplacian function uses only the second term of this equation for the score value of scores so that a large score value indicates an important feature.

Selecting features using the Laplacian score is consistent with minimizing the value

    Σi,j (xir − xjr)² Si,j / Var(xr),

where xir represents the ith observation of the rth feature. Minimizing this value implies that the algorithm prefers features with large variance. Also, the algorithm assumes that two data points of an important feature are close if and only if the similarity graph has an edge between the two data points.
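The steps above can be sketched directly in MATLAB. This is a minimal illustration, not the toolbox implementation: it assumes a fully connected similarity graph (every pair of points connected, σ = 1) rather than the nearest neighbor graph that fsulaplacian builds, so the resulting ranking matches the 'Similarity' example earlier on this page rather than the default behavior.

```matlab
% Sketch: Laplacian scores from a full similarity matrix (assumption:
% all pairs of points are connected; fsulaplacian uses a nearest
% neighbor graph by default, so default results can differ).
load fisheriris
X = meas;
Z = squareform(pdist(X));            % pairwise Euclidean distances
S = exp(-Z.^2);                      % kernel transformation with sigma = 1
Dg = diag(sum(S,2));                 % degree matrix
one = ones(size(X,1),1);
scores = zeros(1,size(X,2));
for r = 1:size(X,2)
    xr = X(:,r);
    xt = xr - ((xr'*Dg*one)/(one'*Dg*one))*one;   % step 3: center the feature
    scores(r) = (xt'*S*xt)/(xt'*Dg*xt);           % step 4: larger = more important
end
[~,idx] = sort(scores,'descend');    % rank features by importance
```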

References [1] He, X., D. Cai, and P. Niyogi. "Laplacian Score for Feature Selection." NIPS Proceedings. 2005. [2] Von Luxburg, U. “A Tutorial on Spectral Clustering.” Statistics and Computing Journal. Vol.17, Number 4, 2007, pp. 395–416.

See Also fscmrmr | relieff | sequentialfs Topics “Introduction to Feature Selection” on page 15-49 Introduced in R2019b


fsurfht Interactive contour plot

Syntax
fsurfht(fun,xlims,ylims)
fsurfht(fun,xlims,ylims,p1,p2,p3,p4,p5)

Description fsurfht(fun,xlims,ylims) is an interactive contour plot of the function specified by the text variable fun. The x-axis limits are specified by xlims in the form [xmin xmax], and the y-axis limits are specified by ylims in the form [ymin ymax]. fsurfht(fun,xlims,ylims,p1,p2,p3,p4,p5) allows for five optional parameters that you can supply to the function fun. The intersection of the vertical and horizontal reference lines on the plot defines the current x value and y value. You can drag these reference lines and watch the calculated z-values (at the top of the plot) update simultaneously. Alternatively, you can type the x value and y value into editable text fields on the x-axis and y-axis.

Examples Plot the Gaussian likelihood function for the gas.mat data. load gas

Create a function containing the following commands, and name it gauslike.m.
function z = gauslike(mu,sigma,p1)
n = length(p1);
z = ones(size(mu));
for i = 1:n
    z = z .* (normpdf(p1(i),mu,sigma));
end

The gauslike function calls normpdf, treating the data sample as fixed and the parameters µ and σ as variables. Assume that the gas prices are normally distributed, and plot the likelihood surface of the sample. fsurfht('gauslike',[112 118],[3 5],price1)


The sample mean is the x value at the maximum, but the sample standard deviation is not the y value at the maximum: the likelihood is maximized by the maximum likelihood estimate of σ, which uses n rather than n − 1 in the denominator (here n = 20, hence the factor sqrt(19/20)).
mumax = mean(price1)
mumax = 115.1500
sigmamax = std(price1)*sqrt(19/20)
sigmamax = 3.7719

Introduced before R2006a


fullfact Full factorial design

Syntax dFF = fullfact(levels)

Description dFF = fullfact(levels) gives factor settings dFF for a full factorial design with n factors, where the number of levels for each factor is given by the vector levels of length n. dFF is m-by-n, where m is the number of treatments in the full-factorial design. Each row of dFF corresponds to a single treatment. Each column contains the settings for a single factor, with integer values from one to the number of levels.

Examples
The following generates an eight-run full-factorial design with two levels in the first factor and four levels in the second factor:
dFF = fullfact([2 4])
dFF =
     1     1
     2     1
     1     2
     2     2
     1     3
     2     3
     1     4
     2     4

See Also ff2n Introduced before R2006a


gagerr Gage repeatability and reproducibility study

Syntax
gagerr(y,{part,operator})
gagerr(y,GROUP)
gagerr(y,part)
gagerr(...,param1,val1,param2,val2,...)
[TABLE, stats] = gagerr(...)

Description gagerr(y,{part,operator}) performs a gage repeatability and reproducibility study on measurements in y collected by operator on part. y is a column vector containing the measurements on different parts. part and operator are categorical variables, numeric vectors, character matrices, string arrays, or cell arrays of character vectors. The number of elements in part and operator should be the same as in y. gagerr prints a table in the command window in which the decomposition of variance, standard deviation, study var (5.15 x standard deviation) are listed with respective percentages for different sources. Summary statistics are printed below the table giving the number of distinct categories (NDC) and the percentage of Gage R&R of total variations (PRR). gagerr also plots a bar graph showing the percentage of different components of variations. Gage R&R, repeatability, reproducibility, and part-to-part variations are plotted as four vertical bars. Variance and study var are plotted as two groups. To determine the capability of a measurement system using NDC, use the following guidelines: • If NDC > 5, the measurement system is capable. • If NDC < 2, the measurement system is not capable. • Otherwise, the measurement system may be acceptable. To determine the capability of a measurement system using PRR, use the following guidelines: • If PRR < 10%, the measurement system is capable. • If PRR > 30%, the measurement system is not capable. • Otherwise, the measurement system may be acceptable. gagerr(y,GROUP) performs a gage R&R study on measurements in y with part and operator represented in GROUP. GROUP is a numeric matrix whose first and second columns specify different parts and operators, respectively. The number of rows in GROUP should be the same as the number of elements in y. gagerr(y,part) performs a gage R&R study on measurements in y without operator information. The assumption is that all variability is contributed by part. 
gagerr(...,param1,val1,param2,val2,...) performs a gage R&R study using one or more of the following parameter name/value pairs:


• 'spec' — A two-element vector that defines the lower and upper limit of the process, respectively. In this case, summary statistics printed in the command window include Precision-to-Tolerance Ratio (PTR). Also, the bar graph includes an additional group, the percentage of tolerance. To determine the capability of a measurement system using PTR, use the following guidelines:
  • If PTR < 0.1, the measurement system is capable.
  • If PTR > 0.3, the measurement system is not capable.
  • Otherwise, the measurement system may be acceptable.
• 'printtable' — A value 'on' or 'off' that indicates whether the tabular output should be printed in the command window or not. The default value is 'on'.
• 'printgraph' — A value 'on' or 'off' that indicates whether the bar graph should be plotted or not. The default value is 'on'.
• 'randomoperator' — A logical value, true or false, that indicates whether the effect of operator is random or not. The default value is true.
• 'model' — The model to use, specified by one of:
  • 'linear' — Main effects only (default)
  • 'interaction' — Main effects plus two-factor interactions
  • 'nested' — Nest operator in part
The default value is 'linear'.
[TABLE, stats] = gagerr(...) returns a 6-by-5 matrix TABLE and a structure stats. The columns of TABLE, from left to right, represent variance, percentage of variance, standard deviations, study var, and percentage of study var. The rows of TABLE, from top to bottom, represent different sources of variations: gage R&R, repeatability, reproducibility, operator, operator and part interactions, and part. stats is a structure containing summary statistics for the performance of the measurement system. The fields of stats are:
• ndc — Number of distinct categories
• prr — Percentage of gage R&R of total variations
• ptr — Precision-to-tolerance ratio. The value is NaN if the parameter 'spec' is not given.
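The NDC and PRR guidelines in the Description can be applied programmatically to the stats output. This is a hypothetical helper, not a toolbox function; it assumes y, part, and operator already exist in the workspace (as in the example that follows) and suppresses the printed table and graph with the documented 'printtable' and 'printgraph' options.

```matlab
% Hypothetical capability check (sketch) built on the NDC/PRR guidelines.
% Assumes y, part, and operator are defined in the workspace.
[~,stats] = gagerr(y,{part,operator},'printtable','off','printgraph','off');
if stats.ndc > 5 && stats.prr < 10
    disp('Measurement system is capable')
elseif stats.ndc < 2 || stats.prr > 30
    disp('Measurement system is not capable')
else
    disp('Measurement system may be acceptable')
end
```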

Examples
Gage R&R Study
Simulate a measurement system by randomly generating the operators, parts, and the measurements, y, that operators do on the parts.
rng(1234,'twister')              % for reproducibility
y = randn(100,1);                % measurements
part = ceil(3*rand(100,1));      % parts
operator = ceil(4*rand(100,1));  % operators

Conduct a gage R&R study for this system using a mixed ANOVA model without interactions. gagerr(y,{part, operator},'randomoperator',true)


Source                 Variance    % Variance     sigma    5.15*sigma    % 5.15*sigma
Gage R&R                 0.9715       99.2653    0.9857        5.0762         99.6320
  Repeatability          0.9535       97.4201    0.9765        5.0288         98.7016
  Reproducibility        0.0181        1.8452    0.1344        0.6921         13.5838
    Operator             0.0181        1.8452    0.1344        0.6921         13.5838
Part                     0.0072        0.7347    0.0848        0.4367          8.5716
Total                    0.9787      100         0.9893        5.0949

Number of distinct categories (NDC): 0
% of Gage R&R of total variations (PRR): 99.63
Note: The last column of the above table does not have to sum to 100%


See Also Topics “Grouping Variables” on page 2-45 Introduced in R2006b


gamcdf Gamma cumulative distribution function

Syntax
p = gamcdf(x,a)
p = gamcdf(x,a,b)
[p,pLo,pUp] = gamcdf(x,a,b,pCov)
[p,pLo,pUp] = gamcdf(x,a,b,pCov,alpha)
___ = gamcdf( ___ ,'upper')

Description p = gamcdf(x,a) returns the cumulative distribution function (cdf) of the standard gamma distribution with the shape parameters in a, evaluated at the values in x. p = gamcdf(x,a,b) returns the cdf of the gamma distribution with the shape parameters in a and scale parameters in b, evaluated at the values in x. [p,pLo,pUp] = gamcdf(x,a,b,pCov) also returns the 95% confidence interval [pLo,pUp] of p when a and b are estimates. pCov is the covariance matrix of the estimated parameters. [p,pLo,pUp] = gamcdf(x,a,b,pCov,alpha) specifies the confidence level for the confidence interval [pLo pUp] to be 100(1–alpha)%. ___ = gamcdf( ___ ,'upper') returns the complement of the cdf, evaluated at the values in x, using an algorithm that more accurately computes the extreme upper-tail probabilities than subtracting the lower tail value from 1. 'upper' can follow any of the input argument combinations in the previous syntaxes.

Examples
Compute Gamma Distribution cdf
Compute the cdf of the mean of the gamma distribution, which is equal to the product of the parameters ab.
a = 1:6;
b = 5:10;
prob = gamcdf(a.*b,a,b)
prob = 1×6
    0.6321    0.5940    0.5768    0.5665    0.5595    0.5543

As ab increases, the distribution becomes more symmetric, and the mean approaches the median.


Confidence Interval of Gamma cdf Value Find a confidence interval estimating the probability that an observation is in the interval [0 10] using gamma distributed data. Generate a sample of 1000 gamma distributed random numbers with shape 2 and scale 5. x = gamrnd(2,5,1000,1);

Compute estimates for the parameters.
[params,~] = gamfit(x)
params = 1×2
    2.1089    4.8147

Store the parameters as ahat and bhat. ahat = params(1); bhat = params(2);

Find the covariance of the parameter estimates.
[~,nCov] = gamlike(params,x)
nCov = 2×2
    0.0077   -0.0176
   -0.0176    0.0512

Create a confidence interval estimating the probability that an observation is in the interval [0 10].
[prob,pLo,pUp] = gamcdf(10,ahat,bhat,nCov)
prob = 0.5830
pLo = 0.5587
pUp = 0.6069

Complementary cdf (Tail Distribution)
Determine the probability that an observation from the gamma distribution with shape parameter 2 and scale parameter 3 is in the interval [150 Inf].
p1 = 1 - gamcdf(150,2,3)
p1 = 0

gamcdf(150, 2, 3) is nearly 1, so p1 becomes 0. Specify 'upper' so that gamcdf computes the extreme upper-tail probabilities more accurately.


p2 = gamcdf(150,2,3,'upper')
p2 = 9.8366e-21

Input Arguments x — Values at which to evaluate cdf nonnegative scalar value | array of nonnegative scalar values Values at which to evaluate the cdf, specified as a nonnegative scalar value or an array of nonnegative scalar values. If you specify pCov to compute the confidence interval [pLo,pUp], then x must be a scalar value. • To evaluate the cdf at multiple values, specify x using an array. • To evaluate the cdfs of multiple distributions, specify a and b using arrays. If one or more of the input arguments x, a, and b are arrays, then the array sizes must be the same. In this case, gamcdf expands each scalar input into a constant array of the same size as the array inputs. Each element in p is the cdf value of the distribution specified by the corresponding elements in a and b, evaluated at the corresponding element in x. Example: [3 4 7 9] Data Types: single | double a — Shape parameter positive scalar value | array of positive scalar values Shape of the gamma distribution, specified as a positive scalar value or an array of positive scalar values. • To evaluate the cdf at multiple values, specify x using an array. • To evaluate the cdfs of multiple distributions, specify a and b using arrays. If one or more of the input arguments x, a, and b are arrays, then the array sizes must be the same. In this case, gamcdf expands each scalar input into a constant array of the same size as the array inputs. Each element in p is the cdf value of the distribution specified by the corresponding elements in a and b, evaluated at the corresponding element in x. Example: [1 2 3 5] Data Types: single | double b — Scale parameter 1 (default) | positive scalar value | array of positive scalar values Scale of the gamma distribution, specified as a positive scalar value or an array of positive scalar values. • To evaluate the cdf at multiple values, specify x using an array. • To evaluate the cdfs of multiple distributions, specify a and b using arrays. 
If one or more of the input arguments x, a, and b are arrays, then the array sizes must be the same. In this case, gamcdf expands each scalar input into a constant array of the same size as the array

inputs. Each element in p is the cdf value of the distribution specified by the corresponding elements in a and b, evaluated at the corresponding element in x. Example: [1 1 2 2] Data Types: single | double pCov — Covariance of estimates 2-by-2 numeric matrix Covariance of the estimates a and b, specified as a 2-by-2 matrix. If you specify pCov to compute the confidence interval [pLo,pUp], then x, a, and b must be scalar values. You can estimate a and b by using gamfit or mle, and estimate the covariance of a and b by using gamlike. For an example, see “Confidence Interval of Gamma cdf Value” on page 32-2314. Data Types: single | double alpha — Significance level 0.05 (default) | scalar in the range (0,1) Significance level for the confidence interval, specified as a scalar in the range (0,1). The confidence level is 100(1–alpha)%, where alpha is the probability that the confidence interval does not contain the true value. Example: 0.01 Data Types: single | double

Output Arguments p — cdf values scalar value | array of scalar values cdf values evaluated at the values in x, returned as a scalar value or an array of scalar values. p is the same size as x, a, and b after any necessary scalar expansion. Each element in p is the cdf value of the distribution specified by the corresponding elements in a and b, evaluated at the corresponding element in x. pLo — Lower confidence bound for p scalar value | array of scalar values Lower confidence bound for p, returned as a scalar value or an array of scalar values. pLo has the same size as p. pUp — Upper confidence bound for p scalar value | array of scalar values Upper confidence bound for p, returned as a scalar value or an array of scalar values. pUp has the same size as p.

32-2316

gamcdf

More About
Gamma cdf
The gamma distribution is a two-parameter family of curves. The parameters a and b are shape and scale, respectively. The gamma cdf is

    p = F(x | a, b) = (1/(bᵃ Γ(a))) ∫₀ˣ t^(a−1) e^(−t/b) dt.

The result p is the probability that a single observation from a gamma distribution with parameters a and b falls in the interval [0,x].

The gamma cdf is related to the incomplete gamma function gammainc by

    F(x | a, b) = gammainc(x/b, a).

The standard gamma distribution occurs when b = 1, which coincides with the incomplete gamma function precisely. For more information, see “Gamma Distribution” on page B-47.
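The relation to gammainc stated above can be checked numerically. A quick sketch, using arbitrary parameter values:

```matlab
% Cross-check the identity F(x|a,b) = gammainc(x/b,a) at arbitrary values.
x = 10; a = 2; b = 5;
p1 = gamcdf(x,a,b);       % distribution-specific toolbox function
p2 = gammainc(x/b,a);     % base MATLAB incomplete gamma function
% p1 and p2 agree to floating-point precision
```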

Alternative Functionality • gamcdf is a function specific to the gamma distribution. Statistics and Machine Learning Toolbox also offers the generic function cdf, which supports various probability distributions. To use cdf, create a GammaDistribution probability distribution object and pass the object as an input argument or specify the probability distribution name and its parameters. Note that the distribution-specific function gamcdf is faster than the generic function cdf. • Use the Probability Distribution Function app to create an interactive plot of the cumulative distribution function (cdf) or probability density function (pdf) for a probability distribution.

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also GammaDistribution | cdf | gamfit | gaminv | gamlike | gamma | gampdf | gamrnd | gamstat Topics “Gamma Distribution” on page B-47


Introduced before R2006a


gamfit Gamma parameter estimates

Syntax
phat = gamfit(data)
[phat,pci] = gamfit(data)
[phat,pci] = gamfit(data,alpha)
[...] = gamfit(data,alpha,censoring,freq,options)

Description phat = gamfit(data) returns the maximum likelihood estimates (MLEs) for the parameters of the gamma distribution given the data in vector data. [phat,pci] = gamfit(data) returns MLEs and 95% percent confidence intervals. The first row of pci is the lower bound of the confidence intervals; the last row is the upper bound. [phat,pci] = gamfit(data,alpha) returns 100(1 - alpha)% confidence intervals. For example, alpha = 0.01 yields 99% confidence intervals. [...] = gamfit(data,alpha,censoring) accepts a Boolean vector of the same size as data that is 1 for observations that are right-censored and 0 for observations that are observed exactly. [...] = gamfit(data,alpha,censoring,freq) accepts a frequency vector of the same size as data. freq typically contains integer frequencies for the corresponding elements in data, but may contain any nonnegative values. [...] = gamfit(data,alpha,censoring,freq,options) accepts a structure, options, that specifies control parameters for the iterative algorithm the function uses to compute maximum likelihood estimates. The gamma fit function accepts an options structure which can be created using the function statset. Enter statset('gamfit') to see the names and default values of the parameters that gamfit accepts in the options structure.

Examples
Fit a gamma distribution to random data generated from a specified gamma distribution:
a = 2;
b = 4;
data = gamrnd(a,b,100,1);
[p,ci] = gamfit(data)
p =
    2.1990    3.7426
ci =
    1.6840    2.8298
    2.7141    4.6554


References [1] Hahn, Gerald J., and S. S. Shapiro. Statistical Models in Engineering. Hoboken, NJ: John Wiley & Sons, Inc., 1994, p. 88.

See Also gamcdf | gaminv | gamlike | gampdf | gamrnd | gamstat | mle Topics “Gamma Distribution” on page B-47 Introduced before R2006a


gaminv Gamma inverse cumulative distribution function

Syntax
x = gaminv(p,a)
x = gaminv(p,a,b)
[x,xLo,xUp] = gaminv(p,a,b,pCov)
[x,xLo,xUp] = gaminv(p,a,b,pCov,alpha)

Description x = gaminv(p,a) returns the inverse cumulative distribution function (icdf) of the standard gamma distribution with the shape parameter a, evaluated at the values in p. x = gaminv(p,a,b) returns the icdf of the gamma distribution with shape parameter a and the scale parameter b, evaluated at the values in p. [x,xLo,xUp] = gaminv(p,a,b,pCov) also returns the 95% confidence interval [xLo,xUp] of x when a and b are estimates. pCov is the covariance matrix of the estimated parameters. [x,xLo,xUp] = gaminv(p,a,b,pCov,alpha) specifies the confidence level for the confidence interval [xLo,xUp] to be 100(1–alpha)%.

Examples
Compute Gamma icdf
Find the median of the gamma distribution with shape parameter 3 and scale parameter 5.
x = gaminv(0.5,3,5)
x = 13.3703
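Because gaminv is the inverse of gamcdf, a round trip through the two functions recovers the original probability. A minimal sketch:

```matlab
% Round trip: the icdf followed by the cdf returns the original probability.
p = 0.5;
x = gaminv(p,3,5);      % median of the gamma distribution with a = 3, b = 5
pBack = gamcdf(x,3,5);  % recovers p, up to floating-point error
```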

Confidence Interval of Gamma icdf Value Find a confidence interval estimating the median using gamma distributed data. Generate a sample of 500 gamma distributed random numbers with shape 2 and scale 5. x = gamrnd(2,5,500,1);

Compute estimates for the parameters.
params = gamfit(x)
params = 1×2
    1.9820    5.0601

Store the estimates for the parameters as ahat and bhat. ahat = params(1); bhat = params(2);

Compute the covariance of the parameter estimates.
[~,nCov] = gamlike(params,x)
nCov = 2×2
    0.0135   -0.0346
   -0.0346    0.1141

Create a confidence interval estimating x. [x,xLo,xUp] = gaminv(0.50,ahat,bhat,nCov) x = 8.4021 xLo = 7.8669 xUp = 8.9737

Input Arguments p — Probability values at which to evaluate icdf scalar value in [0,1] | array of scalar values Probability values at which to evaluate the inverse cdf (icdf), specified as a scalar value or an array of scalar values, where each element is in the range [0,1]. If you specify pCov to compute the confidence interval [xLo,xUp], then p must be a scalar value (not an array). • To evaluate the icdf at multiple values, specify p using an array. • To evaluate the icdfs of multiple distributions, specify a and b using arrays. If one or more of the input arguments p, a, and b are arrays, then the array sizes must be the same. In this case, gaminv expands each scalar input into a constant array of the same size as the array inputs. Each element in x is the icdf value of the distribution specified by the corresponding elements in a and b, evaluated at the corresponding element in p. Example: [0.1,0.5,0.9] Data Types: single | double a — Shape Parameter positive scalar value | array of positive scalar values Shape parameter of the gamma distribution, specified as a positive scalar value or an array of positive scalar values. 32-2322


• To evaluate the icdf at multiple values, specify p using an array. • To evaluate the icdfs of multiple distributions, specify a and b using arrays. If one or more of the input arguments p, a, and b are arrays, then the array sizes must be the same. In this case, gaminv expands each scalar input into a constant array of the same size as the array inputs. Each element in x is the icdf value of the distribution specified by the corresponding elements in a and b, evaluated at the corresponding element in p. Example: [1 2 3 5] Data Types: single | double b — Scale Parameter 1 (default) | positive scalar value | array of positive scalar values Scale parameter of the gamma distribution, specified as a positive scalar value or an array of positive scalar values. • To evaluate the icdf at multiple values, specify p using an array. • To evaluate the icdfs of multiple distributions, specify a and b using arrays. If one or more of the input arguments p, a, and b are arrays, then the array sizes must be the same. In this case, gaminv expands each scalar input into a constant array of the same size as the array inputs. Each element in x is the icdf value of the distribution specified by the corresponding elements in a and b, evaluated at the corresponding element in p. Example: [1 1 2 2] Data Types: single | double pCov — Covariance of estimates 2-by-2 numeric matrix Covariance of the estimates a and b, specified as a 2-by-2 matrix. If you specify pCov to compute the confidence interval [xLo,xUp], then p, a, and b must be scalar values. You can estimate a and b by using gamfit or mle, and estimate the covariance of a and b by using gamlike. For an example, see “Confidence Interval of Gamma icdf Value” on page 32-2321. Data Types: single | double alpha — Significance level 0.05 (default) | scalar in the range (0,1) Significance level for the confidence interval, specified as a scalar in the range (0,1). 
The confidence level is 100(1–alpha)%, where alpha is the probability that the confidence interval does not contain the true value. Example: 0.01 Data Types: single | double


Output Arguments x — icdf values scalar value | array of scalar values icdf values evaluated at the probability values in p, returned as a scalar value or an array of scalar values. x is the same size as p, a, and b, after any necessary scalar expansion. Each element in x is the icdf value of the distribution specified by the corresponding elements in a and b, evaluated at the corresponding element in p. xLo — Lower confidence bound for x scalar value | array of scalar values Lower confidence bound for x, returned as a scalar value or an array of scalar values. xLo has the same size as x. xUp — Upper confidence bound for x scalar value | array of scalar values Upper confidence bound for x, returned as a scalar value or an array of scalar values. xUp has the same size as x.

More About Gamma icdf The gamma distribution is a two-parameter family of curves. The parameters a and b are shape and scale, respectively. The gamma inverse function in terms of the gamma cdf is

x = F^(−1)(p | a,b) = {x : F(x | a,b) = p},

where

p = F(x | a,b) = (1/(b^a Γ(a))) ∫₀ˣ t^(a−1) e^(−t/b) dt.

The result x is the value such that an observation from the gamma distribution with parameters a and b falls in [0,x] with probability p. For more information, see “Gamma Distribution” on page B-47.

Algorithms No known analytical solution exists for the integral equation shown in “Gamma icdf” on page 32-2324. gaminv uses an iterative approach (Newton's method) to converge on the solution.
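As a quick sanity check of this iterative inversion (a minimal sketch, not from the original text), evaluating the cdf at the icdf result should recover the input probabilities to within solver tolerance:

```matlab
% Round-trip check: gamcdf should (numerically) undo gaminv.
p = [0.1 0.5 0.9];
a = 3; b = 5;                  % shape and scale
x = gaminv(p,a,b);             % icdf values found by Newton's method
pBack = gamcdf(x,a,b);
maxErr = max(abs(pBack - p))   % small residual, limited by convergence tolerance
```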

Alternative Functionality • gaminv is a function specific to the gamma distribution. Statistics and Machine Learning Toolbox also offers the generic function icdf, which supports various probability distributions. To use icdf, create a GammaDistribution probability distribution object and pass the object as an input argument or specify the probability distribution name and its parameters. Note that the distribution-specific function gaminv is faster than the generic function icdf.
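A minimal sketch of the generic-function route described above (parameter values are illustrative): create the distribution object with makedist, then call icdf on it.

```matlab
% Distribution-object route: same result as the distribution-specific call.
pd = makedist('Gamma','a',3,'b',5);  % shape a = 3, scale b = 5
xGeneric  = icdf(pd,0.5);
xSpecific = gaminv(0.5,3,5);         % distribution-specific, typically faster
```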

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also GammaDistribution | gamcdf | gamfit | gamlike | gampdf | gamrnd | gamstat | icdf Topics “Gamma Distribution” on page B-47 Introduced before R2006a


gamlike Gamma negative log-likelihood

Syntax nlogL = gamlike(params,data) [nlogL,AVAR] = gamlike(params,data) [...] = gamlike(params,data,censoring) [...] = gamlike(params,data,censoring,freq)

Description nlogL = gamlike(params,data) returns the negative of the gamma log-likelihood of the parameters, params, given data. params(1) = A, the shape parameter, and params(2) = B, the scale parameter. The parameters in params must all be positive. [nlogL,AVAR] = gamlike(params,data) also returns AVAR, which is the asymptotic variance-covariance matrix of the parameter estimates when the values in params are the maximum likelihood estimates. AVAR is the inverse of Fisher's information matrix. The diagonal elements of AVAR are the asymptotic variances of their respective parameters. [...] = gamlike(params,data,censoring) accepts a Boolean vector of the same size as data that is 1 for observations that are right-censored and 0 for observations that are observed exactly. [...] = gamlike(params,data,censoring,freq) accepts a frequency vector of the same size as data. freq typically contains integer frequencies for the corresponding elements in data, but can contain any nonnegative values. gamlike is a utility function for maximum likelihood estimation of the gamma distribution. Because gamlike returns the negative gamma log-likelihood function, minimizing gamlike using fminsearch is the same as maximizing the likelihood.
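The fminsearch connection mentioned above can be sketched as follows (the log-parameter transform is an assumption added here to keep the search in the positive domain; it is not part of the function's documented interface):

```matlab
% Maximum likelihood via unconstrained minimization of the negative
% log likelihood. Optimizing over log(params) keeps A and B positive.
data = gamrnd(2,3,100,1);                  % sample with known shape 2, scale 3
nll = @(theta) gamlike(exp(theta),data);   % theta = log([A B])
thetaHat = fminsearch(nll,[0 0]);          % start at A = B = 1
paramsHat = exp(thetaHat)                  % close to the gamfit(data) estimates
```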

Examples Compute the negative log-likelihood of parameter estimates computed by the gamfit function: a = 2; b = 3; r = gamrnd(a,b,100,1); [nlogL,AVAR] = gamlike(gamfit(r),r)
nlogL =
  267.5648
AVAR =
    0.0788   -0.1104
   -0.1104    0.1955

Extended Capabilities GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also gamcdf | gamfit | gaminv | gampdf | gamrnd | gamstat Topics “Gamma Distribution” on page B-47 Introduced before R2006a


gampdf Gamma probability density function

Syntax y = gampdf(x,a) y = gampdf(x,a,b)

Description y = gampdf(x,a) returns the probability density function (pdf) of the standard gamma distribution with the shape parameter a, evaluated at the values in x. y = gampdf(x,a,b) returns the pdf of the gamma distribution with the shape parameter a and the scale parameter b, evaluated at the values in x.

Examples Compute Gamma pdf Compute the density of the observed value 5 in the standard gamma distribution with shape parameter 2. y1 = gampdf(5,2) y1 = 0.0337

Compute the density of the observed value 5 in the gamma distributions with shape parameter 2 and scale parameters 1 through 5. y2 = gampdf(5,2,1:5)
y2 = 1×5
    0.0337    0.1026    0.1049    0.0895    0.0736

Input Arguments x — Values at which to evaluate pdf nonnegative scalar value | array of nonnegative scalar values Values at which to evaluate the pdf, specified as a nonnegative scalar value or an array of nonnegative scalar values. • To evaluate the pdf at multiple values, specify x using an array. • To evaluate the pdfs of multiple distributions, specify a and b using arrays. 32-2328


If one or more of the input arguments x, a, and b are arrays, then the array sizes must be the same. In this case, gampdf expands each scalar input into a constant array of the same size as the array inputs. Each element in y is the pdf value of the distribution specified by the corresponding elements in a and b, evaluated at the corresponding element in x. Example: [3 4 7 9] Data Types: single | double a — Shape parameter positive scalar value | array of positive scalar values Shape parameter of the gamma distribution, specified as a positive scalar value or an array of positive scalar values. • To evaluate the pdf at multiple values, specify x using an array. • To evaluate the pdfs of multiple distributions, specify a and b using arrays. If one or more of the input arguments x, a, and b are arrays, then the array sizes must be the same. In this case, gampdf expands each scalar input into a constant array of the same size as the array inputs. Each element in y is the pdf value of the distribution specified by the corresponding elements in a and b, evaluated at the corresponding element in x. Example: [1 2 3 5] Data Types: single | double b — Scale parameter 1 (default) | positive scalar value | array of positive scalar values Scale parameter of the gamma distribution, specified as a positive scalar value or an array of positive scalar values. • To evaluate the pdf at multiple values, specify x using an array. • To evaluate the pdfs of multiple distributions, specify a and b using arrays. If one or more of the input arguments x, a, and b are arrays, then the array sizes must be the same. In this case, gampdf expands each scalar input into a constant array of the same size as the array inputs. Each element in y is the pdf value of the distribution specified by the corresponding elements in a and b, evaluated at the corresponding element in x. Example: [1 1 2 2] Data Types: single | double

Output Arguments y — pdf values scalar value | array of scalar values pdf values evaluated at the values in x, returned as a scalar value or an array of scalar values. y is the same size as x, a, and b after any necessary scalar expansion. Each element in y is the pdf value of the distribution specified by the corresponding elements in a and b, evaluated at the corresponding element in x. 32-2329


More About Gamma pdf The gamma distribution is a two-parameter family of curves. The parameters a and b are shape and scale, respectively. The gamma pdf is

y = f(x | a,b) = (1/(b^a Γ(a))) x^(a−1) e^(−x/b),

where Γ( · ) is the Gamma function. The standard gamma distribution occurs when b = 1. For more information, see “Gamma Distribution” on page B-47.
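The pdf formula above can be checked directly with MATLAB's gamma function (a one-point sketch with illustrative values):

```matlab
% Evaluate the gamma pdf formula by hand and compare with gampdf.
x = 5; a = 2; b = 3;
yManual  = 1/(b^a*gamma(a)) * x^(a-1) * exp(-x/b);
yBuiltin = gampdf(x,a,b);    % equal to yManual up to floating-point rounding
```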

Alternative Functionality • gampdf is a function specific to the gamma distribution. Statistics and Machine Learning Toolbox also offers the generic function pdf, which supports various probability distributions. To use pdf, create a GammaDistribution probability distribution object and pass the object as an input argument or specify the probability distribution name and its parameters. Note that the distribution-specific function gampdf is faster than the generic function pdf. • Use the Probability Distribution Function app to create an interactive plot of the cumulative distribution function (cdf) or probability density function (pdf) for a probability distribution.

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also GammaDistribution | gamcdf | gamfit | gaminv | gamlike | gamrnd | gamstat | pdf Topics “Gamma Distribution” on page B-47 Introduced before R2006a


gamrnd Gamma random numbers

Syntax r = gamrnd(a,b) r = gamrnd(a,b,sz1,...,szN) r = gamrnd(a,b,sz)

Description r = gamrnd(a,b) generates a random number from the gamma distribution with the shape parameter a and the scale parameter b. r = gamrnd(a,b,sz1,...,szN) generates an array of random numbers from the gamma distribution, where sz1,...,szN indicates the size of each dimension. r = gamrnd(a,b,sz) generates an array of random numbers from the gamma distribution, where vector sz specifies size(r).

Examples Generate Gamma Random Number Generate a single random number from the gamma distribution with shape 5 and scale 7. r = gamrnd(5,7) r = 68.9857

Generate Array of Gamma Random Numbers Generate five random numbers from the gamma distributions with shape parameter values 1 through 5 and scale parameter 2. a1 = 1:5; b1 = 2; r1 = gamrnd(a1,b1)
r1 = 1×5
    7.1297    6.0918    2.1010    8.7253   29.5447

By default, gamrnd generates an array that is the same size as a and b after any necessary scalar expansion so that all scalars are expanded to match the dimensions of the other inputs.

If you specify array dimensions sz1,...,szN or sz, they must match the dimensions of a and b after any necessary scalar expansion. Generate a 2-by-3 array of random numbers from the gamma distribution with shape parameter 3 and scale parameter 7. sz = [2 3]; r2 = gamrnd(3,7,sz)
r2 = 2×3
   17.9551   41.3983    7.9865
   16.4204   40.0048   44.1909

Generate six random numbers from the gamma distributions with shape parameter values 1 through 6 and scale parameter values 5 through 10 respectively. a3 = 1:6; b3 = 5:10; r3 = gamrnd(a3,b3,1,6)
r3 = 1×6
    9.5930    7.8289   11.0360   15.0367   28.1456   98.2664

Input Arguments a — Shape parameter positive scalar value | array of positive scalar values Shape parameter of the gamma distribution, specified as a positive scalar value or an array of positive scalar values. To generate random numbers from multiple distributions, specify a and b using arrays. If both a and b are arrays, then the array sizes must be the same. If either a or b is a scalar, then gamrnd expands the scalar argument into a constant array of the same size as the other argument. Each element in r is the random number generated from the distribution specified by the corresponding elements in a and b. Example: [3 4 7 9] Data Types: single | double b — Scale parameter positive scalar value | array of positive scalar values Scale parameter of the gamma distribution, specified as a positive scalar value or an array of positive scalar values. To generate random numbers from multiple distributions, specify a and b using arrays. If both a and b are arrays, then the array sizes must be the same. If either a or b is a scalar, then gamrnd expands the scalar argument into a constant array of the same size as the other argument. Each element in r is the random number generated from the distribution specified by the corresponding elements in a and b. 32-2332


Example: [1 1 2 2] Data Types: single | double sz1,...,szN — Size of each dimension (as separate arguments) integers Size of each dimension, specified as separate arguments of integers. If either a or b is an array, then the specified dimensions sz1,...,szN must match the common dimensions of a and b after any necessary scalar expansion. The default values of sz1,...,szN are the common dimensions. • If you specify a single value sz1, then r is a square matrix of size sz1-by-sz1. • If the size of any dimension is 0 or negative, then r is an empty array. • Beyond the second dimension, gamrnd ignores trailing dimensions with a size of 1. For example, gamrnd(2,5,3,1,1,1) produces a 3-by-1 vector of random numbers from the gamma distribution with shape 2 and scale 5. Example: 2,4 Data Types: single | double sz — Size of each dimension (as a row vector) row vector of integers Size of each dimension, specified as a row vector of integers. If either a or b is an array, then the specified dimensions sz must match the common dimensions of a and b after any necessary scalar expansion. The default values of sz are the common dimensions. • If you specify a single value [sz1], then r is a square matrix of size sz1-by-sz1. • If the size of any dimension is 0 or negative, then r is an empty array. • Beyond the second dimension, gamrnd ignores trailing dimensions with a size of 1. For example, gamrnd(2,5,[3 1 1 1]) produces a 3-by-1 vector of random numbers from the gamma distribution with shape 2 and scale 5. Example: [2 4] Data Types: single | double

Output Arguments r — Gamma random numbers nonnegative scalar value | array of nonnegative scalar values Gamma random numbers, returned as a nonnegative scalar value or an array of nonnegative scalar values with the dimensions specified by sz1,...,szN or sz. Each element in r is the random number generated from the distribution specified by the corresponding elements in a and b.

Alternative Functionality • gamrnd is a function specific to the gamma distribution. Statistics and Machine Learning Toolbox also offers the generic function random, which supports various probability distributions. To use random, create a GammaDistribution probability distribution object and pass the object as an input argument or specify the probability distribution name and its parameters. Note that the distribution-specific function gamrnd is faster than the generic function random. • Use randg to generate random numbers from the standard gamma distribution (unit scale). • To generate random numbers interactively, use randtool, a user interface for random number generation.

References [1] Marsaglia, George, and Wai Wan Tsang. “A Simple Method for Generating Gamma Variables.” ACM Transactions on Mathematical Software 26, no. 3 (September 1, 2000): 363–72. https:// doi.org/10.1145/358407.358414.

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: The generated code can return a different sequence of numbers from the sequence returned by MATLAB if either of the following is true: • The output is nonscalar. • An input parameter is invalid for the distribution. For more information on code generation, see “Introduction to Code Generation” on page 31-2 and “General Code Generation Workflow” on page 31-5. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also GammaDistribution | gamcdf | gamfit | gaminv | gamlike | gampdf | gamstat | randg | random Topics “Gamma Distribution” on page B-47 Introduced before R2006a


gamstat Gamma mean and variance

Syntax [M,V] = gamstat(A,B)

Description [M,V] = gamstat(A,B) returns the mean and variance of the gamma distribution with shape parameters in A and scale parameters in B. A and B can be vectors, matrices, or multidimensional arrays that have the same size, which is also the size of M and V. A scalar input for A or B is expanded to a constant array with the same dimensions as the other input. The parameters in A and B must be positive. The mean of the gamma distribution with parameters A and B is AB. The variance is AB².
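As a quick check of these formulas (a sketch with illustrative values): for shape 4 and scale 2, the mean is 4·2 = 8 and the variance is 4·2² = 16.

```matlab
% Mean A*B and variance A*B^2 for a single shape/scale pair.
[m,v] = gamstat(4,2)   % m = 8, v = 16
```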

Examples [m,v] = gamstat(1:5,1:5)
m =
     1     4     9    16    25
v =
     1     8    27    64   125
[m,v] = gamstat(1:5,1./(1:5))
m =
     1     1     1     1     1
v =
    1.0000    0.5000    0.3333    0.2500    0.2000

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also gamcdf | gamfit | gaminv | gamlike | gampdf | gamrnd Topics “Gamma Distribution” on page B-47


Introduced before R2006a


ge Class: qrandstream Greater than or equal relation for handles

Syntax h1 >= h2

Description h1 >= h2 performs element-wise comparisons between handle arrays h1 and h2. h1 and h2 must have the same dimensions unless one is a scalar. The result is a logical array of the same dimensions. If one of h1 or h2 is scalar, scalar expansion is performed and the result matches the dimensions of the nonscalar array. tf = ge(h1, h2) stores the result in a logical array of the same dimensions.

See Also eq | gt | le | lt | ne | qrandstream


GeneralizedLinearMixedModel class Generalized linear mixed-effects model class

Description A GeneralizedLinearMixedModel object represents a regression model of a response variable that contains both fixed and random effects. The object comprises data, a model description, fitted coefficients, covariance parameters, design matrices, residuals, residual plots, and other diagnostic information for a generalized linear mixed-effects (GLME) model. You can predict model responses with the predict function and generate random data at new design points using the random function.

Construction You can fit a generalized linear mixed-effects (GLME) model to sample data using fitglme(tbl, formula). For more information, see fitglme. Input Arguments tbl — Input data table | dataset array Input data, which includes the response variable, predictor variables, and grouping variables, specified as a table or dataset array. The predictor variables can be continuous or grouping variables (see “Grouping Variables” on page 2-45). You must specify the model for the variables using formula. Data Types: table formula — Formula for model specification character vector or string scalar of the form 'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)' Formula for model specification, specified as a character vector or string scalar of the form 'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)'. For a full description, see Formula on page 32-2350. Example: 'y ~ treatment +(1|block)'
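A minimal, self-contained sketch of this construction (the response, treatment, and block variables here are invented for illustration, and the Poisson distribution choice is likewise an assumption):

```matlab
% Simulate a small grouped data set and fit a Poisson GLME with a
% fixed effect for treatment and a random intercept for each block.
block     = categorical(repelem((1:4)',5));    % 4 blocks, 5 observations each
treatment = categorical(repmat([0;1],10,1));   % alternating treatment labels
y         = poissrnd(3,20,1);                  % toy count response
tbl  = table(y,treatment,block);
glme = fitglme(tbl,'y ~ treatment + (1|block)','Distribution','Poisson');
```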

Properties Coefficients — Estimates of fixed-effects coefficients dataset array Estimates of fixed-effects coefficients and related statistics, stored as a dataset array that has one row for each coefficient and the following columns: • Name — Name of the coefficient • Estimate — Estimated coefficient value

• SE — Standard error of the estimate • tStat — t-statistic for a test that the coefficient is equal to 0 • DF — Degrees of freedom associated with the t statistic • pValue — p-value for the t-statistic • Lower — Lower confidence limit • Upper — Upper confidence limit To obtain any of these columns as a vector, index into the property using dot notation. Use the coefTest method to perform other tests on the coefficients. CoefficientCovariance — Covariance of estimated fixed-effects vector matrix Covariance of estimated fixed-effects vector, stored as a matrix. Data Types: single | double CoefficientNames — Names of fixed-effects coefficients cell array of character vectors Names of fixed-effects coefficients, stored as a cell array of character vectors. The label for the coefficient of the constant term is (Intercept). The labels for other coefficients indicate the terms that they multiply. When the term includes a categorical predictor, the label also indicates the level of that predictor. Data Types: cell DFE — Degrees of freedom for error positive integer value Degrees of freedom for error, stored as a positive integer value. DFE is the number of observations minus the number of estimated coefficients. DFE contains the degrees of freedom corresponding to the 'Residual' method of calculating denominator degrees of freedom for hypothesis tests on fixed-effects coefficients. If n is the number of observations and p is the number of fixed-effects coefficients, then DFE is equal to n – p. Data Types: double Dispersion — Model dispersion parameter scalar value Model dispersion parameter, stored as a scalar value. The dispersion parameter defines the conditional variance of the response. For observation i, the conditional variance of the response y_i, given the conditional mean μ_i and the dispersion parameter σ², in a generalized linear mixed-effects model is

var(y_i | μ_i, σ²) = (σ² / w_i) v(μ_i),

where w_i is the ith observation weight and v is the variance function for the specified conditional distribution of the response. The Dispersion property contains an estimate of σ² for the specified

GLME model. The value of Dispersion depends on the specified conditional distribution of the response. For binomial and Poisson distributions, the theoretical value of Dispersion is equal to σ2 = 1.0. • If FitMethod is MPL or REMPL and the 'DispersionFlag' name-value pair argument in fitglme is true, then a dispersion parameter is estimated from data for all distributions, including binomial and Poisson distributions. • If FitMethod is ApproximateLaplace or Laplace, then the 'DispersionFlag' name-value pair argument in fitglme does not apply, and the dispersion parameter is fixed at 1.0 for binomial and Poisson distributions. For all other distributions, Dispersion is estimated from data. Data Types: double DispersionEstimated — Flag indicating if dispersion parameter was estimated true | false Flag indicating estimated dispersion parameter, stored as a logical value. • If FitMethod is ApproximateLaplace or Laplace, then the dispersion parameter is fixed at its theoretical value of 1.0 for binomial and Poisson distributions, and DispersionEstimated is false. For other distributions, the dispersion parameter is estimated from the data, and DispersionEstimated is true. • If FitMethod is MPL or REMPL, and the 'DispersionFlag' name-value pair argument in fitglme is specified as true, then the dispersion parameter is estimated for all distributions, including binomial and Poisson distributions, and DispersionEstimated is true. • If FitMethod is MPL or REMPL, and the 'DispersionFlag' name-value pair argument in fitglme is specified as false, then the dispersion parameter is fixed at its theoretical value for binomial and Poisson distributions, and DispersionEstimated is false. For distributions other than binomial and Poisson, the dispersion parameter is estimated from the data, and DispersionEstimated is true. 
Data Types: logical Distribution — Response distribution name 'Normal' | 'Binomial' | 'Poisson' | 'Gamma' | 'InverseGaussian' Response distribution name, stored as one of the following: • 'Normal' — Normal distribution • 'Binomial' — Binomial distribution • 'Poisson' — Poisson distribution • 'Gamma' — Gamma distribution • 'InverseGaussian' — Inverse Gaussian distribution FitMethod — Method used to fit the model 'MPL' | 'REMPL' | 'ApproximateLaplace' | 'Laplace' Method used to fit the model, stored as one of the following. • 'MPL' — Maximum pseudo likelihood • 'REMPL' — Restricted maximum pseudo likelihood 32-2340


• 'ApproximateLaplace' — Maximum likelihood using the approximate Laplace method, with fixed effects profiled out • 'Laplace' — Maximum likelihood using the Laplace method Formula — Model specification formula object Model specification formula, stored as an object. The model specification formula uses Wilkinson’s notation to describe the relationship between the fixed-effects terms, random-effects terms, and grouping variables in the GLME model. For more information see Formula on page 32-2350. Link — Link function characteristics structure Link function characteristics, stored as a structure containing the following fields. The link is a function G that links the distribution parameter MU to the linear predictor ETA as follows: G(MU) = ETA.
• Name — Name of the link function
• Link — Function that defines G
• Derivative — Derivative of G
• SecondDerivative — Second derivative of G
• Inverse — Inverse of G

Data Types: struct LogLikelihood — Log of likelihood function scalar value Log of likelihood function evaluated at the estimated coefficient values, stored as a scalar value. LogLikelihood depends on the method used to fit the model. • If you use 'Laplace' or 'ApproximateLaplace', then LogLikelihood is the maximized log likelihood. • If you use 'MPL', then LogLikelihood is the maximized log likelihood of the pseudo data from the final pseudo likelihood iteration. • If you use 'REMPL', then LogLikelihood is the maximized restricted log likelihood of the pseudo data from the final pseudo likelihood iteration. Data Types: double ModelCriterion — Model criterion table Model criterion to compare fitted generalized linear mixed-effects models, stored as a table with the following fields.
• AIC — Akaike information criterion
• BIC — Bayesian information criterion
• LogLikelihood — For a model fit using 'Laplace' or 'ApproximateLaplace', the maximized log likelihood. For a model fit using 'MPL', the maximized log likelihood of the pseudo data from the final pseudo likelihood iteration. For a model fit using 'REMPL', the maximized restricted log likelihood of the pseudo data from the final pseudo likelihood iteration.
• Deviance — −2 times LogLikelihood

NumCoefficients — Number of fixed-effects coefficients positive integer value Number of fixed-effects coefficients in the fitted generalized linear mixed-effects model, stored as a positive integer value. Data Types: double NumEstimatedCoefficients — Number of estimated fixed-effects coefficients positive integer value Number of estimated fixed-effects coefficients in the fitted generalized linear mixed-effects model, stored as a positive integer value. Data Types: double NumObservations — Number of observations positive integer value Number of observations used in the fit, stored as a positive integer value. NumObservations is the number of rows in the table or dataset array tbl, minus rows excluded using the 'Exclude' namevalue pair of fitglme or rows containing NaN values. Data Types: double NumPredictors — Number of predictors positive integer value Number of variables used as predictors in the generalized linear mixed-effects model, stored as a positive integer value. Data Types: double NumVariables — Total number of variables positive integer value Total number of variables, including the response and predictors, stored as a positive integer value. If the sample data is in a table or dataset array tbl, then NumVariables is the total number of variables in tbl, including the response variable. NumVariables includes variables, if any, that are not used as predictors or as the response. 32-2342


Data Types: double ObservationInfo — Information about the observations table Information about the observations used in the fit, stored as a table. ObservationInfo has one row for each observation and the following columns.
• Weights — The weight value for the observation. The default value is 1.
• Excluded — If the observation was excluded from the fit using the 'Exclude' name-value pair argument in fitglme, then Excluded is true, or 1. Otherwise, Excluded is false, or 0.
• Missing — If the observation was excluded from the fit because any response or predictor value is missing, then Missing is true. Otherwise, Missing is false. Missing values include NaN for numeric variables, empty cells for cell arrays, blank rows for character arrays, and the <undefined> value for categorical arrays.
• Subset — If the observation was used in the fit, then Subset is true. If the observation was not used in the fit because it is missing or excluded, then Subset is false.
• BinomSize — Binomial size for each observation. This column only applies when fitting a binomial distribution.

Data Types: table

ObservationNames — Names of observations
cell array of character vectors

Names of observations used in the fit, stored as a cell array of character vectors.

• If the data is in a table or dataset array tbl that contains observation names, then ObservationNames uses those names.
• If the data is provided in matrices, or in a table or dataset array without observation names, then ObservationNames is an empty cell array.

Data Types: cell

PredictorNames — Names of predictors
cell array of character vectors

Names of the variables used as predictors in the fit, stored as a cell array of character vectors that has the same length as NumPredictors.

Data Types: cell
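ObservationInfo is the natural place to check which rows actually entered the fit. A minimal sketch, assuming glme is a GeneralizedLinearMixedModel fit to the mfr data as in the Examples section:

```matlab
% Sketch: use ObservationInfo to see which rows entered the fit.
% Assumes glme is a fitted GeneralizedLinearMixedModel (see Examples section).
info = glme.ObservationInfo;          % one row per observation in the input data
nUsed = sum(info.Subset);             % observations actually used in the fit
excludedRows = find(info.Excluded);   % rows removed via the 'Exclude' argument
missingRows  = find(info.Missing);    % rows removed due to missing values
```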

32

Functions

ResponseName — Name of response variable
character vector

Name of the variable used as the response variable in the fit, stored as a character vector.

Data Types: char

Rsquared — Proportion of variability in the response explained by the fitted model
structure

Proportion of variability in the response explained by the fitted model, stored as a structure. Rsquared contains the R-squared value of the fitted model, also known as the multiple correlation coefficient. Rsquared contains the following fields.

Ordinary — R-squared value, stored as a scalar value in a structure. Rsquared.Ordinary = 1 – SSE./SST.

Adjusted — R-squared value adjusted for the number of fixed-effects coefficients, stored as a scalar value in a structure. Rsquared.Adjusted = 1 – (SSE./SST)*(DFT./DFE), where DFE = n – p, DFT = n – 1, n is the total number of observations, and p is the number of fixed-effects coefficients.

Data Types: struct

SSE — Error sum of squares
positive scalar value

Error sum of squares, stored as a positive scalar value. SSE is the weighted sum of the squared conditional residuals, and is calculated as

    SSE = Σ_{i=1}^{n} w_i^eff (y_i − f_i)^2,

where n is the number of observations, w_i^eff is the ith effective weight, y_i is the ith response, and f_i is the ith fitted value. The ith effective weight is calculated as

    w_i^eff = w_i / v_i(μ_i(β̂, b̂)),

where w_i is the ith observation weight, v_i is the variance term for the ith observation, and β̂ and b̂ are estimated values of β and b, respectively. The ith fitted value is calculated as

    f_i = g^{−1}(x_i^T β̂ + z_i^T b̂ + δ_i),

where x_i^T is the ith row of the fixed-effects design matrix X, z_i^T is the ith row of the random-effects design matrix Z, and δ_i is the ith offset value.

Data Types: double

SSR — Regression sum of squares
positive scalar value

Regression sum of squares, stored as a positive scalar value. SSR is the sum of squares explained by the generalized linear mixed-effects regression, or equivalently the weighted sum of the squared deviations of the conditional fitted values from their weighted mean. SSR is calculated as

    SSR = Σ_{i=1}^{n} w_i^eff (f_i − f̄)^2,

where n is the number of observations, w_i^eff is the ith effective weight, f_i is the ith fitted value, and f̄ is a weighted average of the fitted values. The effective weights w_i^eff and fitted values f_i are defined as for SSE. The weighted average of fitted values is calculated as

    f̄ = ( Σ_{i=1}^{n} w_i^eff f_i ) / ( Σ_{i=1}^{n} w_i^eff ).

Data Types: double

SST — Total sum of squares
positive scalar value

Total sum of squares, stored as a positive scalar value. For a GLME model, SST is defined as SST = SSE + SSR.

Data Types: double

VariableInfo — Information about the variables
table


Information about the variables used in the fit, stored as a table. VariableInfo has one row for each variable and contains the following columns.

Class — Class of the variable ('double', 'cell', 'nominal', and so on).

Range — Value range of the variable. For a numeric variable, Range is a two-element vector of the form [min,max]. For a cell or categorical variable, Range is a cell or categorical array containing all unique values of the variable.

InModel — If the variable is a predictor in the fitted model, InModel is true. If the variable is not in the fitted model, InModel is false.

IsCategorical — If the variable type is treated as a categorical predictor (such as cell, logical, or categorical), then IsCategorical is true. If the variable is a continuous predictor, then IsCategorical is false.

Data Types: table

VariableNames — Names of the variables
cell array of character vectors

Names of all the variables contained in the table or dataset array tbl, stored as a cell array of character vectors.

Data Types: cell

Variables — Variables
table

Variables, stored as a table. If the fit is based on a table or dataset array tbl, then Variables is identical to tbl.

Data Types: table
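VariableInfo is convenient for checking how each variable was treated during fitting. A minimal sketch, assuming glme is a GeneralizedLinearMixedModel fit to the mfr data as in the Examples section:

```matlab
% Sketch: check how each variable in the fit was treated.
% Assumes glme is a fitted GeneralizedLinearMixedModel (see Examples section).
glme.VariableInfo                 % Class, Range, InModel, IsCategorical per variable
isCat = glme.VariableInfo.IsCategorical;
catVars = glme.VariableNames(isCat);   % names of categorical variables, e.g., supplier
```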


Methods

anova — Analysis of variance for generalized linear mixed-effects model
coefCI — Confidence intervals for coefficients of generalized linear mixed-effects model
coefTest — Hypothesis test on fixed and random effects of generalized linear mixed-effects model
compare — Compare generalized linear mixed-effects models
covarianceParameters — Extract covariance parameters of generalized linear mixed-effects model
designMatrix — Fixed- and random-effects design matrices
fitted — Fitted responses from generalized linear mixed-effects model
fixedEffects — Estimates of fixed effects and related statistics
plotResiduals — Plot residuals of generalized linear mixed-effects model
predict — Predict response of generalized linear mixed-effects model
random — Generate random responses from fitted generalized linear mixed-effects model
randomEffects — Estimates of random effects and related statistics
refit — Refit generalized linear mixed-effects model
residuals — Residuals of fitted generalized linear mixed-effects model
response — Response vector of generalized linear mixed-effects model

Examples

Fit a Generalized Linear Mixed-Effects Model

Load the sample data.

load mfr

This simulated data is from a manufacturing company that operates 50 factories across the world, with each factory running a batch process to create a finished product. The company wants to decrease the number of defects in each batch, so it developed a new manufacturing process. To test the effectiveness of the new process, the company selected 20 of its factories at random to participate in an experiment: Ten factories implemented the new process, while the other ten continued to run the old process. In each of the 20 factories, the company ran five batches (for a total of 100 batches) and recorded the following data:

• Flag to indicate whether the batch used the new process (newprocess)
• Processing time for each batch, in hours (time)
• Temperature of the batch, in degrees Celsius (temp)
• Categorical variable indicating the supplier (A, B, or C) of the chemical used in the batch (supplier)
• Number of defects in the batch (defects)

The data also includes time_dev and temp_dev, which represent the absolute deviation of time and temperature, respectively, from the process standard of 3 hours at 20 degrees Celsius.


Fit a generalized linear mixed-effects model using newprocess, time_dev, temp_dev, and supplier as fixed-effects predictors. Include a random-effects term for intercept grouped by factory, to account for quality differences that might exist due to factory-specific variations. The response variable defects has a Poisson distribution, and the appropriate link function for this model is log. Use the Laplace fit method to estimate the coefficients. Specify the dummy variable encoding as 'effects', so the dummy variable coefficients sum to 0.

The number of defects can be modeled using a Poisson distribution

    defects_ij ~ Poisson(μ_ij).

This corresponds to the generalized linear mixed-effects model

    log(μ_ij) = β0 + β1·newprocess_ij + β2·time_dev_ij + β3·temp_dev_ij + β4·supplier_C_ij + β5·supplier_B_ij + b_i,

where

• defects_ij is the number of defects observed in the batch produced by factory i during batch j.
• μ_ij is the mean number of defects corresponding to factory i (where i = 1, 2, ..., 20) during batch j (where j = 1, 2, ..., 5).
• newprocess_ij, time_dev_ij, and temp_dev_ij are the measurements for each variable that correspond to factory i during batch j. For example, newprocess_ij indicates whether the batch produced by factory i during batch j used the new process.
• supplier_C_ij and supplier_B_ij are dummy variables that use effects (sum-to-zero) coding to indicate whether company C or B, respectively, supplied the process chemicals for the batch produced by factory i during batch j.
• b_i ~ N(0, σ_b^2) is a random-effects intercept for each factory i that accounts for factory-specific variation in quality.

glme = fitglme(mfr,'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ... 'Distribution','Poisson','Link','log','FitMethod','Laplace','DummyVarCoding','effects');

Display the model.

disp(glme)

Generalized linear mixed-effects model fit by ML

Model information:
    Number of observations              100
    Fixed effects coefficients            6
    Random effects coefficients          20
    Covariance parameters                 1
    Distribution                    Poisson
    Link                                Log
    FitMethod                       Laplace

Formula:
    defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1 | factory)

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    416.35    434.58    -201.17          402.35

Fixed effects coefficients (95% CIs):
    Name                  Estimate     SE          tStat       DF    pValue        Lower        Upper
    {'(Intercept)'}         1.4689     0.15988      9.1875     94    9.8194e-15       1.1515       1.7864
    {'newprocess' }       -0.36766     0.17755     -2.0708     94      0.041122     -0.72019    -0.015134
    {'time_dev'   }      -0.094521     0.82849    -0.11409     94       0.90941      -1.7395       1.5505
    {'temp_dev'   }       -0.28317      0.9617    -0.29444     94       0.76907      -2.1926       1.6263
    {'supplier_C' }      -0.071868    0.078024     -0.9211     94       0.35936     -0.22679     0.083051
    {'supplier_B' }       0.071072     0.07739     0.91836     94       0.36078    -0.082588      0.22473

Random effects covariance parameters:
Group: factory (20 Levels)
    Name1                  Name2                  Type           Estimate
    {'(Intercept)'}        {'(Intercept)'}        {'std'}        0.31381

Group: Error
    Name                        Estimate
    {'sqrt(Dispersion)'}        1

The Model information table displays the total number of observations in the sample data (100), the number of fixed- and random-effects coefficients (6 and 20, respectively), and the number of covariance parameters (1). It also indicates that the response variable has a Poisson distribution, the link function is Log, and the fit method is Laplace.

Formula indicates the model specification using Wilkinson’s notation.

The Model fit statistics table displays statistics used to assess the goodness of fit of the model. This includes the Akaike information criterion (AIC), Bayesian information criterion (BIC) values, log likelihood (LogLikelihood), and deviance (Deviance) values.

The Fixed effects coefficients table indicates that fitglme returned 95% confidence intervals. It contains one row for each fixed-effects predictor, and each column contains statistics corresponding to that predictor. Column 1 (Name) contains the name of each fixed-effects coefficient, column 2 (Estimate) contains its estimated value, and column 3 (SE) contains the standard error of the coefficient. Column 4 (tStat) contains the t-statistic for a hypothesis test that the coefficient is equal to 0. Column 5 (DF) and column 6 (pValue) contain the degrees of freedom and p-value that correspond to the t-statistic, respectively. The last two columns (Lower and Upper) display the lower and upper limits, respectively, of the 95% confidence interval for each fixed-effects coefficient.

Random effects covariance parameters displays a table for each grouping variable (here, only factory), including its total number of levels (20), and the type and estimate of the covariance parameter. Here, std indicates that fitglme returns the standard deviation of the random effect associated with the factory predictor, which has an estimated value of 0.31381. It also displays a table containing the error parameter type (here, the square root of the dispersion parameter), and its estimated value of 1.


The standard display generated by fitglme does not provide confidence intervals for the random-effects parameters. To compute and display these values, use covarianceParameters.
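A minimal sketch of that call, continuing with the glme model fit above; covarianceParameters returns the covariance parameter estimates along with a statistics array that includes confidence limits:

```matlab
% Sketch: obtain confidence intervals for the covariance parameters,
% which the standard display omits. Continues from the glme fit above.
[psi,dispersion,stats] = covarianceParameters(glme);
stats{1}     % statistics (including Lower and Upper limits) for the
             % factory random-effect standard deviation
```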

More About

Formula

In general, a formula for model specification is a character vector or string scalar of the form 'y ~ terms'. For generalized linear mixed-effects models, this formula is in the form 'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)', where fixed and random contain the fixed-effects and the random-effects terms, respectively, and R is the number of grouping variables in the model.

Suppose a table tbl contains the following:

• A response variable, y
• Predictor variables, Xj, which can be continuous or grouping variables
• Grouping variables, g1, g2, ..., gR

The grouping variables in Xj and gr can be categorical, logical, character arrays, string arrays, or cell arrays of character vectors.

Then, in a formula of the form 'y ~ fixed + (random1|g1) + ... + (randomR|gR)', the term fixed corresponds to a specification of the fixed-effects design matrix X, random1 is a specification of the random-effects design matrix Z1 corresponding to grouping variable g1, and similarly randomR is a specification of the random-effects design matrix ZR corresponding to grouping variable gR. You can express the fixed and random terms using Wilkinson notation. Wilkinson notation describes the factors present in models. The notation relates to factors present in models, not to the multipliers (coefficients) of those factors.

Wilkinson Notation — Factors in Standard Notation

1 — Constant (intercept) term
X^k, where k is a positive integer — X, X^2, ..., X^k
X1 + X2 — X1, X2
X1*X2 — X1, X2, X1.*X2 (elementwise multiplication of X1 and X2)
X1:X2 — X1.*X2 only
- X2 — Do not include X2
X1*X2 + X3 — X1, X2, X3, X1*X2
X1 + X2 + X3 + X1:X2 — X1, X2, X3, X1*X2
X1*X2*X3 - X1:X2:X3 — X1, X2, X3, X1*X2, X1*X3, X2*X3
X1*(X2 + X3) — X1, X2, X3, X1*X2, X1*X3

Statistics and Machine Learning Toolbox notation always includes a constant term unless you explicitly remove the term using -1. Here are some examples for linear mixed-effects model specification.


Examples:

'y ~ X1 + X2' — Fixed effects for the intercept, X1 and X2. This is equivalent to 'y ~ 1 + X1 + X2'.

'y ~ -1 + X1 + X2' — No intercept and fixed effects for X1 and X2. The implicit intercept term is suppressed by including -1.

'y ~ 1 + (1 | g1)' — Fixed effects for the intercept plus random effect for the intercept for each level of the grouping variable g1.

'y ~ X1 + (1 | g1)' — Random intercept model with a fixed slope.

'y ~ X1 + (X1 | g1)' — Random intercept and slope, with possible correlation between them. This is equivalent to 'y ~ 1 + X1 + (1 + X1|g1)'.

'y ~ X1 + (1 | g1) + (-1 + X1 | g1)' — Independent random effects terms for intercept and slope.

'y ~ 1 + (1 | g1) + (1 | g2) + (1 | g1:g2)' — Random intercept model with independent main effects for g1 and g2, plus an independent interaction effect.

See Also
fitglme | plotPartialDependence

Topics
“Fit a Generalized Linear Mixed-Effects Model” on page 12-57
“Generalized Linear Mixed-Effects Models” on page 12-48


GeneralizedLinearModel
Generalized linear regression model class

Description

GeneralizedLinearModel is a fitted generalized linear regression model. A generalized linear regression model is a special class of nonlinear models that describe a nonlinear relationship between a response and predictors. A generalized linear regression model has generalized characteristics of a linear regression model. The response variable follows a normal, binomial, Poisson, gamma, or inverse Gaussian distribution with parameters including the mean response μ. A link function f defines the relationship between μ and the linear combination of predictors.

Use the properties of a GeneralizedLinearModel object to investigate a fitted generalized linear regression model. The object properties include information about coefficient estimates, summary statistics, fitting method, and input data. Use the object functions to predict responses and to modify, evaluate, and visualize the model.

Creation

Create a GeneralizedLinearModel object by using fitglm or stepwiseglm.

fitglm fits a generalized linear regression model to data using a fixed model specification. Use addTerms, removeTerms, or step to add or remove terms from the model. Alternatively, use stepwiseglm to fit a model using stepwise generalized linear regression.

Properties

Coefficient Estimates

CoefficientCovariance — Covariance matrix of coefficient estimates
numeric matrix

This property is read-only.

Covariance matrix of coefficient estimates, specified as a p-by-p matrix of numeric values. p is the number of coefficients in the fitted model. For details, see “Coefficient Standard Errors and Confidence Intervals” on page 11-57.

Data Types: single | double

CoefficientNames — Coefficient names
cell array of character vectors

This property is read-only.

Coefficient names, specified as a cell array of character vectors, each containing the name of the corresponding term.


Data Types: cell

Coefficients — Coefficient values
table

This property is read-only.

Coefficient values, specified as a table. Coefficients contains one row for each coefficient and these columns:

• Estimate — Estimated coefficient value
• SE — Standard error of the estimate
• tStat — t-statistic for a test that the coefficient is zero
• pValue — p-value for the t-statistic

Use coefTest to perform linear hypothesis tests on the coefficients. Use coefCI to find the confidence intervals of the coefficient estimates.

To obtain any of these columns as a vector, index into the property using dot notation. For example, obtain the estimated coefficient vector in the model mdl:

beta = mdl.Coefficients.Estimate

Data Types: table

NumCoefficients — Number of model coefficients
positive integer

This property is read-only.

Number of model coefficients, specified as a positive integer. NumCoefficients includes coefficients that are set to zero when the model terms are rank deficient.

Data Types: double

NumEstimatedCoefficients — Number of estimated coefficients
positive integer

This property is read-only.

Number of estimated coefficients in the model, specified as a positive integer. NumEstimatedCoefficients does not include coefficients that are set to zero when the model terms are rank deficient. NumEstimatedCoefficients is the degrees of freedom for regression.

Data Types: double

Summary Statistics

Deviance — Deviance of fit
numeric value

This property is read-only.

Deviance of the fit, specified as a numeric value. The deviance is useful for comparing two models when one model is a special case of the other model. The difference between the deviance of the two models has a chi-square distribution with degrees of freedom equal to the difference in the number of estimated parameters between the two models. For more information, see “Deviance” on page 32-2367.

Data Types: single | double

DFE — Degrees of freedom for error
positive integer

This property is read-only.

Degrees of freedom for the error (residuals), equal to the number of observations minus the number of estimated coefficients, specified as a positive integer.

Data Types: double

Diagnostics — Observation diagnostics
table

This property is read-only.

Observation diagnostics, specified as a table that contains one row for each observation and the following columns.

Leverage (diagonal elements of HatMatrix) — Leverage for each observation indicates to what extent the fit is determined by the observed predictor values. A value close to 1 indicates that the fit is largely determined by that observation, with little contribution from the other observations. A value close to 0 indicates that the fit is largely determined by the other observations. For a model with P coefficients and N observations, the average value of Leverage is P/N. A Leverage value greater than 2*P/N indicates high leverage.

CooksDistance (Cook's distance of scaled change in fitted values) — CooksDistance is a measure of scaled change in fitted values. An observation with CooksDistance greater than three times the mean Cook's distance can be an outlier.

HatMatrix (projection matrix to compute fitted from observed responses) — HatMatrix is an N-by-N matrix such that Fitted = HatMatrix*Y, where Y is the response vector and Fitted is the vector of fitted response values.

The software computes these values on the scale of the linear combination of the predictors, stored in the LinearPredictor field of the Fitted and Residuals properties. For example, the software computes the diagnostic values by using the fitted response and adjusted response values from the model mdl.

Yfit = mdl.Fitted.LinearPredictor
Yadjusted = mdl.Fitted.LinearPredictor + mdl.Residuals.LinearPredictor

Diagnostics contains information that is helpful in finding outliers and influential observations. For more details, see “Leverage” on page 32-2366, “Cook’s Distance” on page 32-2366, and “Hat Matrix” on page 32-2366. Use plotDiagnostics to plot observation diagnostics.


Rows not used in the fit because of missing values (in ObservationInfo.Missing) or excluded values (in ObservationInfo.Excluded) contain NaN values in the CooksDistance column and zeros in the Leverage and HatMatrix columns.

To obtain any of these columns as an array, index into the property using dot notation. For example, obtain the hat matrix in the model mdl:

HatMatrix = mdl.Diagnostics.HatMatrix;
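The 2*P/N rule of thumb described above translates directly into code. A minimal sketch, assuming mdl is a fitted GeneralizedLinearModel:

```matlab
% Sketch: flag high-leverage observations using the 2*P/N rule of thumb.
% Assumes mdl is a fitted GeneralizedLinearModel.
P = mdl.NumCoefficients;
N = mdl.NumObservations;
highLeverage = find(mdl.Diagnostics.Leverage > 2*P/N);
plotDiagnostics(mdl,'leverage')   % visual check of the same diagnostic
```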

Data Types: table

Dispersion — Scale factor of variance of response
numeric scalar

This property is read-only.

Scale factor of the variance of the response, specified as a numeric scalar. If the 'DispersionFlag' name-value pair argument of fitglm or stepwiseglm is true, then the function estimates the Dispersion scale factor in computing the variance of the response. The variance of the response equals the theoretical variance multiplied by the scale factor.

For example, the variance function for the binomial distribution is p(1–p)/n, where p is the probability parameter and n is the sample size parameter. If Dispersion is near 1, the variance of the data appears to agree with the theoretical variance of the binomial distribution. If Dispersion is larger than 1, the data set is “overdispersed” relative to the binomial distribution.

Data Types: double

DispersionEstimated — Flag to indicate use of dispersion scale factor
logical value

This property is read-only.

Flag to indicate whether fitglm used the Dispersion scale factor to compute standard errors for the coefficients in Coefficients.SE, specified as a logical value. If DispersionEstimated is false, fitglm used the theoretical value of the variance.

• DispersionEstimated can be false only for the binomial and Poisson distributions.
• Set DispersionEstimated by setting the 'DispersionFlag' name-value pair argument of fitglm or stepwiseglm.

Data Types: logical

Fitted — Fitted response values based on input data
table

This property is read-only.

Fitted (predicted) values based on the input data, specified as a table that contains one row for each observation and the following columns.

Response — Predicted values on the scale of the response
LinearPredictor — Predicted values on the scale of the linear combination of the predictors (same as the link function applied to the Response fitted values)
Probability — Fitted probabilities (included only with the binomial distribution)

To obtain any of these columns as a vector, index into the property using dot notation. For example, obtain the vector f of fitted values on the response scale in the model mdl:

f = mdl.Fitted.Response

Use predict to compute predictions for other predictor values, or to compute confidence bounds on Fitted.

Data Types: table

LogLikelihood — Loglikelihood
numeric value

This property is read-only.

Loglikelihood of the model distribution at the response values, specified as a numeric value. The mean is fitted from the model, and other parameters are estimated as part of the model fit.

Data Types: single | double

ModelCriterion — Criterion for model comparison
structure

This property is read-only.

Criterion for model comparison, specified as a structure with these fields:

• AIC — Akaike information criterion. AIC = –2*logL + 2*m, where logL is the loglikelihood and m is the number of estimated parameters.
• AICc — Akaike information criterion corrected for the sample size. AICc = AIC + (2*m*(m + 1))/(n – m – 1), where n is the number of observations.
• BIC — Bayesian information criterion. BIC = –2*logL + m*log(n).
• CAIC — Consistent Akaike information criterion. CAIC = –2*logL + m*(log(n) + 1).

Information criteria are model selection tools that you can use to compare multiple models fit to the same data. These criteria are likelihood-based measures of model fit that include a penalty for complexity (specifically, the number of parameters). Different information criteria are distinguished by the form of the penalty.

When you compare multiple models, the model with the lowest information criterion value is the best-fitting model. The best-fitting model can vary depending on the criterion used for model comparison.

To obtain any of the criterion values as a scalar, index into the property using dot notation. For example, obtain the AIC value aic in the model mdl:

aic = mdl.ModelCriterion.AIC

Data Types: struct
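The AIC and BIC formulas above can be verified against the ModelCriterion property. A minimal sketch, assuming mdl is a fitted GeneralizedLinearModel:

```matlab
% Sketch: verify the information-criterion definitions against ModelCriterion.
% Assumes mdl is a fitted GeneralizedLinearModel.
logL = mdl.LogLikelihood;
m = mdl.NumEstimatedCoefficients;   % number of estimated parameters
n = mdl.NumObservations;
aic = -2*logL + 2*m;                % matches mdl.ModelCriterion.AIC
bic = -2*logL + m*log(n);           % matches mdl.ModelCriterion.BIC
```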


Residuals — Residuals for fitted model
table

This property is read-only.

Residuals for the fitted model, specified as a table that contains one row for each observation and the following columns.

Raw — Observed minus fitted values
LinearPredictor — Residuals on the linear predictor scale, equal to the adjusted response value minus the fitted linear combination of the predictors
Pearson — Raw residuals divided by the estimated standard deviation of the response
Anscombe — Residuals defined on transformed data with the transformation selected to remove skewness
Deviance — Residuals based on the contribution of each observation to the deviance

Rows not used in the fit because of missing values (in ObservationInfo.Missing) contain NaN values.

To obtain any of these columns as a vector, index into the property using dot notation. For example, obtain the ordinary raw residual vector r in the model mdl:

r = mdl.Residuals.Raw

Data Types: table

Rsquared — R-squared value for model
structure

This property is read-only.

R-squared value for the model, specified as a structure with five fields:

• Ordinary — Ordinary (unadjusted) R-squared
• Adjusted — R-squared adjusted for the number of coefficients
• LLR — Loglikelihood ratio
• Deviance — Deviance
• AdjGeneralized — Adjusted generalized R-squared

The R-squared value is the proportion of the total sum of squares explained by the model. The ordinary R-squared value relates to the SSR and SST properties:

    Rsquared = SSR/SST

To obtain any of these values as a scalar, index into the property using dot notation. For example, obtain the adjusted R-squared value in the model mdl:

r2 = mdl.Rsquared.Adjusted

Data Types: struct


SSE — Sum of squared errors
numeric value

This property is read-only.

Sum of squared errors (residuals), specified as a numeric value.

Data Types: single | double

SSR — Regression sum of squares
numeric value

This property is read-only.

Regression sum of squares, specified as a numeric value. The regression sum of squares is equal to the sum of squared deviations of the fitted values from their mean.

Data Types: single | double

SST — Total sum of squares
numeric value

This property is read-only.

Total sum of squares, specified as a numeric value. The total sum of squares is equal to the sum of squared deviations of the response vector y from mean(y).

Data Types: single | double

Fitting Information

Steps — Stepwise fitting information
structure

This property is read-only.

Stepwise fitting information, specified as a structure with the following fields.

Start — Formula representing the starting model
Lower — Formula representing the lower bound model. The terms in Lower must remain in the model.
Upper — Formula representing the upper bound model. The model cannot contain more terms than Upper.
Criterion — Criterion used for the stepwise algorithm, such as 'sse'
PEnter — Threshold for Criterion to add a term
PRemove — Threshold for Criterion to remove a term
History — Table representing the steps taken in the fit

The History table contains one row for each step, including the initial fit, and the following columns.


Action — Action taken during the step: 'Start' (first step), 'Add' (a term is added), or 'Remove' (a term is removed)
TermName — If Action is 'Start', TermName specifies the starting model specification. If Action is 'Add' or 'Remove', TermName specifies the term added or removed in the step.
Terms — Model specification in a “Terms Matrix” on page 32-2367
DF — Regression degrees of freedom after the step
delDF — Change in regression degrees of freedom from the previous step (negative for steps that remove a term)
Deviance — Deviance (residual sum of squares) at the step (only for a generalized linear regression model)
FStat — F-statistic that leads to the step
PValue — p-value of the F-statistic

The structure is empty unless you fit the model using stepwise regression.

Data Types: struct

Input Data

Distribution — Generalized distribution information
structure

This property is read-only.

Generalized distribution information, specified as a structure with the following fields.

Name — Name of the distribution: 'normal', 'binomial', 'poisson', 'gamma', or 'inverse gaussian'
DevianceFunction — Function that computes the components of the deviance as a function of the fitted parameter values and the response values
VarianceFunction — Function that computes the theoretical variance for the distribution as a function of the fitted parameter values. When DispersionEstimated is true, Dispersion multiplies the variance function in the computation of the coefficient standard errors.

Data Types: struct

Formula — Model information
LinearFormula object

This property is read-only.


Model information, specified as a LinearFormula object. Display the formula of the fitted model mdl using dot notation: mdl.Formula

Link — Link function
structure

This property is read-only.

Link function, specified as a structure with the fields described in this table.

Name — Name of the link function, specified as a character vector. If you specify the link function using a function handle, then Name is ''.

LinkFunction — Function f that defines the link function, specified as a function handle

DevianceFunction — Derivative of f, specified as a function handle

VarianceFunction — Inverse of f, specified as a function handle

The link function is a function f that links the distribution parameter μ to the fitted linear combination Xb of the predictors:

f(μ) = Xb.

Data Types: struct

NumObservations — Number of observations
positive integer

This property is read-only.

Number of observations the fitting function used in fitting, specified as a positive integer. NumObservations is the number of observations supplied in the original table, dataset, or matrix, minus any excluded rows (set with the 'Exclude' name-value pair argument) or rows with missing values.

Data Types: double

NumPredictors — Number of predictor variables
positive integer

This property is read-only.

Number of predictor variables used to fit the model, specified as a positive integer.

Data Types: double

NumVariables — Number of variables
positive integer

This property is read-only.

Number of variables in the input data, specified as a positive integer. NumVariables is the number of variables in the original table or dataset, or the total number of columns in the predictor matrix and response vector.

NumVariables also includes any variables that are not used to fit the model as predictors or as the response.

Data Types: double

ObservationInfo — Observation information
table

This property is read-only.

Observation information, specified as an n-by-4 table, where n is equal to the number of rows of input data. ObservationInfo contains the columns described in this table.

Weights — Observation weights, specified as a numeric value. The default value is 1.

Excluded — Indicator of excluded observations, specified as a logical value. The value is true if you exclude the observation from the fit by using the 'Exclude' name-value pair argument.

Missing — Indicator of missing observations, specified as a logical value. The value is true if the observation is missing.

Subset — Indicator of whether or not the fitting function uses the observation, specified as a logical value. The value is true if the observation is not excluded or missing, meaning the fitting function uses the observation.

To obtain any of these columns as a vector, index into the property using dot notation. For example, obtain the weight vector w of the model mdl:

w = mdl.ObservationInfo.Weights

Data Types: table

ObservationNames — Observation names
cell array of character vectors

This property is read-only.

Observation names, specified as a cell array of character vectors containing the names of the observations used in the fit.
• If the fit is based on a table or dataset containing observation names, ObservationNames uses those names.
• Otherwise, ObservationNames is an empty cell array.

Data Types: cell

Offset — Offset variable
numeric vector

This property is read-only.

Offset variable, specified as a numeric vector with the same length as the number of rows in the data. Offset is passed from fitglm or stepwiseglm in the 'Offset' name-value pair argument. The fitting functions use Offset as an additional predictor variable with a coefficient value fixed at 1. In other words, the formula for fitting is

f(μ) ~ Offset + (terms involving real predictors),

where f is the link function. The Offset predictor has coefficient 1.

For example, consider a Poisson regression model. Suppose the number of counts is known for theoretical reasons to be proportional to a predictor A. By using the log link function and by specifying log(A) as an offset, you can force the model to satisfy this theoretical constraint.

Data Types: double

PredictorNames — Names of predictors used to fit model
cell array of character vectors

This property is read-only.

Names of predictors used to fit the model, specified as a cell array of character vectors.

Data Types: cell

ResponseName — Response variable name
character vector

This property is read-only.

Response variable name, specified as a character vector.

Data Types: char

VariableInfo — Information about variables
table

This property is read-only.

Information about variables contained in Variables, specified as a table with one row for each variable and the columns described in this table.

Class — Variable class, specified as a cell array of character vectors, such as 'double' and 'categorical'

Range — Variable range, specified as a cell array of vectors
• Continuous variable — Two-element vector [min,max], the minimum and maximum values
• Categorical variable — Vector of distinct variable values

InModel — Indicator of which variables are in the fitted model, specified as a logical vector. The value is true if the model includes the variable.

IsCategorical — Indicator of categorical variables, specified as a logical vector. The value is true if the variable is categorical.

VariableInfo also includes any variables that are not used to fit the model as predictors or as the response.

Data Types: table

VariableNames — Names of variables
cell array of character vectors

This property is read-only.

Names of variables, specified as a cell array of character vectors.
• If the fit is based on a table or dataset, this property provides the names of the variables in the table or dataset.
• If the fit is based on a predictor matrix and response vector, VariableNames contains the values specified by the 'VarNames' name-value pair argument of the fitting method. The default value of 'VarNames' is {'x1','x2',...,'xn','y'}.

VariableNames also includes any variables that are not used to fit the model as predictors or as the response.

Data Types: cell

Variables — Input data
table

This property is read-only.

Input data, specified as a table. Variables contains both predictor and response values. If the fit is based on a table or dataset array, Variables contains all the data from the table or dataset array. Otherwise, Variables is a table created from the input data matrix X and the response vector y. Variables also includes any variables that are not used to fit the model as predictors or as the response.

Data Types: table
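These read-only properties are all available through dot notation. As a brief sketch (reusing the hospital data set from the Examples below; the formula here is illustrative):

```matlab
% Sketch: inspect a fitted model's stored data and observation info.
load hospital
tbl = dataset2table(hospital);
mdl = fitglm(tbl,'Smoker ~ Age + Weight','Distribution','binomial');
mdl.Variables(1:3,:)             % input data used for the fit
w = mdl.ObservationInfo.Weights; % per-observation weights (default 1)
mdl.NumObservations              % number of observations used
```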

Object Functions

Create CompactGeneralizedLinearModel
compact — Compact generalized linear regression model

Add or Remove Terms from Generalized Linear Model
addTerms — Add terms to generalized linear regression model
removeTerms — Remove terms from generalized linear regression model
step — Improve generalized linear regression model by adding or removing terms

Predict Responses
feval — Predict responses of generalized linear regression model using one input for each predictor
predict — Predict responses of generalized linear regression model
random — Simulate responses with random noise for generalized linear regression model

Evaluate Generalized Linear Model
coefCI — Confidence intervals of coefficient estimates of generalized linear regression model
coefTest — Linear hypothesis test on generalized linear regression model coefficients
devianceTest — Analysis of deviance for generalized linear regression model

Visualize Generalized Linear Model and Summary Statistics
plotDiagnostics — Plot observation diagnostics of generalized linear regression model
plotPartialDependence — Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
plotResiduals — Plot residuals of generalized linear regression model
plotSlice — Plot of slices through fitted generalized linear regression surface

Examples

Create Generalized Linear Regression Model

Fit a logistic regression model of the probability of smoking as a function of age, weight, and sex, using a two-way interaction model.

Load the hospital data set.

load hospital

Convert the dataset array to a table.

tbl = dataset2table(hospital);

Specify the model using a formula that includes two-way interactions and lower-order terms.

modelspec = 'Smoker ~ Age*Weight*Sex - Age:Weight:Sex';

Create the generalized linear model.

mdl = fitglm(tbl,modelspec,'Distribution','binomial')

mdl = 
Generalized linear regression model:
    logit(Smoker) ~ 1 + Sex*Age + Sex*Weight + Age*Weight
    Distribution = Binomial

Estimated Coefficients:

                          Estimate          SE         tStat      pValue
                       ___________    _________    ________    _______

    (Intercept)            -6.0492       19.749     -0.3063    0.75938
    Sex_Male               -2.2859       12.424    -0.18399    0.85402
    Age                    0.11691      0.50977     0.22934    0.81861
    Weight                0.031109      0.15208     0.20455    0.83792
    Sex_Male:Age          0.020734      0.20681     0.10025    0.92014
    Sex_Male:Weight        0.01216     0.053168     0.22871     0.8191
    Age:Weight         -0.00071959    0.0038964    -0.18468    0.85348

100 observations, 93 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 5.07, p-value = 0.535

The large p-value indicates that the model might not differ statistically from a constant.


Create Generalized Linear Regression Model Using Stepwise Regression

Create response data using three of 20 predictor variables, and create a generalized linear model using stepwise regression from a constant model to see if stepwiseglm finds the correct predictors.

Generate sample data that has 20 predictor variables. Use three of the predictors to generate the Poisson response variable.

rng default % for reproducibility
X = randn(100,20);
mu = exp(X(:,[5 10 15])*[.4;.2;.3] + 1);
y = poissrnd(mu);

Fit a generalized linear regression model using the Poisson distribution. Specify the starting model as a model that contains only a constant (intercept) term. Also, specify a model with an intercept and linear term for each predictor as the largest model to consider as the fit by using the 'Upper' name-value pair argument.

mdl = stepwiseglm(X,y,'constant','Upper','linear','Distribution','poisson')

1. Adding x5, Deviance = 134.439, Chi2Stat = 52.24814, PValue = 4.891229e-13
2. Adding x15, Deviance = 106.285, Chi2Stat = 28.15393, PValue = 1.1204e-07
3. Adding x10, Deviance = 95.0207, Chi2Stat = 11.2644, PValue = 0.000790094

mdl = 
Generalized linear regression model:
    log(y) ~ 1 + x5 + x10 + x15
    Distribution = Poisson

Estimated Coefficients:
                   Estimate       SE          tStat     pValue
                   ________    ________    ______    __________

    (Intercept)      1.0115    0.064275    15.737    8.4217e-56
    x5              0.39508    0.066665    5.9263    3.0977e-09
    x10             0.18863     0.05534    3.4085     0.0006532
    x15             0.29295    0.053269    5.4995    3.8089e-08

100 observations, 96 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 91.7, p-value = 9.61e-20

stepwiseglm finds the three correct predictors: x5, x10, and x15.

More About

Canonical Link Function

The default link function for a generalized linear model is the canonical link function. You can specify a link function when you fit a model with fitglm or stepwiseglm by using the 'Link' name-value pair argument.
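For example, a minimal sketch of a non-canonical choice, assuming the hospital data set used elsewhere on this page: the binomial default is 'logit', but you can request a 'probit' link instead.

```matlab
% Sketch: override the canonical logit link with a probit link.
load hospital
tbl = dataset2table(hospital);
mdl = fitglm(tbl,'Smoker ~ Age','Distribution','binomial','Link','probit');
```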


Distribution           Canonical Link      Link Function            Mean (Inverse) Function
                       Function Name
'normal'               'identity'          f(μ) = μ                 μ = Xb
'binomial'             'logit'             f(μ) = log(μ/(1 – μ))    μ = exp(Xb)/(1 + exp(Xb))
'poisson'              'log'               f(μ) = log(μ)            μ = exp(Xb)
'gamma'                -1                  f(μ) = 1/μ               μ = 1/(Xb)
'inverse gaussian'     -2                  f(μ) = 1/μ²              μ = (Xb)^(–1/2)
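In each row, the mean (inverse) function undoes the link function. A quick numeric check for the binomial row:

```matlab
% Verify that the logit link inverts the binomial mean function.
Xb = linspace(-3,3,7);
mu = exp(Xb)./(1 + exp(Xb));   % mean (inverse) function
f  = log(mu./(1 - mu));        % canonical logit link applied to mu
max(abs(f - Xb))               % close to 0, up to round-off
```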

Cook’s Distance

The Cook’s distance D_i of observation i is

D_i = w_i · (e_i² h_ii) / (p φ (1 − h_ii)²),

where

• φ is the dispersion parameter (estimated or theoretical).
• e_i is the linear predictor residual, g(y_i) − x_i β, where
  • g is the link function.
  • y_i is the observed response.
  • x_i is the observation.
  • β is the estimated coefficient vector.
• p is the number of coefficients in the regression model.
• h_ii is the ith diagonal element of the Hat Matrix on page 32-2366 H.

Leverage

Leverage is a measure of the effect of a particular observation on the regression predictions due to the position of that observation in the space of the inputs. The leverage of observation i is the value of the ith diagonal term h_ii of the hat matrix H. Because the sum of the leverage values is p (the number of coefficients in the regression model), an observation i can be considered an outlier if its leverage substantially exceeds p/n, where n is the number of observations.

Hat Matrix

The hat matrix H is defined in terms of the data matrix X and a diagonal weight matrix W:

H = X(XᵀWX)^(–1)XᵀWᵀ.

W has diagonal elements w_i:

w_i = g′(μ_i) / V(μ_i),

where

• g is the link function mapping y_i to x_i b.
• g′ is the derivative of the link function g.
• V is the variance function.
• μ_i is the ith mean.

The diagonal elements h_ii satisfy

0 ≤ h_ii ≤ 1,   ∑_{i=1}^{n} h_ii = p,

where n is the number of observations (rows of X), and p is the number of coefficients in the regression model.

Deviance

Deviance of a model M1 is twice the difference between the loglikelihood of the model M1 and the saturated model Ms. A saturated model is the model with the maximum number of parameters that you can estimate. For example, if you have n observations (y_i, i = 1, 2, ..., n) with potentially different values for X_iᵀβ, then you can define a saturated model with n parameters. Let L(b,y) denote the maximum value of the likelihood function for a model with the parameters b. Then the deviance of the model M1 is

−2(log L(b1,y) − log L(bS,y)),

where b1 and bS contain the estimated parameters for the model M1 and the saturated model, respectively. The deviance has a chi-square distribution with n – p degrees of freedom, where n is the number of parameters in the saturated model and p is the number of parameters in the model M1.

Assume you have two different generalized linear regression models M1 and M2, and M1 has a subset of the terms in M2. You can assess the fit of the models by comparing the deviances D1 and D2 of the two models. The difference of the deviances is

D = D2 − D1
  = −2(log L(b2,y) − log L(bS,y)) + 2(log L(b1,y) − log L(bS,y))
  = −2(log L(b2,y) − log L(b1,y)).

Asymptotically, the difference D has a chi-square distribution with degrees of freedom v equal to the difference in the number of parameters estimated in M1 and M2. You can obtain the p-value for this test by using 1 – chi2cdf(D,v).

Typically, you examine D using a model M2 with a constant term and no predictors. Therefore, D has a chi-square distribution with p – 1 degrees of freedom. If the dispersion is estimated, the difference divided by the estimated dispersion has an F distribution with p – 1 numerator degrees of freedom and n – p denominator degrees of freedom.
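The comparison of nested models described above can be sketched directly from two fitted models; the simulated data and variable names here are illustrative:

```matlab
% Sketch: chi-square test on the deviance difference of nested models.
rng default
X = randn(100,3);
y = poissrnd(exp(0.5*X(:,1) + 1));
m1 = fitglm(X(:,1),y,'Distribution','poisson'); % smaller model
m2 = fitglm(X,y,'Distribution','poisson');      % larger model
D  = m1.Deviance - m2.Deviance;                 % deviance difference
v  = m2.NumCoefficients - m1.NumCoefficients;   % extra parameters
p  = 1 - chi2cdf(D,v)                           % p-value for the extra terms
```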
Terms Matrix

A terms matrix T is a t-by-(p + 1) matrix specifying terms in a model, where t is the number of terms, p is the number of predictor variables, and +1 accounts for the response variable. The value of T(i,j) is the exponent of variable j in term i.

For example, suppose that an input includes three predictor variables A, B, and C and the response variable Y in the order A, B, C, and Y. Each row of T represents one term:

• [0 0 0 0] — Constant term or intercept
• [0 1 0 0] — B; equivalently, A^0 * B^1 * C^0
• [1 0 1 0] — A*C
• [2 0 0 0] — A^2
• [0 1 2 0] — B*(C^2)

The 0 at the end of each term represents the response variable. In general, a column vector of zeros in a terms matrix represents the position of the response variable. If you have the predictor and response variables in a matrix and column vector, then you must include 0 for the response variable in the last column of each row.
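You can check a terms matrix such as the one above with x2fx, which expands predictor data into a design matrix; x2fx takes only the predictor columns of T, without the trailing response column. The data values here are made up for illustration:

```matlab
% Sketch: expand predictors A, B, C using the terms listed above.
A = (1:5)'; B = (6:10)'; C = (11:15)';
T = [0 0 0 0; 0 1 0 0; 1 0 1 0; 2 0 0 0; 0 1 2 0];
D = x2fx([A B C],T(:,1:3))   % columns: 1, B, A*C, A.^2, B.*(C.^2)
```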

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

• The predict and random functions support code generation.
• When you fit a model by using fitglm or stepwiseglm, the following restrictions apply.
  • Code generation does not support categorical predictors. You cannot supply training data in a table that contains a logical vector, character array, categorical array, string array, or cell array of character vectors. Also, you cannot use the 'CategoricalVars' name-value pair argument. To include categorical predictors in a model, preprocess the categorical predictors by using dummyvar before fitting the model.
  • The Link, Derivative, and Inverse fields of the 'Link' name-value pair argument cannot be anonymous functions. That is, you cannot generate code using a generalized linear model that was created using anonymous functions for links. Instead, define functions for link components.

For more information, see “Introduction to Code Generation” on page 31-2.
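As a minimal sketch of the dummyvar preprocessing mentioned above (the grouping variable here is made up for illustration):

```matlab
% Sketch: encode a categorical predictor numerically before calling fitglm,
% as required for code generation.
g = categorical({'low';'high';'low';'med';'high'});
D = dummyvar(g);   % one indicator column per category
D = D(:,2:end);    % drop one column if the model includes an intercept
```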

See Also

CompactGeneralizedLinearModel | LinearModel | NonLinearModel | fitglm | stepwiseglm

Topics
“Generalized Linear Model Workflow” on page 12-28
“Generalized Linear Models” on page 12-9

Introduced in R2012a

generateCode

Package: classreg.learning.coder.config.svm

Generate C/C++ code using coder configurer

Syntax

generateCode(configurer)
generateCode(configurer,cfg)
generateCode( ___ ,'OutputPath',outputPath)

Description

After training a machine learning model, create a coder configurer for the model by using learnerCoderConfigurer. Modify the properties of the configurer to specify code generation options. Then use generateCode to generate C/C++ code for the predict and update functions of the machine learning model. Generating C/C++ code requires MATLAB Coder.

This flow chart shows the code generation workflow using a coder configurer. Use generateCode for the highlighted step.

generateCode(configurer) generates a MEX (MATLAB Executable) function for the predict and update functions of a machine learning model by using configurer. The generated MEX function is named outputFileName, which is the file name stored in the OutputFileName property of configurer.

To generate a MEX function, generateCode first generates the following MATLAB files required to generate code and stores them in the current folder:

• predict.m, update.m, and initialize.m — predict.m and update.m are the entry-point functions for the predict and update functions of the machine learning model, respectively, and these two functions call initialize.m.
• A MAT-file that includes machine learning model information — generateCode uses the saveLearnerForCoder function to save machine learning model information in a MAT-file whose file name is stored in the OutputFileName property of a coder configurer. initialize.m loads the saved MAT-file by using the loadLearnerForCoder function.

After generating the necessary MATLAB files, generateCode creates the MEX function and the code for the MEX function in the codegen\mex\outputFileName folder and copies the MEX function to the current folder.

generateCode(configurer,cfg) generates C/C++ code using the build type specified by cfg.

generateCode( ___ ,'OutputPath',outputPath) specifies the folder path for the output files in addition to any of the input arguments in previous syntaxes. generateCode generates the MATLAB files in the folder specified by outputPath and generates C/C++ code in the folder outputPath\codegen\type\outputFileName, where type is the build type specified by cfg.

Examples

Generate Code Using Coder Configurer

Train a machine learning model, and then generate code for the predict and update functions of the model by using a coder configurer.

Load the carsmall data set and train a support vector machine (SVM) regression model.

load carsmall
X = [Horsepower,Weight];
Y = MPG;
Mdl = fitrsvm(X,Y);

Mdl is a RegressionSVM object.

Create a coder configurer for the RegressionSVM model by using learnerCoderConfigurer. Specify the predictor data X. The learnerCoderConfigurer function uses the input X to configure the coder attributes of the predict function input.

configurer = learnerCoderConfigurer(Mdl,X)

configurer = 
  RegressionSVMCoderConfigurer with properties:

   Update Inputs:
              Alpha: [1x1 LearnerCoderInput]
     SupportVectors: [1x1 LearnerCoderInput]
              Scale: [1x1 LearnerCoderInput]
               Bias: [1x1 LearnerCoderInput]

   Predict Inputs:
                  X: [1x1 LearnerCoderInput]

   Code Generation Parameters:
         NumOutputs: 1
     OutputFileName: 'RegressionSVMModel'

  Properties, Methods

configurer is a RegressionSVMCoderConfigurer object, which is a coder configurer of a RegressionSVM object.

To generate C/C++ code, you must have access to a C/C++ compiler that is configured properly. MATLAB Coder locates and uses a supported, installed compiler. You can use mex -setup to view and change the default compiler. For more details, see “Change Default Compiler” (MATLAB).

Generate code for the predict and update functions of the SVM regression model (Mdl) with default settings.

generateCode(configurer)

generateCode creates these files in output folder:
'initialize.m', 'predict.m', 'update.m', 'RegressionSVMModel.mat'

The generateCode function completes these actions:

• Generate the MATLAB files required to generate code, including the two entry-point functions predict.m and update.m for the predict and update functions of Mdl, respectively.
• Create a MEX function named RegressionSVMModel for the two entry-point functions.
• Create the code for the MEX function in the codegen\mex\RegressionSVMModel folder.
• Copy the MEX function to the current folder.

Display the contents of the predict.m, update.m, and initialize.m files by using the type function.

type predict.m

function varargout = predict(X,varargin) %#codegen
% Autogenerated by MATLAB, 29-Feb-2020 10:40:03
[varargout{1:nargout}] = initialize('predict',X,varargin{:});
end

type update.m

function update(varargin) %#codegen
% Autogenerated by MATLAB, 29-Feb-2020 10:40:03
initialize('update',varargin{:});
end

type initialize.m

function [varargout] = initialize(command,varargin) %#codegen
% Autogenerated by MATLAB, 29-Feb-2020 10:40:03
coder.inline('always')
persistent model
if isempty(model)
    model = loadLearnerForCoder('RegressionSVMModel.mat');
end
switch(command)
    case 'update'
        % Update struct fields: Alpha
        %                       SupportVectors
        %                       Scale
        %                       Bias
        model = update(model,varargin{:});
    case 'predict'
        % Predict Inputs: X
        X = varargin{1};
        if nargin == 2
            [varargout{1:nargout}] = predict(model,X);
        else
            PVPairs = cell(1,nargin-2);
            for i = 1:nargin-2
                PVPairs{1,i} = varargin{i+1};
            end
            [varargout{1:nargout}] = predict(model,X,PVPairs{:});
        end
end
end

Specify Build Type

Train a machine learning model and generate code by using the coder configurer of the trained model. When generating code, specify the build type and other configuration options using a code generation configuration object.

Load the ionosphere data set and train a binary support vector machine (SVM) classification model.

load ionosphere
Mdl = fitcsvm(X,Y);

Mdl is a ClassificationSVM object.

Create a coder configurer for the ClassificationSVM model by using learnerCoderConfigurer. Specify the predictor data X. The learnerCoderConfigurer function uses the input X to configure the coder attributes of the predict function input.

configurer = learnerCoderConfigurer(Mdl,X);

configurer is a ClassificationSVMCoderConfigurer object, which is a coder configurer of a ClassificationSVM object.

Create a code generation configuration object by using coder.config. Specify 'dll' to generate a dynamic library and specify the GenerateReport property as true to enable the code generation report.

cfg = coder.config('dll');
cfg.GenerateReport = true;

To generate C/C++ code, you must have access to a C/C++ compiler that is configured properly. MATLAB Coder locates and uses a supported, installed compiler. You can use mex -setup to view and change the default compiler. For more details, see “Change Default Compiler” (MATLAB).

Use generateCode and the configuration object cfg to generate code. Also, specify the output folder path.

generateCode(configurer,cfg,'OutputPath','testPath')

Specified folder does not exist. Folder has been created.
generateCode creates these files in output folder:
'initialize.m', 'predict.m', 'update.m', 'ClassificationSVMModel.mat'
Code generation successful: To view the report, open('codegen\dll\ClassificationSVMModel\html\rep

generateCode creates the specified folder. The function also generates the MATLAB files required to generate code and stores them in the folder. Then generateCode generates C code in the testPath\codegen\dll\ClassificationSVMModel folder.

Update Parameters of ECOC Classification Model in Generated Code

Train an error-correcting output codes (ECOC) model using SVM binary learners and create a coder configurer for the model. Use the properties of the coder configurer to specify coder attributes of the ECOC model parameters. Use the object function of the coder configurer to generate C code that predicts labels for new predictor data. Then retrain the model using different settings, and update parameters in the generated code without regenerating the code.

Train Model

Load Fisher's iris data set.

load fisheriris
X = meas;
Y = species;

Create an SVM binary learner template to use a Gaussian kernel function and to standardize predictor data.

t = templateSVM('KernelFunction','gaussian','Standardize',true);

Train a multiclass ECOC model using the template t.

Mdl = fitcecoc(X,Y,'Learners',t);

Mdl is a ClassificationECOC object.

Create Coder Configurer

Create a coder configurer for the ClassificationECOC model by using learnerCoderConfigurer. Specify the predictor data X. The learnerCoderConfigurer function uses the input X to configure the coder attributes of the predict function input. Also, set the number of outputs to 2 so that the generated code returns the first two outputs of the predict function, which are the predicted labels and negated average binary losses.

configurer = learnerCoderConfigurer(Mdl,X,'NumOutputs',2)

configurer = 
  ClassificationECOCCoderConfigurer with properties:

   Update Inputs:
     BinaryLearners: [1x1 ClassificationSVMCoderConfigurer]
              Prior: [1x1 LearnerCoderInput]
               Cost: [1x1 LearnerCoderInput]

   Predict Inputs:
                  X: [1x1 LearnerCoderInput]

   Code Generation Parameters:
         NumOutputs: 2
     OutputFileName: 'ClassificationECOCModel'

  Properties, Methods

configurer is a ClassificationECOCCoderConfigurer object, which is a coder configurer of a ClassificationECOC object. The display shows the tunable input arguments of predict and update: X, BinaryLearners, Prior, and Cost.

Specify Coder Attributes of Parameters

Specify the coder attributes of predict arguments (predictor data and the name-value pair arguments 'Decoding' and 'BinaryLoss') and update arguments (support vectors of the SVM learners) so that you can use these arguments as the input arguments of predict and update in the generated code.

First, specify the coder attributes of X so that the generated code accepts any number of observations. Modify the SizeVector and VariableDimensions attributes. The SizeVector attribute specifies the upper bound of the predictor data size, and the VariableDimensions attribute specifies whether each dimension of the predictor data has a variable size or fixed size.

configurer.X.SizeVector = [Inf 4];
configurer.X.VariableDimensions = [true false];

The size of the first dimension is the number of observations. In this case, the code specifies that the upper bound of the size is Inf and the size is variable, meaning that X can have any number of observations. This specification is convenient if you do not know the number of observations when generating code.

The size of the second dimension is the number of predictor variables. This value must be fixed for a machine learning model. X contains 4 predictors, so the second value of the SizeVector attribute must be 4 and the second value of the VariableDimensions attribute must be false.

Next, modify the coder attributes of BinaryLoss and Decoding to use the 'BinaryLoss' and 'Decoding' name-value pair arguments in the generated code. Display the coder attributes of BinaryLoss.

configurer.BinaryLoss

ans = 
  EnumeratedInput with properties:

             Value: 'hinge'
    SelectedOption: 'Built-in'
    BuiltInOptions: {1x7 cell}
        IsConstant: 1
        Tunability: 0

To use a nondefault value in the generated code, you must specify the value before generating the code. Specify the Value attribute of BinaryLoss as 'exponential'.

configurer.BinaryLoss.Value = 'exponential';
configurer.BinaryLoss

ans = 
  EnumeratedInput with properties:

             Value: 'exponential'
    SelectedOption: 'Built-in'
    BuiltInOptions: {1x7 cell}
        IsConstant: 1
        Tunability: 1

If you modify attribute values when Tunability is false (logical 0), the software sets the Tunability to true (logical 1).

Display the coder attributes of Decoding.

configurer.Decoding

ans = 
  EnumeratedInput with properties:

             Value: 'lossweighted'
    SelectedOption: 'Built-in'
    BuiltInOptions: {'lossweighted'  'lossbased'}
        IsConstant: 1
        Tunability: 0

Specify the IsConstant attribute of Decoding as false so that you can use all available values in BuiltInOptions in the generated code.

configurer.Decoding.IsConstant = false;
configurer.Decoding

ans = 
  EnumeratedInput with properties:

             Value: [1x1 LearnerCoderInput]
    SelectedOption: 'NonConstant'
    BuiltInOptions: {'lossweighted'  'lossbased'}
        IsConstant: 0
        Tunability: 1

The software changes the Value attribute of Decoding to a LearnerCoderInput object so that you can use both 'lossweighted' and 'lossbased' as the value of 'Decoding'. Also, the software sets the SelectedOption to 'NonConstant' and the Tunability to true.

Finally, modify the coder attributes of SupportVectors in BinaryLearners. Display the coder attributes of SupportVectors.

configurer.BinaryLearners.SupportVectors

ans = 
  LearnerCoderInput with properties:

            SizeVector: [54 4]
    VariableDimensions: [1 0]
              DataType: 'double'
            Tunability: 1

The default value of VariableDimensions is [true false] because each learner has a different number of support vectors. If you retrain the ECOC model using new data or different settings, the number of support vectors in the SVM learners can vary. Therefore, increase the upper bound of the number of support vectors.


configurer.BinaryLearners.SupportVectors.SizeVector = [150 4];

SizeVector attribute for Alpha has been modified to satisfy configuration constraints.
SizeVector attribute for SupportVectorLabels has been modified to satisfy configuration constraints.

If you modify the coder attributes of SupportVectors, then the software modifies the coder attributes of Alpha and SupportVectorLabels to satisfy configuration constraints. If the modification of the coder attributes of one parameter requires subsequent changes to other dependent parameters to satisfy configuration constraints, then the software changes the coder attributes of the dependent parameters.

Display the coder configurer.

configurer

configurer = 
  ClassificationECOCCoderConfigurer with properties:

   Update Inputs:
     BinaryLearners: [1x1 ClassificationSVMCoderConfigurer]
              Prior: [1x1 LearnerCoderInput]
               Cost: [1x1 LearnerCoderInput]

   Predict Inputs:
                  X: [1x1 LearnerCoderInput]
         BinaryLoss: [1x1 EnumeratedInput]
           Decoding: [1x1 EnumeratedInput]

   Code Generation Parameters:
         NumOutputs: 2
     OutputFileName: 'ClassificationECOCModel'

  Properties, Methods

The display now includes BinaryLoss and Decoding as well. Generate Code To generate C/C++ code, you must have access to a C/C++ compiler that is configured properly. MATLAB Coder locates and uses a supported, installed compiler. You can use mex -setup to view and change the default compiler. For more details, see “Change Default Compiler” (MATLAB). Generate code for the predict and update functions of the ECOC classification model (Mdl). generateCode(configurer) generateCode creates these files in output folder: 'initialize.m', 'predict.m', 'update.m', 'ClassificationECOCModel.mat'

The generateCode function completes these actions: • Generate the MATLAB files required to generate code, including the two entry-point functions predict.m and update.m for the predict and update functions of Mdl, respectively. • Create a MEX function named ClassificationECOCModel for the two entry-point functions. 32-2376


• Create the code for the MEX function in the codegen\mex\ClassificationECOCModel folder.
• Copy the MEX function to the current folder.

Verify Generated Code

Pass some predictor data to verify whether the predict function of Mdl and the predict function in the MEX function return the same labels. To call an entry-point function in a MEX function that has more than one entry point, specify the function name as the first input argument. Because you specified 'Decoding' as a tunable input argument by changing the IsConstant attribute before generating the code, you also need to specify it in the call to the MEX function, even though 'lossweighted' is the default value of 'Decoding'.

[label,NegLoss] = predict(Mdl,X,'BinaryLoss','exponential');
[label_mex,NegLoss_mex] = ClassificationECOCModel('predict',X,'BinaryLoss','exponential','Decoding','lossweighted');

Compare label to label_mex by using isequal.

isequal(label,label_mex)

ans = logical
   1

isequal returns logical 1 (true) if all the inputs are equal. The comparison confirms that the predict function of Mdl and the predict function in the MEX function return the same labels. NegLoss_mex might include round-off differences compared to NegLoss. In this case, compare NegLoss_mex to NegLoss, allowing a small tolerance.

find(abs(NegLoss-NegLoss_mex) > 1e-8)

ans =
  0x1 empty double column vector

The comparison confirms that NegLoss and NegLoss_mex are equal within the tolerance 1e-8.

Retrain Model and Update Parameters in Generated Code

Retrain the model using a different setting. Specify 'KernelScale' as 'auto' so that the software selects an appropriate scale factor using a heuristic procedure.

t_new = templateSVM('KernelFunction','gaussian','Standardize',true,'KernelScale','auto');
retrainedMdl = fitcecoc(X,Y,'Learners',t_new);

Extract parameters to update by using validatedUpdateInputs. This function detects the modified model parameters in retrainedMdl and validates whether the modified parameter values satisfy the coder attributes of the parameters.

params = validatedUpdateInputs(configurer,retrainedMdl);

Update parameters in the generated code.

ClassificationECOCModel('update',params)


Verify Generated Code

Compare the outputs from the predict function of retrainedMdl to the outputs from the predict function in the updated MEX function.

[label,NegLoss] = predict(retrainedMdl,X,'BinaryLoss','exponential','Decoding','lossbased');
[label_mex,NegLoss_mex] = ClassificationECOCModel('predict',X,'BinaryLoss','exponential','Decoding','lossbased');

isequal(label,label_mex)

ans = logical
   1

find(abs(NegLoss-NegLoss_mex) > 1e-8)

ans =
  0x1 empty double column vector

The comparison confirms that label and label_mex are equal, and NegLoss and NegLoss_mex are equal within the tolerance.

Input Arguments

configurer — Coder configurer
coder configurer object

Coder configurer of a machine learning model, specified as a coder configurer object created by using learnerCoderConfigurer.

Model                                                 Coder Configurer Object
Binary decision tree for multiclass classification    ClassificationTreeCoderConfigurer
SVM for one-class and binary classification           ClassificationSVMCoderConfigurer
Linear model for binary classification                ClassificationLinearCoderConfigurer
Multiclass model for SVMs and linear models           ClassificationECOCCoderConfigurer
Binary decision tree for regression                   RegressionTreeCoderConfigurer
Support vector machine (SVM) regression               RegressionSVMCoderConfigurer
Linear regression                                     RegressionLinearCoderConfigurer

cfg — Build type
'mex' (default) | 'dll' | 'lib' | code generation configuration object

Build type, specified as 'mex', 'dll', 'lib', or a code generation configuration object created by coder.config. generateCode generates C/C++ code using one of the following build types.

• 'mex' — Generates a MEX function that has a platform-dependent extension. A MEX function is a C/C++ program that is executable from the Command Window. Before generating a C/C++ library for deployment, generate a MEX function to verify that the generated code provides the correct functionality.


• 'dll' — Generate a dynamic C/C++ library.
• 'lib' — Generate a static C/C++ library.
• Code generation configuration object created by coder.config — Generate C/C++ code using the code generation configuration object to customize code generation options. You can specify the build type and other configuration options using the object. For example, modify the GenerateReport parameter to enable the code generation report, and modify the TargetLang parameter to generate C++ code. The default value of the TargetLang parameter is 'C', which generates C code.

cfg = coder.config('mex');
cfg.GenerateReport = true;
cfg.TargetLang = 'C++';

For details, see the -config option of codegen, coder.config, and “Configure Build Settings” (MATLAB Coder).

generateCode generates C/C++ code in the folder outputPath\codegen\type\outputFileName, where type is the build type specified by the cfg argument and outputFileName is the file name stored in the OutputFileName property of configurer.

outputPath — Folder path for output files
current folder (default) | character vector | string scalar

Folder path for the output files of generateCode, specified as a character vector or string scalar. The specified folder path can be an absolute path or a relative path to the current folder.

• The path must not contain spaces because they can lead to code generation failures in certain operating system configurations.
• The path also cannot contain non-7-bit ASCII characters, such as Japanese characters.

If the specified folder does not exist, then generateCode creates the folder. generateCode searches the specified folder for the four MATLAB files: predict.m, update.m, initialize.m, and a MAT-file that includes machine learning model information. If the four files do not exist in the folder, then generateCode generates the files. Otherwise, generateCode does not generate any MATLAB files. generateCode generates C/C++ code in the folder outputPath\codegen\type\outputFileName, where type is the build type specified by the cfg argument and outputFileName is the file name stored in the OutputFileName property of configurer.

Example: 'C:\myfiles'
Data Types: char | string
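As an illustrative sketch of combining these two inputs (assuming cfg and outputPath are passed as the 'CFG' and 'OutputPath' name-value pairs; check the Syntax section of this page for the exact calling form):

```matlab
% Hypothetical sketch: build a C++ static library in a custom output folder.
cfg = coder.config('lib');        % static library build type
cfg.TargetLang = 'C++';           % generate C++ instead of the default C
generateCode(configurer,'CFG',cfg,'OutputPath','mycodegenout')
```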

Limitations

• The generateCode function uses the saveLearnerForCoder, loadLearnerForCoder, and codegen functions, so the code generation limitations of these functions also apply to the generateCode function. For details, see the function reference pages saveLearnerForCoder, loadLearnerForCoder, and codegen.
• For the code generation usage notes and limitations of a machine learning model and its object functions, see the Code Generation sections of the corresponding reference pages.


Model                                                 Model Object                 predict Function
Binary decision tree for multiclass classification    CompactClassificationTree    predict
SVM for one-class and binary classification           CompactClassificationSVM     predict
Linear model for binary classification                ClassificationLinear         predict
Multiclass model for SVMs and linear models           CompactClassificationECOC    predict
Binary decision tree for regression                   CompactRegressionTree        predict
SVM regression                                        CompactRegressionSVM         predict
Linear regression                                     RegressionLinear             predict

Alternative Functionality

• If you want to modify the MATLAB files (predict.m, update.m, and initialize.m) according to your code generation workflow, then use generateFiles to generate these files and use codegen to generate code.

See Also
generateFiles | learnerCoderConfigurer | update | validatedUpdateInputs

Topics
“Introduction to Code Generation” on page 31-2
“Code Generation for Prediction and Update Using Coder Configurer” on page 31-68

Introduced in R2018b


generateFiles

Package: classreg.learning.coder.config.svm

Generate MATLAB files for code generation using coder configurer

Syntax

generateFiles(configurer)
generateFiles(configurer,'OutputPath',outputPath)

Description

generateFiles(configurer) generates the MATLAB files required to generate C/C++ code by using the coder configurer configurer, and saves the generated files in the current folder. To customize the code generation workflow, use generateFiles and codegen. If you do not need to customize your workflow, use generateCode.

generateFiles generates the following MATLAB files:

• predict.m, update.m, and initialize.m — predict.m and update.m are the entry-point functions for the predict and update functions of the machine learning model, respectively, and these two functions call initialize.m. You can modify these files according to your code generation workflow. For example, you can modify the predict.m file to include data preprocessing, or you can add these entry-point functions to another code generation project.
• A MAT-file that includes machine learning model information — generateFiles uses the saveLearnerForCoder function to save machine learning model information in a MAT-file whose file name is stored in the OutputFileName property of a coder configurer. initialize.m loads the saved MAT-file by using the loadLearnerForCoder function.

After you generate these files, generate C/C++ code by using codegen and the prepared codegen argument stored in the CodeGenerationArguments property of a coder configurer. If the folder already includes all four MATLAB files, then generateFiles does not generate any files.

generateFiles(configurer,'OutputPath',outputPath) generates the MATLAB files in the folder specified by outputPath.

Examples

Generate MATLAB® Files for Code Generation

Train a machine learning model and then generate the MATLAB files required to generate C/C++ code for the predict and update functions of the model by using a coder configurer.

Load the ionosphere data set and train a binary support vector machine (SVM) classification model.


load ionosphere
Mdl = fitcsvm(X,Y);

Mdl is a ClassificationSVM object. Create a coder configurer for the ClassificationSVM object.

configurer = learnerCoderConfigurer(Mdl,X);

configurer is a ClassificationSVMCoderConfigurer object, which is a coder configurer of a ClassificationSVM object. Use generateFiles to generate the MATLAB files required to generate C/C++ code for the predict and update functions of the model.

generateFiles(configurer)

generateFiles generates predict.m, update.m, initialize.m, and ClassificationSVMModel.mat (a MAT-file that includes machine learning model information). Display the contents of the predict.m, update.m, and initialize.m files.

type predict.m % Display contents of predict.m

function varargout = predict(X,varargin) %#codegen
% Autogenerated by MATLAB, 29-Feb-2020 10:53:50
[varargout{1:nargout}] = initialize('predict',X,varargin{:});
end

type update.m % Display contents of update.m

function update(varargin) %#codegen
% Autogenerated by MATLAB, 29-Feb-2020 10:53:50
initialize('update',varargin{:});
end

type initialize.m % Display contents of initialize.m

function [varargout] = initialize(command,varargin) %#codegen
% Autogenerated by MATLAB, 29-Feb-2020 10:53:50
coder.inline('always')
persistent model
if isempty(model)
    model = loadLearnerForCoder('ClassificationSVMModel.mat');
end
switch(command)
    case 'update'
        % Update struct fields: Alpha
        %                       SupportVectors
        %                       SupportVectorLabels
        %                       Scale
        %                       Bias
        %                       Prior
        %                       Cost
        model = update(model,varargin{:});
    case 'predict'
        % Predict Inputs: X
        X = varargin{1};
        if nargin == 2
            [varargout{1:nargout}] = predict(model,X);
        else
            PVPairs = cell(1,nargin-2);
            for i = 1:nargin-2
                PVPairs{1,i} = varargin{i+1};
            end
            [varargout{1:nargout}] = predict(model,X,PVPairs{:});
        end
end
end

Generate C/C++ code by using codegen and the prepared codegen argument stored in the CodeGenerationArguments property of configurer.

cfArgs = configurer.CodeGenerationArguments;
codegen(cfArgs{:})

Input Arguments

configurer — Coder configurer
coder configurer object

Coder configurer of a machine learning model, specified as a coder configurer object created by using learnerCoderConfigurer.

Model                                                 Coder Configurer Object
Binary decision tree for multiclass classification    ClassificationTreeCoderConfigurer
SVM for one-class and binary classification           ClassificationSVMCoderConfigurer
Linear model for binary classification                ClassificationLinearCoderConfigurer
Multiclass model for SVMs and linear models           ClassificationECOCCoderConfigurer
Binary decision tree for regression                   RegressionTreeCoderConfigurer
Support vector machine (SVM) regression               RegressionSVMCoderConfigurer
Linear regression                                     RegressionLinearCoderConfigurer

outputPath — Folder path for output files
current folder (default) | character vector | string scalar

Folder path for the output files of generateFiles, specified as a character vector or string scalar. The specified folder path can be an absolute path or a relative path to the current folder.

• The path must not contain spaces because they can lead to code generation failures in certain operating system configurations.
• The path also cannot contain non-7-bit ASCII characters, such as Japanese characters.

If the specified folder does not exist, then generateFiles creates the folder. generateFiles searches the specified folder for the four MATLAB files: predict.m, update.m, initialize.m, and a MAT-file that includes machine learning model information. If the four files do not exist in the folder, then generateFiles generates the files. Otherwise, generateFiles does not generate any MATLAB files.


Example: 'C:\myfiles'
Data Types: char | string

Alternative Functionality

• To customize the code generation workflow, use generateFiles and codegen. If you do not need to customize your workflow, use generateCode. In addition to generating the four MATLAB files generated by generateFiles, the generateCode function also generates the C/C++ code.

See Also
generateCode | learnerCoderConfigurer | update | validatedUpdateInputs

Topics
“Introduction to Code Generation” on page 31-2
“Code Generation for Prediction and Update Using Coder Configurer” on page 31-68

Introduced in R2018b


generateLearnerDataTypeFcn

Generate function that defines data types for fixed-point code generation

Syntax

generateLearnerDataTypeFcn(filename,X)
generateLearnerDataTypeFcn(filename,X,Name,Value)

Description

To generate fixed-point C/C++ code for the predict function of a machine learning model, use generateLearnerDataTypeFcn, saveLearnerForCoder, loadLearnerForCoder, and codegen.

• After training a machine learning model, save the model using saveLearnerForCoder.
• Create a structure that defines fixed-point data types by using the function generated from generateLearnerDataTypeFcn.
• Define an entry-point function that loads the model by using both loadLearnerForCoder and the structure, and then calls the predict function.
• Generate code using codegen, and then verify the generated code.

The generateLearnerDataTypeFcn function requires Fixed-Point Designer, and generating fixed-point C/C++ code requires MATLAB Coder and Fixed-Point Designer.

This flow chart shows the fixed-point code generation workflow for the predict function of a machine learning model. Use generateLearnerDataTypeFcn for the highlighted step.

generateLearnerDataTypeFcn(filename,X) generates a data type function on page 32-2391 that defines fixed-point data types for the variables required to generate fixed-point C/C++ code for prediction of a machine learning model. filename stores the machine learning model, and X contains the predictor data for the predict function of the model. Use the generated function to create a structure that defines fixed-point data types. Then, use the structure as the input argument T of loadLearnerForCoder.

generateLearnerDataTypeFcn(filename,X,Name,Value) specifies additional options by using one or more name-value pair arguments. For example, you can specify 'WordLength',32 to use a 32-bit word length for the fixed-point data types.

Examples

Generate Fixed-Point C/C++ Code for Prediction

After training a machine learning model, save the model using saveLearnerForCoder. For fixed-point code generation, specify the fixed-point data types of the variables required for prediction by


using the data type function generated by generateLearnerDataTypeFcn. Then, define an entry-point function that loads the model by using both loadLearnerForCoder and the specified fixed-point data types, and calls the predict function of the model. Use codegen to generate fixed-point C/C++ code for the entry-point function, and then verify the generated code.

Before generating code using codegen, you can use buildInstrumentedMex and showInstrumentationResults to optimize the fixed-point data types to improve the performance of the fixed-point code. Record minimum and maximum values of named and internal variables for prediction by using buildInstrumentedMex. View the instrumentation results using showInstrumentationResults; then, based on the results, tune the fixed-point data type properties of the variables. For details regarding this optional step, see “Fixed-Point Code Generation for Prediction of SVM” on page 31-75.

Train Model

Load the ionosphere data set and train a binary SVM classification model.

load ionosphere
Mdl = fitcsvm(X,Y,'KernelFunction','gaussian');

Mdl is a ClassificationSVM model.

Save Model

Save the SVM classification model to the file myMdl.mat by using saveLearnerForCoder.

saveLearnerForCoder(Mdl,'myMdl');

Define Fixed-Point Data Types

Use generateLearnerDataTypeFcn to generate a function that defines the fixed-point data types of the variables required for prediction of the SVM model.

generateLearnerDataTypeFcn('myMdl',X)

generateLearnerDataTypeFcn generates the myMdl_datatype function. Create a structure T that defines the fixed-point data types by using myMdl_datatype.

T = myMdl_datatype('Fixed')

T = struct with fields:
               XDataType: [0x0 embedded.fi]
           ScoreDataType: [0x0 embedded.fi]
    InnerProductDataType: [0x0 embedded.fi]

The structure T includes the fields for the named and internal variables required to run the predict function. Each field contains a fixed-point object, returned by fi. The fixed-point object specifies fixed-point data type properties, such as word length and fraction length. For example, display the fixed-point data type properties of the predictor data.

T.XDataType

ans = []

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 16
        FractionLength: 14
        RoundingMethod: Floor
        OverflowAction: Wrap
           ProductMode: FullPrecision
  MaxProductWordLength: 128
               SumMode: FullPrecision
      MaxSumWordLength: 128

Define Entry-Point Function

Define an entry-point function named myFixedPointPredict that does the following:

• Accept the predictor data X and the fixed-point data type structure T.
• Load a fixed-point version of a trained SVM classification model by using both loadLearnerForCoder and the structure T.
• Predict labels and scores using the loaded model.

type myFixedPointPredict.m % Display contents of myFixedPointPredict.m file

function [label,score] = myFixedPointPredict(X,T) %#codegen
Mdl = loadLearnerForCoder('myMdl','DataType',T);
[label,score] = predict(Mdl,X);
end

Note: If you click the button located in the upper-right section of this example and open the example in MATLAB®, then MATLAB opens the example folder. This folder includes the entry-point function file.

Generate Code

The XDataType field of the structure T specifies the fixed-point data type of the predictor data. Convert X to the type specified in T.XDataType by using the cast function.

X_fx = cast(X,'like',T.XDataType);

Generate code for the entry-point function using codegen. Specify X_fx and constant-folded T as input arguments of the entry-point function.

codegen myFixedPointPredict -args {X_fx,coder.Constant(T)}

codegen generates the MEX function myFixedPointPredict_mex with a platform-dependent extension.

Verify Generated Code

Pass predictor data to predict and myFixedPointPredict_mex to compare the outputs.

[labels,scores] = predict(Mdl,X);
[labels_fx,scores_fx] = myFixedPointPredict_mex(X_fx,T);

Compare the outputs from predict and myFixedPointPredict_mex.

verify_labels = isequal(labels,labels_fx)

verify_labels = logical
   1

isequal returns logical 1 (true), which means labels and labels_fx are equal. If the labels are not equal, you can compute the percentage of incorrectly classified labels as follows.

sum(strcmp(labels_fx,labels)==0)/numel(labels_fx)*100

ans = 0

Find the maximum of the relative differences between the score outputs.

relDiff_scores = max(abs((scores_fx.double(:,1)-scores(:,1))./scores(:,1)))

relDiff_scores = 0.0055

If you are not satisfied with the comparison results and want to improve the precision of the generated code, you can tune the fixed-point data types and regenerate the code. For details, see “Tips” on page 32-2392 in generateLearnerDataTypeFcn, “Data Type Function” on page 32-2391, and “Fixed-Point Code Generation for Prediction of SVM” on page 31-75.

Input Arguments

filename — Name of MAT-file that contains structure array representing model object
character vector | string scalar

Name of the MATLAB formatted binary file (MAT-file) that contains the structure array representing a model object, specified as a character vector or string scalar. You must create the filename file using saveLearnerForCoder, and the model in filename can be one of the following:

• Classification model
  • Decision tree (CompactClassificationTree)
  • Ensemble of decision trees (CompactClassificationEnsemble, ClassificationBaggedEnsemble)
  • SVM (support vector machine) (CompactClassificationSVM)
• Regression model
  • Decision tree (CompactRegressionTree)
  • Ensemble of decision trees (CompactRegressionEnsemble, RegressionBaggedEnsemble)
  • SVM (CompactRegressionSVM)

The extension of the filename file must be .mat. If filename has no extension, then generateLearnerDataTypeFcn appends .mat. If filename does not include a full path, then generateLearnerDataTypeFcn loads the file from the current folder.

Example: 'myMdl'
Data Types: char | string


X — Predictor data
numeric matrix

Predictor data for the predict function of the model stored in filename, specified as an n-by-p numeric matrix, where n is the number of observations and p is the number of predictor variables.

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: generateLearnerDataTypeFcn(filename,X,'OutputFunctionName','myDataTypeFcn','WordLength',32) generates a data type function named myDataTypeFcn that uses 32 bits for the word length when defining the fixed-point data type for each variable.

OutputFunctionName — Name of generated function
filename plus _datatype (default) | character vector | string scalar

Name of the generated function, specified as the comma-separated pair consisting of 'OutputFunctionName' and a character vector or string scalar. The 'OutputFunctionName' value must be a valid MATLAB function name. The default function name is the file name in filename followed by _datatype. For example, if filename is myMdl, then the default function name is myMdl_datatype.

Example: 'OutputFunctionName','myDataTypeFcn'
Data Types: char | string

WordLength — Word length in bits
16 (default) | numeric scalar

Word length in bits, specified as the comma-separated pair consisting of 'WordLength' and a numeric scalar. The generated data type function on page 32-2391 defines a fixed-point object for each variable using the specified 'WordLength' value. If a variable requires a longer word length than the specified value, the software doubles the word length for the variable. The optimal word length depends on your target hardware properties. When the specified word length is longer than the longest word size of your target hardware, the generated code contains multiword operations. For details, see “Fixed-Point Data Types” (Fixed-Point Designer).

Example: 'WordLength',32
Data Types: single | double

OutputRange — Range of predict output
range simulated using X (default) | numeric vector of two elements


Range of the output argument of the predict function, specified as the comma-separated pair consisting of 'OutputRange' and a numeric vector of two elements (minimum and maximum values of the output). The 'OutputRange' value specifies the range of predicted class scores for a classification model and the range of predicted responses for a regression model. The following tables list the output arguments for which you can specify the range by using the 'OutputRange' name-value pair argument.

Classification Model

Model                         predict Function of Model    Output Argument
Decision tree                 predict                      score
Ensemble of decision trees    predict                      score
SVM                           predict                      score

Regression Model

Model                         predict Function of Model    Output Argument
Decision tree                 predict                      Yfit
Ensemble of decision trees    predict                      Yfit
SVM                           predict                      yfit

When X contains a large number of observations and the range for the output argument is known, specify the 'OutputRange' value to reduce the amount of computation. If you do not specify the 'OutputRange' value, then the software simulates the output range using the predictor data X and the predict function. The software determines the span of numbers that the fixed-point data can represent by using the 'OutputRange' value and the 'PercentSafetyMargin' value.

Example: 'OutputRange',[0,1]
Data Types: single | double

PercentSafetyMargin — Safety margin percentage
10 (default) | numeric scalar

Safety margin percentage, specified as the comma-separated pair consisting of 'PercentSafetyMargin' and a numeric scalar. For each variable, the software simulates the range of the variable and adds the specified safety margin to determine the span of numbers that the fixed-point data can represent. Then, the software proposes the maximum fraction length that does not cause overflows.

Use caution when you specify the 'PercentSafetyMargin' value. If a variable range is large, then increasing the safety margin can cause underflow, because the software decreases fraction length to represent a larger range using a given word length.

Example: 'PercentSafetyMargin',15
Data Types: single | double
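The name-value pairs described above can be combined in a single call. A sketch, reusing the myMdl.mat file from the earlier example (the output range shown is illustrative, not derived from real data):

```matlab
% Generate a custom-named data type function with wider words, a known
% output range, and a larger safety margin (all values illustrative).
generateLearnerDataTypeFcn('myMdl',X, ...
    'OutputFunctionName','myDataTypeFcn', ... % creates myDataTypeFcn.m
    'WordLength',32, ...                      % 32-bit words instead of the default 16
    'OutputRange',[-1,1], ...                 % skip simulating the output range
    'PercentSafetyMargin',15)                 % default is 10
```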


More About

Data Type Function

Use the data type function generated by generateLearnerDataTypeFcn to create a structure that defines fixed-point data types for the variables required to generate fixed-point C/C++ code for prediction of a machine learning model. Use the output structure of the data type function as the input argument T of loadLearnerForCoder.

If filename is 'myMdl', then generateLearnerDataTypeFcn generates a data type function named myMdl_datatype. The myMdl_datatype function supports this syntax:

T = myMdl_datatype(dt)

T = myMdl_datatype(dt) returns a data type structure that defines data types for the variables required to generate fixed-point C/C++ code for prediction of a machine learning model. Each field of T contains a fixed-point object returned by fi. The input argument dt specifies the DataType property of the fixed-point object.

• Specify dt as 'Fixed' (default) for fixed-point code generation.
• Specify dt as 'Double' to simulate floating-point behavior of the fixed-point code.

Use the output structure T as the second input argument of loadLearnerForCoder. The structure T contains the fields in the following list. These fields define the data types for the variables that directly influence the precision of the model. These variables, along with other named and internal variables, are required to run the predict function of the model.

Common fields for classification:

• XDataType (input)
• ScoreDataType (output or internal variable) and TransformedScoreDataType (output)
  • If you train a model using the default 'ScoreTransform' value of 'none' or 'identity' (that is, you do not transform predicted scores), then the ScoreDataType field influences the precision of the output scores.
  • If you train a model using a value of 'ScoreTransform' other than 'none' or 'identity' (that is, you do transform predicted scores), then the ScoreDataType field influences the precision of the internal untransformed scores. The TransformedScoreDataType field influences the precision of the transformed output scores.

Common fields for regression:

• XDataType (input)
• YFitDataType (output)

Additional fields for an ensemble of decision trees:

• WeakLearnerOutputDataType (internal variable) — Data type for outputs from weak learners.
• AggregatedLearnerWeightsDataType (internal variable) — Data type for a weighted aggregate of the outputs from weak learners, applicable only if you train a model using bagging ('Method','bag'). The software computes predicted scores (ScoreDataType) by dividing the aggregate by the sum of learner weights.

Additional fields for SVM:

• XnormDataType (internal variable), applicable only if you train a model using 'Standardize' or 'KernelScale'
• InnerProductDataType (internal variable)

The software proposes the maximum fraction length that does not cause overflows, based on the default word length (16) and safety margin (10%) for each variable. The following code shows the data type function myMdl_datatype, generated by generateLearnerDataTypeFcn when filename is 'myMdl' and the model in the filename file is an SVM classifier.

function T = myMdl_datatype(dt)

if nargin < 1
    dt = 'Fixed';
end

% Set fixed-point math settings
fm = fimath('RoundingMethod','Floor', ...
    'OverflowAction','Wrap', ...
    'ProductMode','FullPrecision', ...
    'MaxProductWordLength',128, ...
    'SumMode','FullPrecision', ...
    'MaxSumWordLength',128);

% Data type for predictor data
T.XDataType = fi([],true,16,14,fm,'DataType',dt);

% Data type for output score
T.ScoreDataType = fi([],true,16,14,fm,'DataType',dt);

% Internal variables
% Data type of the squared distance dist = (x-sv)^2 for the Gaussian kernel G(x,sv) = exp(-dist),
% where x is the predictor data for an observation and sv is a support vector
T.InnerProductDataType = fi([],true,16,6,fm,'DataType',dt);

end
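As documented above, the generated function also accepts 'Double' to simulate the floating-point behavior of the fixed-point settings; a brief usage sketch:

```matlab
% 'Fixed' (the default) returns embedded.fi objects for code generation;
% 'Double' returns the same structure with double types for simulation.
T_fixed  = myMdl_datatype('Fixed');
T_double = myMdl_datatype('Double');

% Cast the predictor data consistently with each variant, as in the
% earlier example, before exercising the entry-point function.
X_fixed  = cast(X,'like',T_fixed.XDataType);
X_double = cast(X,'like',T_double.XDataType);
```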

Tips

• To improve the precision of the generated fixed-point code, you can tune the fixed-point data types. Modify the fixed-point data types by updating the data type function on page 32-2391 (myMdl_datatype) and creating a new structure, and then regenerate the code using the new structure. You can update the myMdl_datatype function in one of two ways:
  • Regenerate the myMdl_datatype function by using generateLearnerDataTypeFcn and its name-value pair arguments.
    • Increase the word length by using the 'WordLength' name-value pair argument.
    • Decrease the safety margin by using the 'PercentSafetyMargin' name-value pair argument.
    If you increase the word length or decrease the safety margin, the software can propose a longer fraction length, and therefore, improve the precision of the generated code based on the given data set.
  • Manually modify the fixed-point data types in the function file (myMdl_datatype.m). For each variable, you can tune the word length and fraction length and specify fixed-point math settings using a fimath object.
• In the generated fixed-point code, a large number of operations or a large variable range can result in loss of precision, compared to the precision of the corresponding floating-point code. When training an SVM model, keep the following tips in mind to avoid loss of precision in the generated fixed-point code:
  • Data standardization ('Standardize') — To avoid overflows in the model property values of support vectors in an SVM model, you can standardize the predictor data. Instead of using the 'Standardize' name-value pair argument when training the model, standardize the predictor data before passing the data to the fitting function and the predict function so that the fixed-point code does not include the operations for the standardization.
  • Kernel function ('KernelFunction') — Using the Gaussian kernel or linear kernel is preferable to using a polynomial kernel. A polynomial kernel requires higher computational complexity than the other kernels, and the output of a polynomial kernel function is unbounded.
  • Kernel scale ('KernelScale') — Using a kernel scale requires additional operations if the value of 'KernelScale' is not 1.
  • The prediction of a one-class classification problem might have loss of precision if the predicted class score values have a large range.
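The data standardization tip above can be sketched as follows (variable names are illustrative): standardize the predictors once, outside the model, so that the generated fixed-point code contains no standardization operations.

```matlab
% Standardize manually instead of passing 'Standardize',true to fitcsvm.
mu = mean(X,1);               % per-predictor means
sigma = std(X,0,1);           % per-predictor standard deviations
sigma(sigma == 0) = 1;        % guard constant predictors against division by zero
Xstd = (X - mu)./sigma;       % implicit expansion (R2016b or later)

Mdl = fitcsvm(Xstd,Y,'KernelFunction','gaussian');  % no 'Standardize' here

% Apply the same mu and sigma to any new data before calling predict.
```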

Compatibility Considerations

Specify precision of transformed scores before and after their transformation
Behavior changed in R2020a

In R2019b, you could train an SVM classifier for fixed-point code generation with some nondefault score transforms ('ismax', 'sign', 'symmetric', or 'symmetricismax'). The generateLearnerDataTypeFcn function generated a data type function whose output structure contained only one field related to the scores, ScoreDataType. This field influenced the precision of the output scores both before and after their transformation.

Starting in R2020a, when you train an SVM classifier with a score transform other than 'none' or 'identity', the generateLearnerDataTypeFcn function generates a data type function whose output structure T contains two fields related to the scores: ScoreDataType and TransformedScoreDataType. Use these fields to influence the precision of the output scores before and after their transformation, respectively. For more details, see “Data Type Function” on page 32-2391.

To update your code, rerun the generateLearnerDataTypeFcn function and then regenerate the output structure T.

32

Functions

See Also buildInstrumentedMex | codegen | fi | loadLearnerForCoder | saveLearnerForCoder | showInstrumentationResults Topics “Fixed-Point Code Generation for Prediction of SVM” on page 31-75 Introduced in R2019b


geocdf Geometric cumulative distribution function

Syntax y = geocdf(x,p) y = geocdf(x,p,'upper')

Description y = geocdf(x,p) returns the cumulative distribution function (cdf) of the geometric distribution at each value in x using the corresponding probabilities in p. x and p can be vectors, matrices, or multidimensional arrays that all have the same size. A scalar input is expanded to a constant array with the same dimensions as the other input. The parameters in p must lie on the interval [0,1]. y = geocdf(x,p,'upper') returns the complement of the geometric distribution cdf at each value in x, using an algorithm that more accurately computes the extreme upper tail probabilities.
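The 'upper' option exists for numerical accuracy: computing 1 − y after the fact loses precision once y is within rounding distance of 1, while the closed-form complement (1 − p)^(x+1) does not. A quick sketch of the arithmetic, shown in Python only to illustrate the formulas (not MATLAB's implementation):

```python
def geo_cdf(x, p):
    """Geometric cdf: P(X <= x) = 1 - (1 - p)**(x + 1)."""
    return 1 - (1 - p) ** (x + 1)

def geo_cdf_upper(x, p):
    """Upper tail P(X > x) computed directly, avoiding cancellation."""
    return (1 - p) ** (x + 1)

# For moderate x the two routes agree:
print(geo_cdf(3, 0.5))             # 0.9375
print(1 - geo_cdf(3, 0.5))         # 0.0625
print(geo_cdf_upper(3, 0.5))       # 0.0625

# Far in the tail, 1 - cdf rounds to exactly 0, while the direct
# complement keeps full precision:
print(1 - geo_cdf(10000, 0.01))    # 0.0
print(geo_cdf_upper(10000, 0.01))  # ~2.2e-44
```

This is the same reason geocdf(x,p,'upper') is preferable to 1 - geocdf(x,p) for extreme upper-tail probabilities.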

Examples Compute Geometric Distribution cdf Suppose you toss a fair coin repeatedly, and a "success" occurs when the coin lands with heads facing up. What is the probability of observing three or fewer tails ("failures") before tossing a heads? To solve, determine the value of the cumulative distribution function (cdf) for the geometric distribution at x equal to 3. The probability of success (tossing a heads) p in any given trial is 0.5. x = 3; p = 0.5; y = geocdf(x,p) y = 0.9375

The returned value of y indicates that the probability of observing three or fewer tails before tossing a heads is 0.9375.

More About

Geometric Distribution cdf

The cumulative distribution function (cdf) of the geometric distribution is

y = F(x|p) = 1 − (1 − p)^(x+1);    x = 0, 1, 2, ... ,

where p is the probability of success, and x is the number of failures before the first success. The result y is the probability of observing up to x trials before a success, when the probability of success in any given trial is p.
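The closed form follows from summing the pmf p(1 − p)^k over k = 0, 1, …, x (a finite geometric series). A quick numerical cross-check of the coin example, in Python rather than MATLAB since only the formula is being exercised:

```python
p, x = 0.5, 3

# Sum the pmf p*(1 - p)**k term by term for k = 0..x ...
cdf_by_sum = sum(p * (1 - p) ** k for k in range(x + 1))
# ... and compare with the closed form 1 - (1 - p)**(x + 1).
cdf_closed = 1 - (1 - p) ** (x + 1)

print(cdf_by_sum, cdf_closed)  # 0.9375 0.9375, matching the example above
```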


Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also cdf | geoinv | geopdf | geornd | geostat | mle Topics “Geometric Distribution” on page B-63 Introduced before R2006a


geoinv Geometric inverse cumulative distribution function

Syntax x = geoinv(y,p)

Description x = geoinv(y,p) returns the inverse cumulative distribution function (icdf) of the geometric distribution at each value in y using the corresponding probabilities in p. geoinv returns the smallest positive integer x such that the geometric cdf evaluated at x is equal to or exceeds y. You can think of y as the probability of observing x successes in a row in independent trials, where p is the probability of success in each trial. y and p can be vectors, matrices, or multidimensional arrays that all have the same size. A scalar input for p or y is expanded to a constant array with the same dimensions as the other input. The values in p and y must lie on the interval [0,1].

Examples Compute Geometric Distribution icdf Suppose the probability of a five-year-old car battery not starting in cold weather is 0.03. If we want no more than a ten percent chance that the car does not start, what is the maximum number of days in a row that we should try to start the car? To solve, compute the inverse cdf of the geometric distribution. In this example, a "success" means the car does not start, while a "failure" means the car does start. The probability of success for each trial p equals 0.03, while the probability of observing x failures in a row before observing a success y equals 0.1. y = 0.1; p = 0.03; x = geoinv(y,p) x = 3

The returned result indicates that if we start the car three times, there is at least a ten percent chance that it will not start on one of those tries. Therefore, if we want no greater than a ten percent chance that the car will not start, we should only attempt to start it for a maximum of two days in a row. We can confirm this result by evaluating the cdf at values of x equal to 2 and 3, given the probability of success for each trial p equal to 0.03.

y2 = geocdf(2,p)  % cdf for x = 2
y2 = 0.0873

y3 = geocdf(3,p)  % cdf for x = 3
y3 = 0.1147

The returned results indicate an 8.7% chance of the car not starting if we try two days in a row, and an 11.5% chance of not starting if we try three days in a row.
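The search that geoinv performs can be sketched directly from its definition: return the smallest x whose cdf reaches y. A minimal Python sketch (illustrative only; not MATLAB's actual implementation):

```python
def geo_inv(y, p):
    """Smallest nonnegative integer x with geometric cdf(x, p) >= y."""
    x = 0
    while 1 - (1 - p) ** (x + 1) < y:
        x += 1
    return x

# The battery example from the text: y = 0.1, p = 0.03.
print(geo_inv(0.1, 0.03))  # 3
```

The returned 3 is consistent with the cdf values above: geocdf(2,p) = 0.0873 is still below 0.1, and geocdf(3,p) = 0.1147 is the first value to reach it.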

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also geocdf | geopdf | geornd | geostat | icdf Topics “Geometric Distribution” on page B-63 Introduced before R2006a


geomean Geometric mean

Syntax

m = geomean(X)
m = geomean(X,'all')
m = geomean(X,dim)
m = geomean(X,vecdim)
m = geomean( ___ ,nanflag)

Description m = geomean(X) returns the geometric mean on page 32-2404 of X. • If X is a vector, then geomean(X) is the geometric mean of the elements in X. • If X is a matrix, then geomean(X) is a row vector containing the geometric mean of each column of X. • If X is a multidimensional array, then geomean operates along the first nonsingleton dimension of X. m = geomean(X,'all') returns the geometric mean of all the elements in X. m = geomean(X,dim) returns the geometric mean along the operating dimension dim of X. m = geomean(X,vecdim) returns the geometric mean over the dimensions specified in the vector vecdim. For example, if X is a 2-by-3-by-4 array, then geomean(X,[1 2]) returns a 1-by-1-by-4 array. Each element of the output array is the geometric mean of the elements on the corresponding page of X. m = geomean( ___ ,nanflag) specifies whether to exclude NaN values from the calculation, using any of the input argument combinations in previous syntaxes. By default, geomean includes NaN values in the calculation (nanflag has the value 'includenan'). To exclude NaN values, set the value of nanflag to 'omitnan'.

Examples Compare Geometric and Arithmetic Mean Set the random seed for reproducibility of the results. rng('default')

Create a matrix of exponential random numbers with 5 rows and 4 columns.

X = exprnd(1,5,4)

X = 5×4

    0.2049    2.3275    1.8476    1.9527
    0.0989    1.2783    0.0298    0.8633
    2.0637    0.6035    0.0438    0.0880
    0.0906    0.0434    0.7228    0.2329
    0.4583    0.0357    0.2228    0.0414

Compute the geometric and arithmetic means of the columns of X.

geometric = geomean(X)

geometric = 1×4

    0.2805    0.3083    0.2079    0.2698

arithmetic = mean(X)

arithmetic = 1×4

    0.5833    0.8577    0.5734    0.6357

The arithmetic mean is greater than the geometric mean for all the columns of X.
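This is not an accident of the draw: by the AM-GM inequality, the geometric mean of nonnegative data never exceeds the arithmetic mean. A quick Python check using the displayed (4-decimal) values of X:

```python
import math

# Matrix X as displayed above (values rounded to 4 decimals).
X = [[0.2049, 2.3275, 1.8476, 1.9527],
     [0.0989, 1.2783, 0.0298, 0.8633],
     [2.0637, 0.6035, 0.0438, 0.0880],
     [0.0906, 0.0434, 0.7228, 0.2329],
     [0.4583, 0.0357, 0.2228, 0.0414]]

# Column-wise geometric mean via exp(mean(log x)) and arithmetic mean.
geo = [math.exp(sum(math.log(v) for v in col) / len(col)) for col in zip(*X)]
ari = [sum(col) / len(col) for col in zip(*X)]

# AM-GM: the geometric mean never exceeds the arithmetic mean.
assert all(g <= a for g, a in zip(geo, ari))
print(geo)  # agrees with the geomean(X) row above to display precision
print(ari)  # agrees with the mean(X) row above to display precision
```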

Geometric Mean of All Elements

Find the geometric mean over multiple dimensions by using the 'all' input argument.

Create a 2-by-5-by-4 array X.

X = reshape(1:40,[2 5 4])

X =
X(:,:,1) =
     1     3     5     7     9
     2     4     6     8    10
X(:,:,2) =
    11    13    15    17    19
    12    14    16    18    20
X(:,:,3) =
    21    23    25    27    29
    22    24    26    28    30
X(:,:,4) =
    31    33    35    37    39
    32    34    36    38    40

Find the geometric mean of all the elements of X.

m = geomean(X,'all')
m = 15.7685

m is the geometric mean of the entire array X.

Geometric Mean Along Specified Dimensions

Find the geometric mean along different operating dimensions and vectors of dimensions for a multidimensional array.

Create a 3-by-5-by-2 array X.

X = reshape(1:30,[3 5 2])

X =
X(:,:,1) =
     1     4     7    10    13
     2     5     8    11    14
     3     6     9    12    15
X(:,:,2) =
    16    19    22    25    28
    17    20    23    26    29
    18    21    24    27    30

Find the geometric mean of X along the default dimension.

gmean1 = geomean(X)

gmean1 =
gmean1(:,:,1) =
    1.8171    4.9324    7.9581   10.9696   13.9761
gmean1(:,:,2) =
   16.9804   19.9833   22.9855   25.9872   28.9885

By default, geomean operates along the first dimension of X whose size does not equal 1. In this case, this dimension is the first dimension of X. Therefore, gmean1 is a 1-by-5-by-2 array.

Find the geometric mean of X along the second dimension.

gmean2 = geomean(X,2)

gmean2 =
gmean2(:,:,1) =
    5.1549
    6.5784
    7.8155
gmean2(:,:,2) =
   21.5814
   22.6004
   23.6177

gmean2 is a 3-by-1-by-2 array.

Find the geometric mean of X along the third dimension.

gmean3 = geomean(X,3)

gmean3 = 3×5

    4.0000    8.7178   12.4097   15.8114   19.0788
    5.8310   10.0000   13.5647   16.9115   20.1494
    7.3485   11.2250   14.6969   18.0000   21.2132

gmean3 is a 3-by-5 array.

Find the geometric mean of each page of X by specifying the first and second dimensions using the vecdim input argument.

mpage = geomean(X,[1 2])

mpage =
mpage(:,:,1) =
    6.4234
mpage(:,:,2) =
   22.5845

For example, mpage(1,1,2) is the geometric mean of the elements in X(:,:,2).

Find the geometric mean of the elements in each X(i,:,:) slice by specifying the second and third dimensions.

mrow = geomean(X,[2 3])

mrow = 3×1

   10.5475
   12.1932
   13.5862


For example, mrow(3) is the geometric mean of the elements in X(3,:,:), and is equivalent to specifying geomean(X(3,:,:),'all').

Geometric Mean Excluding NaN Create a vector and compute its geomean, excluding NaN values. x = 1:10; x(3) = nan; % Replace the third element of x with a NaN value n = geomean(x,'omitnan') n = 4.7408

If you do not specify 'omitnan', then geomean(x) returns NaN.
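The 'omitnan' behavior amounts to dropping the NaN entries and averaging the logarithms of the rest. A minimal Python sketch of the same computation (illustrative, not MATLAB's implementation):

```python
import math

def geomean_omitnan(values):
    """Geometric mean ignoring NaN entries (mirrors geomean(x,'omitnan'))."""
    kept = [v for v in values if not math.isnan(v)]
    return math.exp(sum(math.log(v) for v in kept) / len(kept))

x = [1, 2, float('nan'), 4, 5, 6, 7, 8, 9, 10]
print(round(geomean_omitnan(x), 4))  # 4.7408, matching the example
```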

Input Arguments

X — Input data
nonnegative vector | nonnegative matrix | nonnegative multidimensional array

Input data that represents a sample from a population, specified as a nonnegative vector, matrix, or multidimensional array.
• If X is a vector, then geomean(X) is the geometric mean of the elements in X.
• If X is a matrix, then geomean(X) is a row vector containing the geometric mean of each column of X.
• If X is a multidimensional array, then geomean operates along the first nonsingleton dimension of X.
To specify the operating dimension when X is a matrix or an array, use the dim input argument.
Data Types: single | double

dim — Dimension
positive integer scalar

Dimension along which to operate, specified as a positive integer scalar. If you do not specify a value, then the default value is the first array dimension of X whose size does not equal 1.
Consider a two-dimensional array X:
• If dim is equal to 1, then geomean(X,1) returns a row vector containing the geometric mean for each column in X.
• If dim is equal to 2, then geomean(X,2) returns a column vector containing the geometric mean for each row in X.
If dim is greater than ndims(X) or if size(X,dim) is 1, then geomean returns X.
Data Types: single | double

vecdim — Vector of dimensions
positive integer vector


Vector of dimensions, specified as a positive integer vector. Each element of vecdim represents a dimension of the input array X. The output m has length 1 in the specified operating dimensions. The other dimension lengths are the same for X and m. For example, if X is a 2-by-3-by-3 array, then geomean(X,[1 2]) returns a 1-by-1-by-3 array. Each element of the output is the geometric mean of the elements on the corresponding page of X.

Data Types: single | double

nanflag — NaN condition
'includenan' (default) | 'omitnan'

NaN condition, specified as one of these values:
• 'includenan' — Include NaN values when computing the geometric mean. If the input contains a NaN value, the result is NaN.
• 'omitnan' — Ignore NaN values in the input.
Data Types: char | string

Output Arguments m — Geometric mean scalar | vector | matrix | multidimensional array Geometric mean, returned as a scalar, vector, matrix, or multidimensional array.

More About

Geometric Mean

The geometric mean of a sample X is

m = (∏_{i=1}^{n} x_i)^(1/n),

where n is the number of values in X.
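In practice the geometric mean is computed as the exponential of the average logarithm, exp(mean(log(x))), which is algebraically the same as the n-th root of the product but avoids overflow and underflow in the intermediate product. A Python sketch of the formula, cross-checked against the 'all' example above:

```python
import math

def geomean(values):
    """exp(mean(log x)): equivalent to the n-th root of the product,
    but numerically safer for long inputs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

print(round(geomean(range(1, 41)), 4))  # 15.7685, matching geomean(X,'all') above
```

Note that the naive product of 1 through 40 is about 8.2e47, which already overflows single precision, whereas the log-domain sum stays small.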


Extended Capabilities Tall Arrays Calculate with arrays that have more rows than fit in memory. This function fully supports tall arrays. For more information, see “Tall Arrays” (MATLAB). C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: • These input arguments are not supported: 'all', vecdim, and nanflag. • The dim input argument must be a compile-time constant. • If you do not specify the dim input argument, the working (or operating) dimension can be different in the generated code. As a result, run-time errors can occur. For more details, see “Automatic dimension restriction” (MATLAB Coder). For more information on code generation, see “Introduction to Code Generation” on page 31-2 and “General Code Generation Workflow” on page 31-5. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. Usage notes and limitations: • The 'all' and vecdim input arguments are not supported. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also harmmean | mean | median | trimmean Topics “Geometric Distribution” on page B-63 Introduced before R2006a


geopdf Geometric probability density function

Syntax y = geopdf(x,p)

Description y = geopdf(x,p) returns the probability density function (pdf) of the geometric distribution at each value in x using the corresponding probabilities in p. x and p can be vectors, matrices, or multidimensional arrays that all have the same size. A scalar input is expanded to a constant array with the same dimensions as the other input. The parameters in p must lie on the interval [0,1].

Examples Compute Geometric Distribution pdf Suppose you toss a fair coin repeatedly, and a "success" occurs when the coin lands with heads facing up. What is the probability of observing exactly three tails ("failures") before tossing a heads? To solve, determine the value of the probability density function (pdf) for the geometric distribution at x equal to 3. The probability of success (tossing a heads) p in any given trial is 0.5. x = 3; p = 0.5; y = geopdf(x,p) y = 0.0625

The returned value of y indicates that the probability of observing exactly three tails before tossing a heads is 0.0625.

More About

Geometric Distribution pdf

The probability density function (pdf) of the geometric distribution is

y = f(x|p) = p(1 − p)^x;    x = 0, 1, 2, … ,

where p is the probability of success, and x is the number of failures before the first success. The result y is the probability of observing exactly x trials before a success, when the probability of success in any given trial is p. For discrete distributions, the pdf is also known as the probability mass function (pmf).
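As a sanity check, the pmf at the example's point recovers 0.0625, and the pmf terms sum to 1 over the support, as a probability mass function must. In Python (an illustration of the formula only):

```python
def geo_pdf(x, p):
    """Geometric pmf: P(X = x) = p * (1 - p)**x for x = 0, 1, 2, ..."""
    return p * (1 - p) ** x

print(geo_pdf(3, 0.5))  # 0.0625, matching the coin example

# The terms form a probability distribution: they sum to 1 in the limit.
total = sum(geo_pdf(k, 0.5) for k in range(60))
print(total)  # 1.0 to within double precision
```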


Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also geocdf | geoinv | geornd | geostat | mle | pdf Topics “Geometric Distribution” on page B-63 Introduced before R2006a


geornd Geometric random numbers

Syntax r = geornd(p) r = geornd(p,m,n,...) r = geornd(p,[m,n,...])

Description r = geornd(p) generates random numbers from a geometric distribution with probability parameter p. p can be a vector, a matrix, or a multidimensional array. The size of r is equal to the size of p. The parameters in p must lie in the interval [0,1]. r = geornd(p,m,n,...) or r = geornd(p,[m,n,...]) generates a multidimensional m-by-n-by-... array containing random numbers from the geometric distribution with probability parameter p. p can be a scalar or an array of the same size as r. The geometric distribution is useful for modeling the number of failures before one success in a series of independent trials, where each trial results in either success or failure, and the probability of success in any individual trial is the constant p.

Examples Generate Random Numbers from Geometric Distribution Generate a single random number from a geometric distribution with probability parameter p equal to 0.01. rng default % For reproducibility p = 0.01; r1 = geornd(0.01) r1 = 20

The returned random number represents a single experiment in which 20 failures were observed before a success, where each independent trial has a probability of success p equal to 0.01.

Generate a 1-by-5 array of random numbers from a geometric distribution with probability parameter p equal to 0.01.

r2 = geornd(p,1,5)

r2 = 1×5

     9   205     9    45   231

Each random number in the returned array represents the result of an experiment to determine the number of failures observed before a success, where each independent trial has a probability of success p equal to 0.01.

Generate a 1-by-3 array containing one random number from each of the three geometric distributions corresponding to the parameters in the 1-by-3 array of probabilities p.

p = [0.01 0.1 0.5];
r3 = geornd(p,[1 3])

r3 = 1×3

   127     5     0

Each element of the returned 1-by-3 array r3 contains one random number generated from the geometric distribution described by the corresponding parameter in p. For example, the first element in r3 represents an experiment in which 127 failures were observed before a success, where each independent trial has a probability of success p equal to 0.01. The second element in r3 represents an experiment in which 5 failures were observed before a success, where each independent trial has a probability of success p equal to 0.1. The third element in r3 represents an experiment in which zero failures were observed before a success (in other words, the first attempt was a success), where each independent trial has a probability of success p equal to 0.5.
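A standard way to construct geometric variates, and a useful mental model for geornd (though not necessarily its internal algorithm), is inversion of the cdf: draw U uniform on (0,1) and return floor(log(1 − U)/log(1 − p)). A Python sketch:

```python
import math
import random

def geo_rnd(p, rng):
    """One geometric draw (failures before the first success) by inverting
    the cdf; illustrative only, not MATLAB's internal algorithm."""
    u = rng.random()
    return math.floor(math.log1p(-u) / math.log1p(-p))

rng = random.Random(0)  # fix the seed, analogous to rng default
sample = [geo_rnd(0.01, rng) for _ in range(100000)]

# The sample mean should be near the distribution mean (1 - p)/p = 99.
print(sum(sample) / len(sample))
```

log1p(-u) computes log(1 − u) accurately even when u is tiny, which keeps the tail of the distribution honest.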

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: The generated code can return a different sequence of numbers than MATLAB if either of the following is true: • The output is nonscalar. • An input parameter is invalid for the distribution. For more information on code generation, see “Introduction to Code Generation” on page 31-2 and “General Code Generation Workflow” on page 31-5. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also geocdf | geoinv | geopdf | geostat | random Topics “Geometric Distribution” on page B-63 Introduced before R2006a


geostat Geometric mean and variance

Syntax [m,v] = geostat(p)

Description [m,v] = geostat(p) returns the mean m and variance v of a geometric distribution with corresponding probability parameters in p. p can be a vector, a matrix, or a multidimensional array. The parameters in p must lie in the interval [0,1].

Examples

Compute Mean and Variance of Geometric Distribution

Define a probability vector that contains six different parameter values.

p = 1./(1:6)

p = 1×6

    1.0000    0.5000    0.3333    0.2500    0.2000    0.1667

Compute the mean and variance of the geometric distribution that corresponds to each value in the probability vector.

[m,v] = geostat(1./(1:6))

m = 1×6

         0    1.0000    2.0000    3.0000    4.0000    5.0000

v = 1×6

         0    2.0000    6.0000   12.0000   20.0000   30.0000

The returned values indicate that, for example, the mean of a geometric distribution with probability parameter p equal to 1/3 is 2, and its variance is 6.


More About

Geometric Distribution Mean and Variance

The mean of the geometric distribution is

mean = (1 − p)/p,

and the variance of the geometric distribution is

var = (1 − p)/p²,

where p is the probability of success.
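These formulas explain the pattern in the example: with p = 1/k, the mean is k − 1 and the variance is k(k − 1). A quick Python check of both closed forms (illustration of the formulas only):

```python
def geo_stat(p):
    """Mean (1 - p)/p and variance (1 - p)/p**2 of the geometric distribution."""
    return (1 - p) / p, (1 - p) / p ** 2

# Reproduces the example: p = 1, 1/2, ..., 1/6 gives
# means 0..5 and variances 0, 2, 6, 12, 20, 30.
for k in range(1, 7):
    m, v = geo_stat(1 / k)
    print(k, round(m, 4), round(v, 4))
```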

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also geocdf | geoinv | geopdf | geornd Topics “Geometric Distribution” on page B-63 Introduced before R2006a


GapEvaluation Package: clustering.evaluation Superclasses: clustering.evaluation.ClusterCriterion Gap criterion clustering evaluation object

Description GapEvaluation is an object consisting of sample data, clustering data, and gap criterion values used to evaluate the optimal number of clusters. Create a gap criterion clustering evaluation object using evalclusters.

Construction eva = evalclusters(x,clust,'Gap') creates a gap criterion clustering evaluation object. eva = evalclusters(x,clust,'Gap',Name,Value) creates a gap criterion clustering evaluation object using additional options specified by one or more name-value pair arguments. Input Arguments x — Input data matrix Input data, specified as an N-by-P matrix. N is the number of observations, and P is the number of variables. Data Types: single | double clust — Clustering algorithm 'kmeans' | 'linkage' | 'gmdistribution' | matrix of clustering solutions | function handle Clustering algorithm, specified as one of the following. 'kmeans'

Cluster the data in x using the kmeans clustering algorithm, with 'EmptyAction' set to 'singleton' and 'Replicates' set to 5.

'linkage'

Cluster the data in x using the clusterdata agglomerative clustering algorithm, with 'Linkage' set to 'ward'.

'gmdistribution'

Cluster the data in x using the gmdistribution Gaussian mixture distribution algorithm, with 'SharedCov' set to true and 'Replicates' set to 5.

If criterion is 'CalinskiHarabasz', 'DaviesBouldin', or 'silhouette', you can specify a clustering algorithm using a function handle (MATLAB). The function must be of the form C = clustfun(DATA,K), where DATA is the data to be clustered, and K is the number of clusters. The output of clustfun must be one of the following: • A vector of integers representing the cluster index for each observation in DATA. There must be K unique values in this vector. 32-2413


• A numeric n-by-K matrix of scores for n observations and K classes. In this case, the cluster index for each observation is determined by taking the largest score value in each row.

If criterion is 'CalinskiHarabasz', 'DaviesBouldin', or 'silhouette', you can also specify clust as an n-by-K matrix containing the proposed clustering solutions. n is the number of observations in the sample data, and K is the number of proposed clustering solutions. Column j contains the cluster indices for each of the n points in the jth clustering solution.

Data Types: single | double | char | string | function_handle

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'KList',[1:5],'Distance','cityblock' specifies to test 1, 2, 3, 4, and 5 clusters using the city block distance metric.

B — Number of reference data sets
100 (default) | positive integer value

Number of reference data sets generated from the reference distribution ReferenceDistribution, specified as the comma-separated pair consisting of 'B' and a positive integer value.
Example: 'B',150
Data Types: single | double

Distance — Distance metric
'sqEuclidean' (default) | 'Euclidean' | 'cityblock' | function | ...

Distance metric used for computing the criterion values, specified as the comma-separated pair consisting of 'Distance' and one of the following.

• 'sqEuclidean' — Squared Euclidean distance
• 'Euclidean' — Euclidean distance
• 'cityblock' — Sum of absolute differences
• 'cosine' — One minus the cosine of the included angle between points (treated as vectors)
• 'correlation' — One minus the sample correlation between points (treated as sequences of values)

For detailed information about each distance metric, see pdist.

You can also specify a function for the distance metric by using a function handle (MATLAB). The distance function must be of the form d2 = distfun(XI,XJ), where XI is a 1-by-n vector corresponding to a single row of the input matrix X, and XJ is an m2-by-n matrix corresponding to multiple rows of X. distfun must return an m2-by-1 vector of distances d2, whose kth element is the distance between XI and XJ(k,:).

Distance only accepts a function handle if the clustering algorithm clust accepts a function handle as the distance metric. For example, the kmeans clustering algorithm does not accept a function handle as the distance metric. Therefore, if you use the kmeans algorithm and then specify a function handle for Distance, the software errors.

• When clust is 'kmeans' or 'gmdistribution', evalclusters uses the distance metric specified for Distance to cluster the data.
• If clust is 'linkage', and Distance is either 'sqEuclidean' or 'Euclidean', then the clustering algorithm uses Euclidean distance and Ward linkage.
• If clust is 'linkage' and Distance is any other metric, then the clustering algorithm uses the specified distance metric and average linkage.
• In all other cases, the distance metric specified for Distance must match the distance metric used in the clustering algorithm to obtain meaningful results.

Example: 'Distance','Euclidean'
Data Types: single | double | char | string | function_handle

KList — List of number of clusters to evaluate
vector

List of the number of clusters to evaluate, specified as the comma-separated pair consisting of 'KList' and a vector of positive integer values. You must specify KList when clust is a clustering algorithm name or a function handle. When criterion is 'gap', clust must be a character vector, a string scalar, or a function handle, and you must specify KList.
Example: 'KList',[1:6]
Data Types: single | double

ReferenceDistribution — Reference data generation method
'PCA' (default) | 'uniform'

Reference data generation method, specified as the comma-separated pair consisting of 'ReferenceDistribution' and one of the following.

• 'PCA' — Generate reference data from a uniform distribution over a box aligned with the principal components of the data matrix x.
• 'uniform' — Generate reference data uniformly over the range of each feature in the data matrix x.

Example: 'ReferenceDistribution','uniform'

SearchMethod — Method for selecting optimal number of clusters
'globalMaxSE' (default) | 'firstMaxSE'

Method for selecting the optimal number of clusters, specified as the comma-separated pair consisting of 'SearchMethod' and one of the following.

• 'globalMaxSE' — Evaluate each proposed number of clusters in KList and select the smallest number of clusters satisfying Gap(K) ≥ GAPMAX − SE(GAPMAX), where K is the number of clusters, Gap(K) is the gap value for the clustering solution with K clusters, GAPMAX is the largest gap value, and SE(GAPMAX) is the standard error corresponding to the largest gap value.
• 'firstMaxSE' — Evaluate each proposed number of clusters in KList and select the smallest number of clusters satisfying Gap(K) ≥ Gap(K + 1) − SE(K + 1), where K is the number of clusters, Gap(K) is the gap value for the clustering solution with K clusters, and SE(K + 1) is the standard error of the clustering solution with K + 1 clusters.

Example: 'SearchMethod','globalMaxSE'
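Both selection rules are short enough to state in code. A Python sketch using the gap values from the fisheriris example below; the standard errors here are made up for illustration (the real ones come from the reference-data simulation and are not listed on this page):

```python
def global_max_se(gaps, ses):
    """Smallest 1-based K with Gap(K) >= GAPMAX - SE(GAPMAX)."""
    kmax = max(range(len(gaps)), key=lambda i: gaps[i])
    threshold = gaps[kmax] - ses[kmax]
    return next(i + 1 for i, g in enumerate(gaps) if g >= threshold)

def first_max_se(gaps, ses):
    """Smallest 1-based K with Gap(K) >= Gap(K+1) - SE(K+1)."""
    for i in range(len(gaps) - 1):
        if gaps[i] >= gaps[i + 1] - ses[i + 1]:
            return i + 1
    return len(gaps)

gaps = [0.0720, 0.5928, 0.8762, 1.0114, 1.0534, 1.0720]  # from the example below
ses = [0.05] * 6                                          # hypothetical SE values

print(global_max_se(gaps, ses))  # 5: within one SE of the maximum at K = 6
print(first_max_se(gaps, ses))   # 4 with these hypothetical SEs
```

With these invented standard errors, 'globalMaxSE' picks K = 5, consistent with the OptimalK of 5 reported in the example.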

Properties

B — Number of data sets generated from the reference distribution, stored as a positive integer value.

ClusteringFunction — Clustering algorithm used to cluster the input data, stored as a valid clustering algorithm name or function handle. If the clustering solutions are provided in the input, ClusteringFunction is empty.

CriterionName — Name of the criterion used for clustering evaluation, stored as a valid criterion name.

CriterionValues — Criterion values corresponding to each proposed number of clusters in InspectedK, stored as a vector of numerical values.

Distance — Distance metric used for clustering data, stored as a valid distance metric name.

ExpectedLogW — Expectation of the natural logarithm of W based on the generated reference data, stored as a vector of scalar values. W is the within-cluster dispersion computed using the distance metric Distance.

InspectedK — List of the number of proposed clusters for which to compute criterion values, stored as a vector of positive integer values.

LogW — Natural logarithm of W based on the input data, stored as a vector of scalar values. W is the within-cluster dispersion computed using the distance metric Distance.

Missing — Logical flag for excluded data, stored as a column vector of logical values. If Missing equals true, then the corresponding value in the data matrix x is not used in the clustering solution.

NumObservations — Number of observations in the data matrix X, minus the number of missing (NaN) values in X, stored as a positive integer value.

OptimalK — Optimal number of clusters, stored as a positive integer value.

OptimalY — Optimal clustering solution corresponding to OptimalK, stored as a column vector of positive integer values. If the clustering solutions are provided in the input, OptimalY is empty.

ReferenceDistribution — Reference data generation method, stored as a valid reference distribution name.

SE — Standard error of the natural logarithm of W with respect to the reference data for each number of clusters in InspectedK, stored as a vector of scalar values. W is the within-cluster dispersion computed using the distance metric Distance.

SearchMethod — Method for determining the optimal number of clusters, stored as a valid search method name.

StdLogW — Standard deviation of the natural logarithm of W with respect to the reference data for each number of clusters in InspectedK. W is the within-cluster dispersion computed using the distance metric Distance.

X — Data used for clustering, stored as a matrix of numerical values.

Methods

increaseB — Increase reference data sets

Inherited Methods

addK — Evaluate additional numbers of clusters
plot — Plot clustering evaluation object criterion values
compact — Compact clustering evaluation object

Examples Evaluate Clustering Solution Using Gap Criterion Evaluate the optimal number of clusters using the gap clustering evaluation criterion. Load the sample data. load fisheriris

The data contains sepal and petal measurements from three species of iris flowers. Evaluate the number of clusters based on the gap criterion values. Cluster the data using kmeans. rng('default'); % For reproducibility eva = evalclusters(meas,'kmeans','gap','KList',[1:6]) eva = GapEvaluation with properties: NumObservations: InspectedK: CriterionValues: OptimalK:

150 [1 2 3 4 5 6] [0.0720 0.5928 0.8762 1.0114 1.0534 1.0720] 5

The OptimalK value indicates that, based on the gap criterion, the optimal number of clusters is five. Plot the gap criterion values for each number of clusters tested.

plot(eva)


Based on the plot, the maximum value of the gap criterion occurs at six clusters. However, the value at five clusters is within one standard error of the maximum, so the suggested optimal number of clusters is five. Create a grouped scatter plot to examine the relationship between petal length and width. Group the data by suggested clusters.

figure
PetalLength = meas(:,3);
PetalWidth = meas(:,4);
ClusterGroup = eva.OptimalY;
gscatter(PetalLength,PetalWidth,ClusterGroup,'rbgkc','xod^*');


The plot shows cluster 4 in the lower-left corner, completely separated from the other four clusters. Cluster 4 contains flowers with the smallest petal widths and lengths. Cluster 2 is in the upper-right corner and contains flowers with the largest petal widths and lengths. Cluster 5 is next to cluster 2 and contains flowers with similar petal widths as the flowers in cluster 2, but smaller petal lengths than the flowers in cluster 2. Clusters 1 and 3 are near the center of the plot and contain flowers with measurements between the extremes.

More About

Gap Value

A common graphical approach to cluster evaluation involves plotting an error measurement versus several proposed numbers of clusters, and locating the “elbow” of this plot. The “elbow” occurs at the most dramatic decrease in error measurement. The gap criterion formalizes this approach by estimating the “elbow” location as the number of clusters with the largest gap value. Therefore, under the gap criterion, the optimal number of clusters occurs at the solution with the largest local or global gap value within a tolerance range. The gap value is defined as

Gap_n(k) = E*_n{log(W_k)} − log(W_k),

where n is the sample size, k is the number of clusters being evaluated, and W_k is the pooled within-cluster dispersion measurement

W_k = Σ_{r=1}^{k} D_r/(2n_r),

where n_r is the number of data points in cluster r, and D_r is the sum of the pairwise distances for all points in cluster r. The expected value E*_n{log(W_k)} is determined by Monte Carlo sampling from a reference distribution, and log(W_k) is computed from the sample data. The gap value is defined even for clustering solutions that contain only one cluster, and can be used with any distance metric. However, the gap criterion is more computationally expensive than other cluster evaluation criteria, because the clustering algorithm must be applied to the reference data for each proposed clustering solution.
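The pooled within-cluster dispersion W_k is simple to compute directly from a clustering solution. As a cross-check for readers working outside MATLAB, here is a minimal sketch in Python with NumPy, assuming Euclidean distances; the function name pooled_within_dispersion is illustrative, not part of any toolbox:

```python
import numpy as np

def pooled_within_dispersion(X, labels):
    # W_k = sum over clusters r of D_r / (2 * n_r), where D_r sums the
    # Euclidean distances over all ordered pairs of points in cluster r
    W = 0.0
    for r in np.unique(labels):
        pts = X[labels == r]
        diffs = pts[:, None, :] - pts[None, :, :]
        D_r = np.sqrt((diffs ** 2).sum(axis=-1)).sum()
        W += D_r / (2 * len(pts))
    return W

# toy example: one cluster of two points at distance 2, plus one singleton
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0]])
labels = np.array([0, 0, 1])
print(pooled_within_dispersion(X, labels))  # 1.0
```

The gap value then compares log(W_k) for the sample against its Monte Carlo average over reference data clustered the same way.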

References [1] Tibshirani, R., G. Walther, and T. Hastie. “Estimating the number of clusters in a data set via the gap statistic.” Journal of the Royal Statistical Society: Series B. Vol. 63, Part 2, 2001, pp. 411– 423.

See Also CalinskiHarabaszEvaluation | DaviesBouldinEvaluation | SilhouetteEvaluation | evalclusters Topics Class Attributes (MATLAB) Property Attributes (MATLAB)


get

Class: dataset

(Not Recommended) Access dataset array properties

Note The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.

Syntax

get(A)
s = get(A)
p = get(A,PropertyName)
p = get(A,{PropertyName1,PropertyName2,...})

Description get(A) displays a list of property/value pairs for the dataset array A. s = get(A) returns the values in a scalar structure s with field names given by the properties. p = get(A,PropertyName) returns the value of the property specified by PropertyName. p = get(A,{PropertyName1,PropertyName2,...}) allows multiple property names to be specified and returns their values in a cell array.

Examples

Create a dataset array from Fisher's iris data and access the information.

load fisheriris
NumObs = size(meas,1);
NameObs = strcat({'Obs'},num2str((1:NumObs)','%-d'));
iris = dataset({nominal(species),'species'},...
               {meas,'SL','SW','PL','PW'},...
               'ObsNames',NameObs);
get(iris)

    Description: ''
          Units: {}
       DimNames: {'Observations'  'Variables'}
       UserData: []
       ObsNames: {150x1 cell}
       VarNames: {'species'  'SL'  'SW'  'PL'  'PW'}

ON = get(iris,'ObsNames');
ON(1:3)

ans = 
    'Obs1'
    'Obs2'
    'Obs3'


See Also set | summary


getlabels

(Not Recommended) Access nominal or ordinal array labels

Note The nominal and ordinal array data types are not recommended. To represent ordered and unordered discrete, nonnumeric data, use the “Categorical Arrays” (MATLAB) data type instead.

Syntax labels = getlabels(A)

Description labels = getlabels(A) returns the labels of the levels in the nominal or ordinal array A as a cell array of character vectors, labels. If A is an ordinal array, getlabels returns the labels in the order of the levels.

Input Arguments A — Nominal or ordinal array nominal array | ordinal array Nominal or ordinal array, specified as a nominal or ordinal array object created with nominal or ordinal.

See Also getlevels | nominal | ordinal Topics “Change Category Labels” on page 2-7 Introduced in R2007a


getlevels

(Not Recommended) Access nominal or ordinal array levels

Note The nominal and ordinal array data types are not recommended. To represent ordered and unordered discrete, nonnumeric data, use the “Categorical Arrays” (MATLAB) data type instead.

Syntax L = getlevels(A)

Description L = getlevels(A) returns the levels in the nominal or ordinal array A. L is a vector of the same type as A.

Input Arguments A — Nominal or ordinal array nominal array | ordinal array Nominal or ordinal array, specified as a nominal or ordinal array object created with nominal or ordinal.

See Also getlabels | nominal | ordinal Topics “Add and Drop Category Levels” on page 2-18 “Merge Category Levels” on page 2-16 “Reorder Category Levels” on page 2-9 Introduced in R2009a


gevcdf Generalized extreme value cumulative distribution function

Syntax

p = gevcdf(x,k,sigma,mu)
p = gevcdf(x,k,sigma,mu,'upper')

Description

p = gevcdf(x,k,sigma,mu) returns the cdf of the generalized extreme value (GEV) distribution with shape parameter k, scale parameter sigma, and location parameter mu, evaluated at the values in x. The size of p is the common size of the input arguments. A scalar input functions as a constant matrix of the same size as the other inputs.

p = gevcdf(x,k,sigma,mu,'upper') returns the complement of the cdf of the GEV distribution, using an algorithm that more accurately computes the extreme upper tail probabilities.

Default values for k, sigma, and mu are 0, 1, and 0, respectively.

When k < 0, the GEV is the type III extreme value distribution. When k > 0, the GEV distribution is the type II, or Frechet, extreme value distribution. If w has a Weibull distribution as computed by the wblcdf function, then -w has a type III extreme value distribution and 1/w has a type II extreme value distribution. In the limit as k approaches 0, the GEV is the mirror image of the type I extreme value distribution as computed by the evcdf function.

The mean of the GEV distribution is not finite when k ≥ 1, and the variance is not finite when k ≥ 1/2. The GEV distribution has positive density only for values of X such that k*(X-mu)/sigma > -1.
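The Weibull relationship above can be verified numerically. For readers checking it outside MATLAB, here is a sketch using SciPy, whose genextreme distribution parameterizes the GEV with shape c corresponding to -k; treat that parameter mapping as an assumption to confirm against the SciPy documentation:

```python
from scipy.stats import weibull_min, genextreme

# If w ~ Weibull(scale a, shape b), then -w follows the GEV (type III)
# with k = -1/b, sigma = a/b, mu = -a.  SciPy's shape c = -k, so the
# type III case below uses c = 1/b.
a, b = 2.0, 1.5
x = -1.3                                    # a point in the support of -w
p_weibull = weibull_min.sf(-x, b, scale=a)  # P(w >= -x) = P(-w <= x)
p_gev = genextreme.cdf(x, 1.0 / b, loc=-a, scale=a / b)
print(abs(p_weibull - p_gev) < 1e-12)  # True
```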

References [1] Embrechts, P., C. Klüppelberg, and T. Mikosch. Modelling Extremal Events for Insurance and Finance. New York: Springer, 1997. [2] Kotz, S., and S. Nadarajah. Extreme Value Distributions: Theory and Applications. London: Imperial College Press, 2000.

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™.

See Also
cdf | gevfit | gevinv | gevlike | gevpdf | gevrnd | gevstat

Topics
“Generalized Extreme Value Distribution” on page B-55


Introduced before R2006a


gevfit Generalized extreme value parameter estimates

Syntax

parmhat = gevfit(X)
[parmhat,parmci] = gevfit(X)
[parmhat,parmci] = gevfit(X,alpha)
[...] = gevfit(X,alpha,options)

Description

parmhat = gevfit(X) returns maximum likelihood estimates of the parameters for the generalized extreme value (GEV) distribution given the data in X. parmhat(1) is the shape parameter, k, parmhat(2) is the scale parameter, sigma, and parmhat(3) is the location parameter, mu.

[parmhat,parmci] = gevfit(X) returns 95% confidence intervals for the parameter estimates.

[parmhat,parmci] = gevfit(X,alpha) returns 100(1-alpha)% confidence intervals for the parameter estimates.

[...] = gevfit(X,alpha,options) specifies control parameters for the iterative algorithm used to compute ML estimates. This argument can be created by a call to statset. See statset('gevfit') for parameter names and default values. Pass in [] for alpha to use the default values.

When k < 0, the GEV is the type III extreme value distribution. When k > 0, the GEV distribution is the type II, or Frechet, extreme value distribution. If w has a Weibull distribution as computed by the wblfit function, then -w has a type III extreme value distribution and 1/w has a type II extreme value distribution. In the limit as k approaches 0, the GEV is the mirror image of the type I extreme value distribution as computed by the evfit function.

The mean of the GEV distribution is not finite when k ≥ 1, and the variance is not finite when k ≥ 1/2. The GEV distribution is defined for k*(X-mu)/sigma > -1.

References [1] Embrechts, P., C. Klüppelberg, and T. Mikosch. Modelling Extremal Events for Insurance and Finance. New York: Springer, 1997. [2] Kotz, S., and S. Nadarajah. Extreme Value Distributions: Theory and Applications. London: Imperial College Press, 2000.

See Also
gevcdf | gevinv | gevlike | gevpdf | gevrnd | gevstat | mle

Topics
“Generalized Extreme Value Distribution” on page B-55


Introduced before R2006a


gevinv Generalized extreme value inverse cumulative distribution function

Syntax X = gevinv(P,k,sigma,mu)

Description

X = gevinv(P,k,sigma,mu) returns the inverse cdf of the generalized extreme value (GEV) distribution with shape parameter k, scale parameter sigma, and location parameter mu, evaluated at the values in P. The size of X is the common size of the input arguments. A scalar input functions as a constant matrix of the same size as the other inputs.

Default values for k, sigma, and mu are 0, 1, and 0, respectively.

When k < 0, the GEV is the type III extreme value distribution. When k > 0, the GEV distribution is the type II, or Frechet, extreme value distribution. If w has a Weibull distribution as computed by the wblinv function, then -w has a type III extreme value distribution and 1/w has a type II extreme value distribution. In the limit as k approaches 0, the GEV is the mirror image of the type I extreme value distribution as computed by the evinv function.

The mean of the GEV distribution is not finite when k ≥ 1, and the variance is not finite when k ≥ 1/2. The GEV distribution has positive density only for values of X such that k*(X-mu)/sigma > -1.
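A quick sanity check of the cdf/inverse-cdf pair is a round trip: evaluating the inverse cdf at a cdf value should recover the original point. A sketch with SciPy's genextreme, assuming its shape parameter c corresponds to -k (a mapping to confirm against the SciPy documentation):

```python
from scipy.stats import genextreme

# Round trip through the cdf and its inverse (SciPy shape c = -k)
k, sigma, mu = 0.2, 1.5, 0.3
x = 2.0                 # a point inside the support k*(x-mu)/sigma > -1
p = genextreme.cdf(x, -k, loc=mu, scale=sigma)
x_back = genextreme.ppf(p, -k, loc=mu, scale=sigma)
print(abs(x - x_back) < 1e-10)  # True
```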

References [1] Embrechts, P., C. Klüppelberg, and T. Mikosch. Modelling Extremal Events for Insurance and Finance. New York: Springer, 1997. [2] Kotz, S., and S. Nadarajah. Extreme Value Distributions: Theory and Applications. London: Imperial College Press, 2000.

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™.

See Also
gevcdf | gevfit | gevlike | gevpdf | gevrnd | gevstat | icdf

Topics
“Generalized Extreme Value Distribution” on page B-55

Introduced before R2006a


gevlike Generalized extreme value negative log-likelihood

Syntax

nlogL = gevlike(params,data)
[nlogL,ACOV] = gevlike(params,data)

Description

nlogL = gevlike(params,data) returns the negative of the log-likelihood nlogL for the generalized extreme value (GEV) distribution, evaluated at parameters params. params(1) is the shape parameter, k, params(2) is the scale parameter, sigma, and params(3) is the location parameter, mu.

[nlogL,ACOV] = gevlike(params,data) returns the inverse of Fisher's information matrix, ACOV. If the input parameter values in params are the maximum likelihood estimates, the diagonal elements of ACOV are their asymptotic variances. ACOV is based on the observed Fisher's information, not the expected information.

When k < 0, the GEV is the type III extreme value distribution. When k > 0, the GEV distribution is the type II, or Frechet, extreme value distribution. If w has a Weibull distribution as computed by the wbllike function, then -w has a type III extreme value distribution and 1/w has a type II extreme value distribution. In the limit as k approaches 0, the GEV is the mirror image of the type I extreme value distribution as computed by the evlike function.

The mean of the GEV distribution is not finite when k ≥ 1, and the variance is not finite when k ≥ 1/2. The GEV distribution has positive density only for values of X such that k*(X-mu)/sigma > -1.
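The negative log-likelihood itself is straightforward to sketch. The following Python function is illustrative only (it is not the toolbox implementation and omits the ACOV output); it uses SciPy's genextreme, whose shape parameter c is assumed to correspond to -k:

```python
import numpy as np
from scipy.stats import genextreme

def gev_nloglik(params, data):
    # params = (k, sigma, mu); SciPy's shape parameter c equals -k
    k, sigma, mu = params
    return -np.sum(genextreme.logpdf(data, -k, loc=mu, scale=sigma))

# Gumbel check (k = 0, sigma = 1, mu = 0): the negative log-likelihood
# of a single point z is z + exp(-z)
z = 0.5
nll = gev_nloglik((0.0, 1.0, 0.0), np.array([z]))
print(abs(nll - (z + np.exp(-z))) < 1e-9)  # True
```

Minimizing this quantity over params is what gevfit does, which is why the two functions share a parameterization.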

References

[1] Embrechts, P., C. Klüppelberg, and T. Mikosch. Modelling Extremal Events for Insurance and Finance. New York: Springer, 1997.

[2] Kotz, S., and S. Nadarajah. Extreme Value Distributions: Theory and Applications. London: Imperial College Press, 2000.

See Also
gevcdf | gevfit | gevinv | gevpdf | gevrnd | gevstat

Topics
“Generalized Extreme Value Distribution” on page B-55

Introduced before R2006a


gevpdf Generalized extreme value probability density function

Syntax Y = gevpdf(X,k,sigma,mu)

Description

Y = gevpdf(X,k,sigma,mu) returns the pdf of the generalized extreme value (GEV) distribution with shape parameter k, scale parameter sigma, and location parameter mu, evaluated at the values in X. The size of Y is the common size of the input arguments. A scalar input functions as a constant matrix of the same size as the other inputs.

Default values for k, sigma, and mu are 0, 1, and 0, respectively.

When k < 0, the GEV is the type III extreme value distribution. When k > 0, the GEV distribution is the type II, or Frechet, extreme value distribution. If w has a Weibull distribution as computed by the wblpdf function, then -w has a type III extreme value distribution and 1/w has a type II extreme value distribution. In the limit as k approaches 0, the GEV is the mirror image of the type I extreme value distribution as computed by the evcdf function.

The mean of the GEV distribution is not finite when k ≥ 1, and the variance is not finite when k ≥ 1/2. The GEV distribution has positive density only for values of X such that k*(X-mu)/sigma > -1.

References [1] Embrechts, P., C. Klüppelberg, and T. Mikosch. Modelling Extremal Events for Insurance and Finance. New York: Springer, 1997. [2] Kotz, S., and S. Nadarajah. Extreme Value Distributions: Theory and Applications. London: Imperial College Press, 2000.

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™.

See Also
gevcdf | gevfit | gevinv | gevlike | gevrnd | gevstat | pdf

Topics
“Generalized Extreme Value Distribution” on page B-55

Introduced before R2006a


gevrnd Generalized extreme value random numbers

Syntax

R = gevrnd(k,sigma,mu)
R = gevrnd(k,sigma,mu,m,n,...)
R = gevrnd(k,sigma,mu,[m,n,...])

Description

R = gevrnd(k,sigma,mu) returns an array of random numbers chosen from the generalized extreme value (GEV) distribution with shape parameter k, scale parameter sigma, and location parameter mu. The size of R is the common size of the input arguments if all are arrays. If any parameter is a scalar, the size of R is the size of the other parameters.

R = gevrnd(k,sigma,mu,m,n,...) or R = gevrnd(k,sigma,mu,[m,n,...]) generates an m-by-n-by-... array containing random numbers from the GEV distribution with parameters k, sigma, and mu. The k, sigma, mu parameters can each be scalars or arrays of the same size as R.

When k < 0, the GEV is the type III extreme value distribution. When k > 0, the GEV distribution is the type II, or Frechet, extreme value distribution. If w has a Weibull distribution as computed by the wblrnd function, then -w has a type III extreme value distribution and 1/w has a type II extreme value distribution. In the limit as k approaches 0, the GEV is the mirror image of the type I extreme value distribution as computed by the evrnd function.

The mean of the GEV distribution is not finite when k ≥ 1, and the variance is not finite when k ≥ 1/2. The GEV distribution has positive density only for values of X such that k*(X-mu)/sigma > -1.

References [1] Embrechts, P., C. Klüppelberg, and T. Mikosch. Modelling Extremal Events for Insurance and Finance. New York: Springer, 1997. [2] Kotz, S., and S. Nadarajah. Extreme Value Distributions: Theory and Applications. London: Imperial College Press, 2000.

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: The generated code can return a different sequence of numbers than MATLAB if either of the following is true: 32-2433


• The output is nonscalar.
• An input parameter is invalid for the distribution.

For more information on code generation, see “Introduction to Code Generation” on page 31-2 and “General Code Generation Workflow” on page 31-5.

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also gevcdf | gevfit | gevinv | gevlike | gevpdf | gevstat | random Topics “Generalized Extreme Value Distribution” on page B-55 Introduced before R2006a


gevstat Generalized extreme value mean and variance

Syntax [M,V] = gevstat(k,sigma,mu)

Description

[M,V] = gevstat(k,sigma,mu) returns the mean and variance of the generalized extreme value (GEV) distribution with shape parameter k, scale parameter sigma, and location parameter mu. The sizes of M and V are the common size of the input arguments. A scalar input functions as a constant matrix of the same size as the other inputs.

Default values for k, sigma, and mu are 0, 1, and 0, respectively.

When k < 0, the GEV is the type III extreme value distribution. When k > 0, the GEV distribution is the type II, or Frechet, extreme value distribution. If w has a Weibull distribution as computed by the wblstat function, then -w has a type III extreme value distribution and 1/w has a type II extreme value distribution. In the limit as k approaches 0, the GEV is the mirror image of the type I extreme value distribution as computed by the evstat function.

The mean of the GEV distribution is not finite when k ≥ 1, and the variance is not finite when k ≥ 1/2. The GEV distribution has positive density only for values of X such that k*(X-mu)/sigma > -1.

References [1] Embrechts, P., C. Klüppelberg, and T. Mikosch. Modelling Extremal Events for Insurance and Finance. New York: Springer, 1997. [2] Kotz, S., and S. Nadarajah. Extreme Value Distributions: Theory and Applications. London: Imperial College Press, 2000.

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™.

See Also
gevcdf | gevfit | gevinv | gevlike | gevpdf | gevrnd

Topics
“Generalized Extreme Value Distribution” on page B-55

Introduced before R2006a


gline Interactively add line to plot

Syntax

gline(h)
gline
hline = gline(...)

Description

gline(h) allows you to draw a line segment in the figure with handle h by clicking the pointer at the two endpoints. A rubber-band line tracks the pointer movement.

gline with no input arguments defaults to h = gcf and draws in the current figure.

hline = gline(...) returns the handle hline to the line.

Examples

Use gline to connect two points in a plot:

x = 1:10;
y = x + randn(1,10);
scatter(x,y,25,'b','*')
lsline
mu = mean(y);
hold on
plot([1 10],[mu mu],'ro')
hline = gline;  % Connect circles
set(hline,'Color','r')


See Also lsline | refcurve | refline Introduced before R2006a


glmfit Generalized linear model regression

Syntax

b = glmfit(X,y,distr)
b = glmfit(X,y,distr,param1,val1,param2,val2,...)
[b,dev] = glmfit(...)
[b,dev,stats] = glmfit(...)

Description

b = glmfit(X,y,distr) returns a (p + 1)-by-1 vector b of coefficient estimates for a generalized linear regression of the responses in y on the predictors in X, using the distribution distr. X is an n-by-p matrix of p predictors at each of n observations. distr can be any of the following: 'binomial', 'gamma', 'inverse gaussian', 'normal' (the default), and 'poisson'.

In most cases, y is an n-by-1 vector of observed responses. For the binomial distribution, y can be a binary vector indicating success or failure at each observation, or a two-column matrix with the first column indicating the number of successes for each observation and the second column indicating the number of trials for each observation.

This syntax uses the canonical link (see below) to relate the distribution to the predictors.

Note By default, glmfit adds a first column of 1s to X, corresponding to a constant term in the model. Do not enter a column of 1s directly into X. You can change the default behavior of glmfit using the 'constant' parameter, below. glmfit treats NaNs in either X or y as missing values, and ignores them.

b = glmfit(X,y,distr,param1,val1,param2,val2,...) additionally allows you to specify optional parameter name/value pairs to control the model fit. Acceptable parameters are as follows.

'link' — Link function relating the distribution mean µ to the linear predictor Xb. The value can be one of the following:

  'identity' (default for the distribution 'normal') — µ = Xb
  'log' (default for the distribution 'poisson') — log(µ) = Xb
  'logit' (default for the distribution 'binomial') — log(µ/(1 – µ)) = Xb
  'probit' — norminv(µ) = Xb
  'comploglog' — log(-log(1 – µ)) = Xb
  'reciprocal' (default for the distribution 'gamma') — 1/µ = Xb
  'loglog' — log(-log(µ)) = Xb
  p (a number; default for the distribution 'inverse gaussian', with p = -2) — µ^p = Xb
  A cell array of the form {FL FD FI}, containing three function handles, created using @, that define the link (FL), the derivative of the link (FD), and the inverse link (FI). You must provide FL(mu), FD = dFL(mu)/dmu, and FI = FL^(-1).
  A structure array having the fields 'Link' (link function), 'Derivative' (derivative of the link function), and 'Inverse' (inverse of the link function). The value of each field is a character vector corresponding to a function that is on the path or a function handle (created using @).

'estdisp' — 'on' causes glmfit to estimate a dispersion parameter for the binomial or Poisson distribution; 'off' (the default for those distributions) uses the theoretical value of 1.0.

'offset' — A vector that glmfit uses as an additional predictor variable, but with a coefficient value fixed at 1.0.

'weights' — A vector of prior weights, such as the inverses of the relative variance of each observation.

'constant' — 'on' (the default) includes a constant term in the model and returns a (p + 1)-by-1 vector of coefficient estimates b, whose first element is the coefficient of the constant term; 'off' omits the constant term and returns a p-by-1 vector of coefficient estimates b.

[b,dev] = glmfit(...) returns dev, the deviance of the fit at the solution vector. The deviance is a generalization of the residual sum of squares. It is possible to perform an analysis of deviance to compare several models, each a subset of the other, and to test whether the model with more terms is significantly better than the model with fewer terms.
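For the binomial distribution, the deviance compares the fitted model to the saturated model that reproduces each observed proportion exactly. The following Python sketch (the helper name binomial_deviance is illustrative, not toolbox code) shows the computation for counted successes y out of n trials:

```python
import numpy as np

def binomial_deviance(y, n, mu):
    # Twice the log-likelihood gap between the saturated model (which
    # fits each observed proportion exactly) and the fitted per-trial
    # success probabilities mu; boundary cases y == 0 and y == n drop
    # their corresponding term.
    y, n, mu = (np.asarray(v, dtype=float) for v in (y, n, mu))
    yhat = n * mu
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = np.where(y > 0, y * np.log(y / yhat), 0.0)
        t2 = np.where(n - y > 0, (n - y) * np.log((n - y) / (n - yhat)), 0.0)
    return 2.0 * np.sum(t1 + t2)

# a model that reproduces the observed proportions has zero deviance
print(binomial_deviance([3, 5], [10, 10], [0.3, 0.5]))  # 0.0
```

Nesting two fitted models and differencing their deviances gives the analysis-of-deviance comparison described above.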


[b,dev,stats] = glmfit(...) returns dev and stats. stats is a structure with the following fields:

• beta — Coefficient estimates b
• dfe — Degrees of freedom for error
• sfit — Estimated dispersion parameter
• s — Theoretical or estimated dispersion parameter
• estdisp — 0 when the 'estdisp' name-value pair argument value is 'off' and 1 when the 'estdisp' name-value pair argument value is 'on'
• covb — Estimated covariance matrix for b
• se — Vector of standard errors of the coefficient estimates b
• coeffcorr — Correlation matrix for b
• t — t statistics for b
• p — p-values for b
• resid — Vector of residuals
• residp — Vector of Pearson residuals
• residd — Vector of deviance residuals
• resida — Vector of Anscombe residuals

If you estimate a dispersion parameter for the binomial or Poisson distribution, then stats.s is set equal to stats.sfit. Also, the elements of stats.se differ by the factor stats.s from their theoretical values.

Examples

Fit Generalized Linear Model with Probit Link

Enter the sample data.

x = [2100 2300 2500 2700 2900 3100 ...
     3300 3500 3700 3900 4100 4300]';
n = [48 42 31 34 31 21 23 23 21 16 17 21]';
y = [1 2 0 3 8 8 14 17 19 15 17 21]';

Each y value is the number of successes in the corresponding number of trials in n, and x contains the predictor variable values. Fit a probit regression model for y on x. b = glmfit(x,[y n],'binomial','link','probit');

Compute the estimated number of successes. Plot the percent observed and estimated percent success versus the x values.

yfit = glmval(b,x,'probit','size',n);
plot(x, y./n,'o',x,yfit./n,'-','LineWidth',2)


Use Custom-Defined Link Function

Load the sample data.

load fisheriris

The column vector species contains iris flowers of three different species: setosa, versicolor, and virginica. The double matrix meas consists of four types of measurements on the flowers: the length and width of sepals and petals in centimeters, respectively. Define the response and predictor variables.

X = meas(51:end,:);
y = strcmp('versicolor',species(51:end));

Define three function handles, created using @, that define the link, the derivative of the link, and the inverse link for a logit link function. Store them in a cell array.

link = @(mu) log(mu ./ (1-mu));
derlink = @(mu) 1 ./ (mu .* (1-mu));
invlink = @(resp) 1 ./ (1 + exp(-resp));
F = {link, derlink, invlink};

Fit a logistic regression using glmfit with the link function that you defined.


b = glmfit(X,y,'binomial','link',F)

b = 5×1

   42.6378
    2.4652
    6.6809
   -9.4294
  -18.2861

Fit a generalized linear model by using the logit link function and compare the results.

b = glmfit(X,y,'binomial','link','logit')

b = 5×1

   42.6378
    2.4652
    6.6809
   -9.4294
  -18.2861

References [1] Dobson, A. J. An Introduction to Generalized Linear Models. New York: Chapman & Hall, 1990. [2] McCullagh, P., and J. A. Nelder. Generalized Linear Models. New York: Chapman & Hall, 1990. [3] Collett, D. Modeling Binary Data. New York: Chapman & Hall, 2002.

See Also GeneralizedLinearModel | fitglm | glmval | regress | regstats | stepwiseglm Introduced before R2006a


glmval Generalized linear model values

Syntax

yhat = glmval(b,X,link)
[yhat,dylo,dyhi] = glmval(b,X,link,stats)
[...] = glmval(...,param1,val1,param2,val2,...)

Description

yhat = glmval(b,X,link) computes predicted values for the generalized linear model with link function link and predictors X. Distinct predictor variables should appear in different columns of X. b is a vector of coefficient estimates as returned by the glmfit function. link can be any of the character vectors, string scalars, or custom-defined link functions used as values for the 'link' name-value pair argument in the glmfit function.

Note By default, glmval adds a first column of 1s to X, corresponding to a constant term in the model. Do not enter a column of 1s directly into X. You can change the default behavior of glmval using the 'constant' parameter.

[yhat,dylo,dyhi] = glmval(b,X,link,stats) also computes 95% confidence bounds for the predicted values. When the stats structure output of the glmfit function is specified, dylo and dyhi are also returned. dylo and dyhi define a lower confidence bound of yhat-dylo, and an upper confidence bound of yhat+dyhi. Confidence bounds are nonsimultaneous, and apply to the fitted curve, not to a new observation.

[...] = glmval(...,param1,val1,param2,val2,...) specifies optional parameter name/value pairs to control the predicted values. Acceptable parameters are as follows.

'confidence' — The confidence level for the confidence bounds; a scalar between 0 and 1.

'size' — The size parameter (N) for a binomial model; a scalar, or a vector with one value for each row of X.

'offset' — A vector used as an additional predictor variable, but with a coefficient value fixed at 1.0.

'constant' — 'on' includes a constant term in the model (the coefficient of the constant term is the first element of b); 'off' omits the constant term.

'simultaneous' — Compute simultaneous confidence intervals (true), or compute nonsimultaneous confidence intervals (default false).
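For the common logit link, the prediction step reduces to applying the inverse link to the linear predictor. A Python sketch (the helper name glmval_logit is hypothetical) mirroring the default 'constant','on' behavior of prepending a column of 1s:

```python
import numpy as np

def glmval_logit(b, X, constant=True):
    # Predicted means for a binomial GLM with logit link:
    # mu = 1 / (1 + exp(-eta)), where eta = [1 X] * b when a constant
    # term is included (mirroring the default 'constant','on')
    X = np.atleast_2d(np.asarray(X, dtype=float))
    if constant:
        X = np.column_stack([np.ones(len(X)), X])
    eta = X @ np.asarray(b, dtype=float)
    return 1.0 / (1.0 + np.exp(-eta))

# with all coefficients zero, every predicted probability is 0.5
print(glmval_logit([0.0, 0.0], [[1.0], [2.0]]))  # [0.5 0.5]
```

Other links follow the same pattern with a different inverse link function; the confidence bounds additionally require the covariance information in the stats structure.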


Examples

Fit Generalized Linear Model with Probit Link

Enter the sample data.

x = [2100 2300 2500 2700 2900 3100 ...
     3300 3500 3700 3900 4100 4300]';
n = [48 42 31 34 31 21 23 23 21 16 17 21]';
y = [1 2 0 3 8 8 14 17 19 15 17 21]';

Each y value is the number of successes in the corresponding number of trials in n, and x contains the predictor variable values. Fit a probit regression model for y on x.

b = glmfit(x,[y n],'binomial','link','probit');

Compute the estimated number of successes. Plot the percent observed and estimated percent success versus the x values.

yfit = glmval(b,x,'probit','size',n);
plot(x, y./n,'o',x,yfit./n,'-','LineWidth',2)

Compute the estimated number of successes. Plot the percent observed and estimated percent success versus the x values. yfit = glmval(b,x,'probit','size',n); plot(x, y./n,'o',x,yfit./n,'-','LineWidth',2)


Use Custom-Defined Link Function

Enter sample data.

x = [2100 2300 2500 2700 2900 3100 ...
     3300 3500 3700 3900 4100 4300]';
n = [48 42 31 34 31 21 23 23 21 16 17 21]';
y = [1 2 0 3 8 8 14 17 19 15 17 21]';

Each y value is the number of successes in the corresponding number of trials in n, and x contains the predictor variable values. Define three function handles, created by using @, that define the link, the derivative of the link, and the inverse link for a probit link function. Store the handles in a cell array.

link = @(mu) norminv(mu);
derlink = @(mu) 1 ./ normpdf(norminv(mu));
invlink = @(resp) normcdf(resp);
F = {link, derlink, invlink};

Fit a generalized linear model for y on x by using the link function that you defined.

b = glmfit(x,[y n],'binomial','link',F);

Compute the estimated number of successes. Plot the observed and estimated percent success versus the x values.

yfit = glmval(b,x,F,'size',n);
plot(x, y./n,'o',x,yfit./n,'-','LineWidth',2)


Generate Code from Function That Predicts Responses Given New Data

Train a generalized linear model, and then generate code from a function that classifies new observations based on the model. This example is based on the “Use Custom-Defined Link Function” on page 32-2444 example. Enter the sample data.

x = [2100 2300 2500 2700 2900 3100 ...
     3300 3500 3700 3900 4100 4300]';
n = [48 42 31 34 31 21 23 23 21 16 17 21]';
y = [1 2 0 3 8 8 14 17 19 15 17 21]';

Suppose that the inverse normal pdf is an appropriate link function for the problem. Define a function named myInvNorm.m that accepts values of mu and returns corresponding values of the inverse of the standard normal cdf.

function in = myInvNorm(mu) %#codegen
%myInvNorm Inverse of standard normal cdf for code generation
%   myInvNorm is a GLM link function that accepts a numeric vector mu, and
%   returns in, which is a numeric vector of corresponding values of the
%   inverse of the standard normal cdf.
%
in = norminv(mu);
end

Define another function named myDInvNorm.m that accepts values of mu and returns corresponding values of the derivative of the link function.

function din = myDInvNorm(mu) %#codegen
%myDInvNorm Derivative of inverse of standard normal cdf for code
%generation
%   myDInvNorm corresponds to the derivative of the GLM link function
%   myInvNorm. myDInvNorm accepts a numeric vector mu, and returns din,
%   which is a numeric vector of corresponding derivatives of the inverse
%   of the standard normal cdf.
%
din = 1./normpdf(norminv(mu));
end

Define another function named myInvInvNorm.m that accepts a numeric vector mu and returns corresponding values of the inverse of the link function.

function iin = myInvInvNorm(mu) %#codegen
%myInvInvNorm Standard normal cdf for code generation
%   myInvInvNorm is the inverse of the GLM link function myInvNorm.
%   myInvInvNorm accepts a numeric vector mu, and returns iin, which is a
%   numeric vector of corresponding values of the standard normal cdf.
iin = normcdf(mu);
end

Create a structure array that specifies each of the link functions. Specifically, the structure array contains fields named 'Link', 'Derivative', and 'Inverse'. The corresponding values are the names of the functions.

link = struct('Link','myInvNorm','Derivative','myDInvNorm',...
    'Inverse','myInvInvNorm')

link = 
  struct with fields:
          Link: 'myInvNorm'
    Derivative: 'myDInvNorm'
       Inverse: 'myInvInvNorm'

Fit a GLM for y on x using the link function link. Return the structure array of statistics. [b,~,stats] = glmfit(x,[y n],'binomial','link',link);

b is a 2-by-1 vector of regression coefficients.

In your current working folder, define a function called classifyGLM.m that:

• Accepts measurements with columns corresponding to those in x, regression coefficients whose dimensions correspond to b, a link function, the structure of GLM statistics, and any valid glmval name-value pair argument
• Returns predictions and confidence interval margins of error

function [yhat,lo,hi] = classifyGLM(b,x,link,varargin) %#codegen
%CLASSIFYGLM Classify measurements using GLM model
%   CLASSIFYGLM classifies the n observations in the n-by-1 vector x using
%   the GLM model with regression coefficients b and link function link,
%   and then returns the n-by-1 vector of predicted values in yhat.
%   CLASSIFYGLM also returns margins of error for the predictions using
%   additional information in the GLM statistics structure stats.
narginchk(3,Inf);
if(isstruct(varargin{1}))
    stats = varargin{1};
    [yhat,lo,hi] = glmval(b,x,link,stats,varargin{2:end});
else
    yhat = glmval(b,x,link,varargin{:});
end
end


Generate a MEX function from classifyGLM.m. Because C uses static typing, codegen must determine the properties of all variables in MATLAB® files at compile time. To ensure that the MEX function can use the same inputs, use the -args argument to specify the following in the order given: • Regression coefficients b as a compile-time constant • In-sample observations x • Link function as a compile-time constant • Resulting GLM statistics as a compile-time constant • Name 'Confidence' as a compile-time constant • Confidence level 0.9 To designate arguments as compile-time constants, use coder.Constant.

codegen -config:mex classifyGLM -args {coder.Constant(b),x,coder.Constant(link),coder.Constant(stats),coder.Constant('Confidence'),0.9}

codegen generates the MEX file classifyGLM_mex.mexw64 in your current folder. The file extension depends on your system platform.

Compare predictions by using glmval and classifyGLM_mex. Specify name-value pair arguments in the same order as in the -args argument in the call to codegen.

[yhat1,melo1,mehi1] = glmval(b,x,link,stats,'Confidence',0.9);
[yhat2,melo2,mehi2] = classifyGLM_mex(b,x,link,stats,'Confidence',0.9);
comp1 = (yhat1 - yhat2)'*(yhat1 - yhat2) < 1e-16;
agree1 = comp1
comp2 = (melo1 - melo2)'*(melo1 - melo2) < 1e-16;
agree2 = comp2
comp3 = (mehi1 - mehi2)'*(mehi1 - mehi2) < 1e-16;
agree3 = comp3

agree1 =
  logical
   1

agree2 =
  logical
   1

agree3 =
  logical
   1
When k = 0 and theta = 0, the GP is equivalent to the exponential distribution. When k > 0 and theta = sigma/k, the GP is equivalent to a Pareto distribution with a scale parameter equal to sigma/k and a shape parameter equal to 1/k. The mean of the GP is not finite when k ≥ 1, and the variance is not finite when k ≥ 1/2. When k ≥ 0, the GP has positive density for x > theta, or, when k < 0, for

0 ≤ (x − θ)/σ ≤ −1/k.

References [1] Embrechts, P., C. Klüppelberg, and T. Mikosch. Modelling Extremal Events for Insurance and Finance. New York: Springer, 1997. [2] Kotz, S., and S. Nadarajah. Extreme Value Distributions: Theory and Applications. London: Imperial College Press, 2000.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

See Also
cdf | gpfit | gpinv | gplike | gppdf | gprnd | gpstat

Topics
“Generalized Pareto Distribution” on page B-59


“Working with Probability Distributions” on page 5-2
“Nonparametric and Empirical Probability Distributions” on page 5-29
“Fit a Nonparametric Distribution with Pareto Tails” on page 5-42
“Supported Distributions” on page 5-13

Introduced before R2006a


gpfit Generalized Pareto parameter estimates

Syntax

parmhat = gpfit(x)
[parmhat,parmci] = gpfit(x)
[parmhat,parmci] = gpfit(x,alpha)
[...] = gpfit(x,alpha,options)

Description

parmhat = gpfit(x) returns maximum likelihood estimates of the parameters for the two-parameter generalized Pareto (GP) distribution given the data in x. parmhat(1) is the tail index (shape) parameter, k, and parmhat(2) is the scale parameter, sigma. gpfit does not fit a threshold (location) parameter.

[parmhat,parmci] = gpfit(x) returns 95% confidence intervals for the parameter estimates.

[parmhat,parmci] = gpfit(x,alpha) returns 100(1-alpha)% confidence intervals for the parameter estimates.

[...] = gpfit(x,alpha,options) specifies control parameters for the iterative algorithm used to compute ML estimates. This argument can be created by a call to statset. See statset('gpfit') for parameter names and default values.

Other functions for the generalized Pareto, such as gpcdf, allow a threshold parameter, theta. However, gpfit does not estimate theta. It is assumed to be known, and subtracted from x before calling gpfit.

When k = 0 and theta = 0, the GP is equivalent to the exponential distribution. When k > 0 and theta = sigma/k, the GP is equivalent to a Pareto distribution with a scale parameter equal to sigma/k and a shape parameter equal to 1/k. The mean of the GP is not finite when k ≥ 1, and the variance is not finite when k ≥ 1/2. When k ≥ 0, the GP has positive density for x > theta, or, when k < 0, for

0 ≤ (x − θ)/σ ≤ −1/k.
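As an illustrative sketch (this example is not part of the original reference page, and the parameter values are arbitrary), you can simulate exceedances with a known threshold of zero and then recover the shape and scale parameters:

```matlab
% Simulate 1,000 GP observations with k = 0.25, sigma = 1, theta = 0,
% then fit the two-parameter GP. The estimates should be near [0.25 1].
rng default                       % for reproducibility
x = gprnd(0.25,1,0,1000,1);       % gprnd(k,sigma,theta,m,n)
[parmhat,parmci] = gpfit(x);      % parmhat = [khat sigmahat], with 95% CIs
```

If the data have a nonzero threshold, subtract it from x before calling gpfit, as the description notes.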

References

[1] Embrechts, P., C. Klüppelberg, and T. Mikosch. Modelling Extremal Events for Insurance and Finance. New York: Springer, 1997.

[2] Kotz, S., and S. Nadarajah. Extreme Value Distributions: Theory and Applications. London: Imperial College Press, 2000.


See Also
gpcdf | gpinv | gplike | gppdf | gprnd | gpstat | mle

Topics
“Generalized Pareto Distribution” on page B-59
“Working with Probability Distributions” on page 5-2
“Nonparametric and Empirical Probability Distributions” on page 5-29
“Fit a Nonparametric Distribution with Pareto Tails” on page 5-42
“Supported Distributions” on page 5-13

Introduced before R2006a


gpinv Generalized Pareto inverse cumulative distribution function

Syntax x = gpinv(p,k,sigma,theta)

Description

x = gpinv(p,k,sigma,theta) returns the inverse cdf for a generalized Pareto (GP) distribution with tail index (shape) parameter k, scale parameter sigma, and threshold (location) parameter theta, evaluated at the values in p. The size of x is the common size of the input arguments. A scalar input functions as a constant matrix of the same size as the other inputs.

Default values for k, sigma, and theta are 0, 1, and 0, respectively.

When k = 0 and theta = 0, the GP is equivalent to the exponential distribution. When k > 0 and theta = sigma/k, the GP is equivalent to a Pareto distribution with a scale parameter equal to sigma/k and a shape parameter equal to 1/k. The mean of the GP is not finite when k ≥ 1, and the variance is not finite when k ≥ 1/2. When k ≥ 0, the GP has positive density for x > theta, or, when k < 0, for

0 ≤ (x − θ)/σ ≤ −1/k.
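For illustration (hypothetical parameter values, not from the original page), the inverse cdf maps probabilities to quantiles:

```matlab
% Median and 99th percentile of a GP with k = 0.2, sigma = 2, theta = 0
p = [0.5 0.99];
x = gpinv(p,0.2,2,0);   % quantiles corresponding to the probabilities in p
```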

References [1] Embrechts, P., C. Klüppelberg, and T. Mikosch. Modelling Extremal Events for Insurance and Finance. New York: Springer, 1997. [2] Kotz, S., and S. Nadarajah. Extreme Value Distributions: Theory and Applications. London: Imperial College Press, 2000.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

See Also
gpcdf | gpfit | gplike | gppdf | gprnd | gpstat | icdf

Topics
“Generalized Pareto Distribution” on page B-59
“Working with Probability Distributions” on page 5-2
“Nonparametric and Empirical Probability Distributions” on page 5-29
“Fit a Nonparametric Distribution with Pareto Tails” on page 5-42
“Supported Distributions” on page 5-13


Introduced before R2006a


gplike Generalized Pareto negative loglikelihood

Syntax

nlogL = gplike(params,data)
[nlogL,acov] = gplike(params,data)

Description

nlogL = gplike(params,data) returns the negative of the loglikelihood nlogL for the two-parameter generalized Pareto (GP) distribution, evaluated at parameters params. params(1) is the tail index (shape) parameter, k, and params(2) is the scale parameter. gplike does not allow a threshold (location) parameter.

[nlogL,acov] = gplike(params,data) returns the inverse of Fisher's information matrix, acov. If the input parameter values in params are the maximum likelihood estimates, the diagonal elements of acov are their asymptotic variances. acov is based on the observed Fisher's information, not the expected information.

When k = 0 and theta = 0, the GP is equivalent to the exponential distribution. When k > 0 and theta = sigma/k, the GP is equivalent to a Pareto distribution with a scale parameter equal to sigma/k and a shape parameter equal to 1/k. The mean of the GP is not finite when k ≥ 1, and the variance is not finite when k ≥ 1/2. When k ≥ 0, the GP has positive density for x > theta, or, when k < 0, for

0 ≤ (x − θ)/σ ≤ −1/k.
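As a sketch of a typical workflow (not part of the original page; the simulated data are illustrative), you can evaluate the negative loglikelihood at the maximum likelihood estimates and obtain approximate standard errors from acov:

```matlab
% Simulate GP data with the threshold theta = 0 assumed known,
% fit the two-parameter GP, and evaluate gplike at the MLEs.
rng default
data = gprnd(0.3,1,0,500,1);         % k = 0.3, sigma = 1, theta = 0
parmhat = gpfit(data);               % [khat sigmahat]
[nlogL,acov] = gplike(parmhat,data);
se = sqrt(diag(acov));               % approximate asymptotic standard errors
```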

References [1] Embrechts, P., C. Klüppelberg, and T. Mikosch. Modelling Extremal Events for Insurance and Finance. New York: Springer, 1997. [2] Kotz, S., and S. Nadarajah. Extreme Value Distributions: Theory and Applications. London: Imperial College Press, 2000.

See Also
gpcdf | gpfit | gpinv | gppdf | gprnd | gpstat

Topics
“Generalized Pareto Distribution” on page B-59
“Working with Probability Distributions” on page 5-2
“Nonparametric and Empirical Probability Distributions” on page 5-29
“Fit a Nonparametric Distribution with Pareto Tails” on page 5-42
“Supported Distributions” on page 5-13


Introduced before R2006a


gppdf Generalized Pareto probability density function

Syntax p = gppdf(x,k,sigma,theta)

Description

p = gppdf(x,k,sigma,theta) returns the pdf of the generalized Pareto (GP) distribution with tail index (shape) parameter k, scale parameter sigma, and threshold (location) parameter theta, evaluated at the values in x. The size of p is the common size of the input arguments. A scalar input functions as a constant matrix of the same size as the other inputs.

Default values for k, sigma, and theta are 0, 1, and 0, respectively.

When k = 0 and theta = 0, the GP is equivalent to the exponential distribution. When k > 0 and theta = sigma/k, the GP is equivalent to a Pareto distribution with a scale parameter equal to sigma/k and a shape parameter equal to 1/k. The mean of the GP is not finite when k ≥ 1, and the variance is not finite when k ≥ 1/2. When k ≥ 0, the GP has positive density for x > theta, or, when k < 0, for

0 ≤ (x − θ)/σ ≤ −1/k.
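To see how the shape parameter affects the density (an illustrative sketch with arbitrary parameter values, not from the original page):

```matlab
% Compare GP densities for several shape parameters
x = 0:0.1:5;
p0 = gppdf(x,0,1,0);      % k = 0 reduces to the exponential pdf exp(-x)
p1 = gppdf(x,0.5,1,0);    % k > 0: heavier upper tail
p2 = gppdf(x,-0.25,1,0);  % k < 0: support bounded above at -sigma/k = 4
plot(x,p0,x,p1,x,p2)
```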

References [1] Embrechts, P., C. Klüppelberg, and T. Mikosch. Modelling Extremal Events for Insurance and Finance. New York: Springer, 1997. [2] Kotz, S., and S. Nadarajah. Extreme Value Distributions: Theory and Applications. London: Imperial College Press, 2000.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

See Also
gpcdf | gpfit | gpinv | gplike | gprnd | gpstat | pdf

Topics
“Generalized Pareto Distribution” on page B-59
“Working with Probability Distributions” on page 5-2
“Nonparametric and Empirical Probability Distributions” on page 5-29
“Fit a Nonparametric Distribution with Pareto Tails” on page 5-42
“Supported Distributions” on page 5-13


Introduced before R2006a


gplotmatrix Matrix of scatter plots by group

Syntax

gplotmatrix(X,[],group)
gplotmatrix(X,Y,group)
gplotmatrix(X,Y,group,clr,sym,siz)
gplotmatrix(X,Y,group,clr,sym,siz,doleg)
gplotmatrix(X,[],group,clr,sym,siz,doleg,dispopt)
gplotmatrix(X,[],group,clr,sym,siz,doleg,dispopt,xnam)
gplotmatrix(X,Y,group,clr,sym,siz,doleg,[],xnam,ynam)
gplotmatrix(parent, ___ )
[h,ax,bigax] = gplotmatrix( ___ )

Description

gplotmatrix(X,[],group) creates a matrix of scatter plots and histograms of the data in X, grouped by the grouping variable in group. Each off-diagonal plot in the resulting figure is a scatter plot of a column of X against another column of X. The software also plots the outlines of the grouped histograms in the diagonal plots of the plot matrix. X and group must have the same number of rows.

gplotmatrix(X,Y,group) creates a matrix of scatter plots. Each plot in the resulting figure is a scatter plot of a column of X against a column of Y. For example, if X has p columns and Y has q columns, then the figure contains a q-by-p matrix of scatter plots. All plots are grouped by the grouping variable group. The input arguments X, Y, and group must all have the same number of rows.

gplotmatrix(X,Y,group,clr,sym,siz) specifies the marker color clr, symbol sym, and size siz for each group.

gplotmatrix(X,Y,group,clr,sym,siz,doleg) controls whether a legend is displayed in the figure. gplotmatrix creates a legend by default.

gplotmatrix(X,[],group,clr,sym,siz,doleg,dispopt) controls the display options for the diagonal plots in the plot matrix of X.

gplotmatrix(X,[],group,clr,sym,siz,doleg,dispopt,xnam) labels the x-axes and y-axes of the scatter plots using the column names specified in xnam. The input argument xnam must contain one name for each column of X. Set dispopt to 'variable' to display the variable names along the diagonal of the scatter plot matrix.

gplotmatrix(X,Y,group,clr,sym,siz,doleg,[],xnam,ynam) labels the x-axes and y-axes of the scatter plots using the column names specified in xnam and ynam. The input arguments xnam and ynam must contain one name for each column of X and Y, respectively.

gplotmatrix(parent, ___ ) creates the scatter plot matrix in the figure or panel specified by parent. Specify parent as the first input argument followed by any of the input argument combinations in the previous syntaxes.


[h,ax,bigax] = gplotmatrix( ___ ) returns graphics handles to the individual plots and the entire scatter plot matrix. You can pass in [] for clr, sym, siz, doleg, and dispopt to use their default values.

Examples

Scatter Plots with Grouped Data

Create a matrix of scatter plots for each combination of variables in a data set. Group the data according to a separate variable.

Load the fisheriris data set, which contains flower data. The four columns of meas are the sepal length, sepal width, petal length, and petal width of the flowers. species contains the flower species names: setosa, versicolor, and virginica. Visually compare the flower measurements across flower species.

load fisheriris
gplotmatrix(meas,[],species)

In the matrix of scatter plots, the x-axis of the leftmost column of scatter plots corresponds to sepal length, the first column in meas. Similarly, the y-axis of the bottom row of scatter plots corresponds to petal width, the last column in meas. Therefore, the scatter plot in the bottom left of the matrix compares sepal length values (along the x-axis) to petal width values (along the y-axis). The color of each point depends on the species of the flower.


The diagonal plots are histograms rather than scatter plots. For example, the plot in the top left of the matrix shows the distribution of sepal length values for each species of flower.

Create Scatter Plot Matrix with Subset of Variables

Create scatter plots comparing a subset of the variables in a data set to another subset of variables. Group the data according to a separate variable.

Load the discrim data set.

load discrim

The ratings array contains rating values of 329 US cities for the nine categories listed in the categories array. The group array contains a city size code that is equal to 2 for the 26 largest cities, and 1 otherwise.

Create a matrix of scatter plots to compare the first two categories, climate and housing, with the fourth and seventh categories, crime and arts. Specify group as the grouping variable to visually distinguish the data for large and small cities.

X = ratings(:,1:2);
Y = ratings(:,[4 7]);
gplotmatrix(X,Y,group)


The matrix of scatter plots shows the specified comparisons, with each city size group represented by a different color.

Adjust the appearance of the plots by specifying marker colors and symbols, and labeling the axes with the rating categories.

xnames = categories(1:2,:);
ynames = categories([4 7],:);
gplotmatrix(X,Y,group,'br','.o',[],'on',[],xnames,ynames)

Scatter Plot Matrix with Multiple Grouping Variables

Create a matrix of scatter plots comparing data variables by using two grouping variables.

Load the patients data set. Compare patient diastolic and systolic blood pressure values. Group the patients according to their gender and smoker status. Convert Smoker to a categorical variable to have more descriptive labels in the legend. Display grouped histograms along the diagonal of the plot matrix by using the 'grpbars' display option, and label the axes.

load patients
X = [Diastolic Systolic];
labeledSmoker = categorical(Smoker,[true false],{'Smoker','Nonsmoker'});
group = {Gender,labeledSmoker};
color = lines(4)


color = 4×3

         0    0.4470    0.7410
    0.8500    0.3250    0.0980
    0.9290    0.6940    0.1250
    0.4940    0.1840    0.5560

xnames = {'Diastolic','Systolic'};
gplotmatrix(X,[],group,color,[],[],[],'grpbars',xnames)

For example, the scatter plot in the bottom left of the matrix shows that smokers (blue and yellow markers) tend to have higher diastolic and systolic blood pressure values, regardless of gender.

Modify Scatter Plot Matrix Appearance Create a matrix of scatter plots that display grouped data. Modify the appearance of one of the scatter plots. Load the carsmall data set. Create a scatter plot matrix using different car measurements. Group the cars by the number of cylinders. Specify the group colors, and display the car variable names along the diagonal of the plot matrix. Add a title to the plot matrix.


load carsmall
X = [Acceleration Displacement Horsepower MPG Weight];
color = lines(3)

color = 3×3

         0    0.4470    0.7410
    0.8500    0.3250    0.0980
    0.9290    0.6940    0.1250

xnames = {'Acceleration','Displacement','Horsepower','MPG','Weight'};
[h,ax] = gplotmatrix(X,[],Cylinders,color,[],[],[],'variable',xnames);
title('Car Data')

Change the appearance of the scatter plot in the bottom left of the matrix by using h and ax. First, change the colors of the data points in the scatter plot. Then, add grid lines to the scatter plot.

bottomleftPlot = h(5,1,:);
bottomleftPlot(1).Color = 'blue';
bottomleftPlot(2).Color = 'red';
bottomleftPlot(3).Color = 'yellow';
bottomleftAxes = ax(5,1);
bottomleftAxes.XGrid = 'on';
bottomleftAxes.YGrid = 'on';


Input Arguments

X — Input data
numeric matrix | datetime array | duration array

Input data, specified as an n-by-p numeric matrix, datetime array, or duration array. gplotmatrix creates a matrix of plots using the columns of X. If you do not specify an additional input matrix Y, then gplotmatrix creates a p-by-p matrix of plots. The off-diagonal plots are scatter plots, and the diagonal plots depend on the value of dispopt. In each scatter plot, gplotmatrix plots one column of X against another column of X. The points in the scatter plots are grouped according to group.

If you specify Y, then gplotmatrix creates a q-by-p matrix of scatter plots using the p columns of X and the q columns of Y.

Data Types: single | double | datetime | duration

Y — Input data
numeric matrix | datetime array | duration array

Input data, specified as an n-by-q numeric matrix, datetime array, or duration array. gplotmatrix creates a q-by-p matrix of scatter plots using the p columns of X and the q columns of Y. For each column of the plot matrix, the x-axis values of the scatter plots are the same as the values in the corresponding column of X. Similarly, for each row of the plot matrix, the y-axis values of the scatter plots are the same as the values in the corresponding column of Y. The points in the scatter plots are grouped according to group.


X and Y must have the same number of rows.

Data Types: single | double | datetime | duration

group — Grouping variable
categorical vector | numeric vector | logical vector | character array | string array | cell array

Grouping variable, specified as a categorical vector, numeric vector, logical vector, character array, string array, or cell array of character vectors. Alternatively, group can be a cell array containing several grouping variables (such as {g1 g2 g3}), in which case observations are in the same group if they have common values of all grouping variables. In any case, group must have the same number of rows as X. Points in the same group appear on the graph with the same marker color, symbol, and size.

Example: categorical({'blue','red','yellow','blue','yellow','red','red','yellow','blue','red'})
Example: {Smoker,Gender} where Smoker and Gender are grouping variables
Data Types: categorical | single | double | logical | char | string | cell

clr — Marker colors
character vector | string scalar | string array | cell array of character vectors | matrix of RGB values

Marker colors, specified as one of the following:

• Character vector or string scalar of color short names.
• String array or cell array of character vectors designating color names or short names.
• Three-column matrix of RGB values in the range [0,1]. The three columns represent the R (red) value, G (green) value, and B (blue) value.

You can choose among these predefined colors and their equivalent RGB triplets.

Color Name    Short Name    RGB Triplet
'red'         'r'           [1 0 0]
'green'       'g'           [0 1 0]
'blue'        'b'           [0 0 1]
'cyan'        'c'           [0 1 1]
'magenta'     'm'           [1 0 1]
'yellow'      'y'           [1 1 0]
'black'       'k'           [0 0 0]
'white'       'w'           [1 1 1]

When the total number of groups exceeds the number of specified colors, gplotmatrix cycles through the specified colors.

Example: {'blue','black','green'}
Example: [0 0 1; 0 0.5 0.5; 0.5 0.5 0.5]
Data Types: char | string | cell | single | double


sym — Marker symbols
'.' (default) | character vector | string scalar

Marker symbols, specified as a character vector or string scalar. You can choose among these marker options.

Value    Description
'o'      Circle
'+'      Plus sign
'*'      Asterisk
'.'      Point
'x'      Cross
's'      Square
'd'      Diamond
'^'      Upward-pointing triangle
'v'      Downward-pointing triangle
'>'      Right-pointing triangle
'<'      Left-pointing triangle

gt
Greater than relation for handles

Syntax

h1 > h2
tf = gt(h1,h2)

Description

h1 > h2 performs element-wise comparisons between handle arrays h1 and h2. h1 and h2 must be of the same dimensions unless one is a scalar. The result is a logical array of the same dimensions, where each element is an element-wise > result. If one of h1 or h2 is scalar, scalar expansion is performed and the result will match the dimensions of the array that is not scalar.

tf = gt(h1,h2) stores the result in a logical array of the same dimensions.
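As an illustrative sketch (not part of the original page), the operator can compare handle objects such as qrandstream streams; the ordering of handles is internally defined, so the result is consistent but carries no meaning beyond distinguishing the handles:

```matlab
% Create two quasirandom stream handle objects and compare them.
s1 = qrandstream('halton',2);
s2 = qrandstream('halton',4);
tf1 = s1 > s2;            % element-wise comparison, logical scalar here
tf2 = gt([s1 s2],s1);     % scalar expansion against a 1-by-2 handle array
```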

See Also eq | ge | le | lt | ne | qrandstream


haltonset Halton quasirandom point set

Description

haltonset is a quasirandom point set object that produces points from the Halton sequence. The Halton sequence uses different prime bases in each dimension to fill space in a highly uniform manner.

Creation

Syntax

p = haltonset(d)
p = haltonset(d,Name,Value)

Description

p = haltonset(d) constructs a d-dimensional point set p, which is a haltonset object with default property settings. The input argument d corresponds to the Dimensions property of p.

p = haltonset(d,Name,Value) sets properties on page 32-2527 of p using one or more name-value pair arguments. Enclose each property name in quotes. For example, haltonset(5,'Leap',2) creates a five-dimensional point set from the first point, fourth point, seventh point, tenth point, and so on.

The returned object p encapsulates properties of a Halton quasirandom sequence. The point set is finite, with a length determined by the Skip and Leap properties and by limits on the size of the point set indices (maximum value of 2^53). Values of the point set are generated whenever you access p using net or parenthesis indexing. Values are not stored within p.

Properties

Dimensions — Number of dimensions
positive integer scalar

This property is read-only.

Number of dimensions of the points in the point set, specified as a positive integer scalar. For example, each point in the point set p with p.Dimensions = 5 has five values.

Use the d input argument to specify the number of dimensions when you create a point set using the haltonset function.

Leap — Interval between points
0 (default) | positive integer scalar


Interval between points in the sequence, specified as a positive integer scalar. In other words, the Leap property of a point set specifies the number of points in the sequence to leap over and omit for every point taken. The default Leap value is 0, which corresponds to taking every point from the sequence.

Leaping is a technique used to improve the quality of a point set. However, you must choose the Leap values with care. Many Leap values create sequences that fail to touch on large sub-hyper-rectangles of the unit hypercube and, therefore, fail to be a uniform quasirandom point set. For more information, see [1].

One rule for choosing Leap values for Halton sets is to set the value to (n–1), where n is a prime number that has not been used to generate one of the dimensions. For example, for a d-dimensional point set, specify the (d+1)th or greater prime number for n.

Example: p = haltonset(2,'Leap',4); (where d = 2 and n = 5)
Example: p.Leap = 100;

ScrambleMethod — Settings that control scrambling
0x0 structure (default) | structure with Type and Options fields

Settings that control the scrambling of the sequence, specified as a structure with these fields:

• Type — A character vector containing the name of the scramble
• Options — A cell array of parameter values for the scramble

Use the scramble object function to set scrambles. For a list of valid scramble types, see the type input argument of scramble. An error occurs if you set an invalid scramble type for a given point set.

The ScrambleMethod property also accepts an empty matrix as a value. The software then clears all scrambling and sets the property to contain a 0x0 structure.

Skip — Number of initial points in sequence to omit
0 (default) | positive integer scalar

Number of initial points in the sequence to omit from the point set, specified as a positive integer scalar.

Initial points of a sequence sometimes exhibit undesirable properties. For example, the first point is often (0,0,0,...), which can cause the sequence to be unbalanced because the counterpart of the point, (1,1,1,...), never appears. Also, initial points often exhibit correlations among different dimensions, and these correlations disappear later in the sequence.

Example: p = haltonset(__,'Skip',2e3);
Example: p.Skip = 1e3;

Type — Sequence type
'Halton' (default)

This property is read-only.

Sequence type on which the quasirandom point set p is based, specified as 'Halton'.


Object Functions

net         Generate quasirandom point set
scramble    Scramble quasirandom point set

You can also use the following MATLAB functions with a haltonset object. The software treats the point set object like a matrix of multidimensional points.

length      Length of largest array dimension
size        Array size

Examples

Create Halton Point Set

Generate a three-dimensional Halton point set, skip the first 1000 values, and then retain every 101st point.

p = haltonset(3,'Skip',1e3,'Leap',1e2)

p = 
Halton point set in 3 dimensions (89180190640991 points)

Properties:
              Skip : 1000
              Leap : 100
    ScrambleMethod : none

Apply reverse-radix scrambling by using scramble.

p = scramble(p,'RR2')

p = 
Halton point set in 3 dimensions (89180190640991 points)

Properties:
              Skip : 1000
              Leap : 100
    ScrambleMethod : RR2

Generate the first four points by using net.

X0 = net(p,4)

X0 = 4×3

    0.0928    0.6950    0.0029
    0.6958    0.2958    0.8269
    0.3013    0.6497    0.4141
    0.9087    0.7883    0.2166

Generate every third point, up to the eleventh point, by using parenthesis indexing.

X = p(1:3:11,:)


X = 4×3

    0.0928    0.6950    0.0029
    0.9087    0.7883    0.2166
    0.3843    0.9840    0.9878
    0.6831    0.7357    0.7923

Tips

• The Skip and Leap properties are useful for parallel applications. For example, if you have a Parallel Computing Toolbox license, you can partition a sequence of points across N different workers by using the function labindex. On each nth worker, set the Skip property of the point set to n – 1 and the Leap property to N – 1. The following code shows how to partition a sequence across three workers.

Nworkers = 3;
p = haltonset(10,'Leap',Nworkers-1);
spmd(Nworkers)
    p.Skip = labindex - 1;
    % Compute something using points 1,4,7...
    % or points 2,5,8... or points 3,6,9...
end

Algorithms Halton Sequence Generation Consider a default haltonset object p that contains d-dimensional points. Each p(i,:) is a point in a Halton sequence. The jth coordinate of the point, p(i,j), is equal to

∑ ai j(k)b j−k − 1 . k

• bj is the jth prime. • The ai j(k) coefficients are nonnegative integers less than bj such that i−1=



k=0

ai j(k)b jk .

In other words, the ai j(k) values are the base bj digits of the integer i – 1. For more information, see [1].
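The digit expansion above can be checked numerically. This sketch (not part of the original page) verifies the base-2 radical inverse for the first coordinate of a default one-dimensional point set:

```matlab
% For dimension j = 1, b_1 = 2, so the point with index i has coordinate
% equal to the base-2 radical inverse of i-1.
p = haltonset(1);
X = net(p,4);     % first four points of the base-2 Halton sequence
% i = 1: i-1 = 0               -> 0
% i = 2: i-1 = 1 = 1*2^0       -> 1*2^(-1)        = 0.5
% i = 3: i-1 = 2 = 1*2^1       -> 1*2^(-2)        = 0.25
% i = 4: i-1 = 3 = 2^0 + 2^1   -> 2^(-1) + 2^(-2) = 0.75
```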

References [1] Kocis, L., and W. J. Whiten. “Computational Investigations of Low-Discrepancy Sequences.” ACM Transactions on Mathematical Software. Vol. 23, No. 2, 1997, pp. 266–294.

See Also
net | scramble | sobolset


Topics
“Generating Quasi-Random Numbers” on page 7-13

Introduced in R2008a


harmmean Harmonic mean

Syntax m m m m m

= = = = =

harmmean(X) harmmean(X,'all') harmmean(X,dim) harmmean(X,vecdim) harmmean( ___ ,nanflag)

Description

m = harmmean(X) calculates the harmonic mean on page 32-2536 of a sample. For vectors, harmmean(X) is the harmonic mean of the elements in X. For matrices, harmmean(X) is a row vector containing the harmonic means of each column. For N-dimensional arrays, harmmean operates along the first nonsingleton dimension of X.

m = harmmean(X,'all') returns the harmonic mean of all elements of X.

m = harmmean(X,dim) takes the harmonic mean along the operating dimension dim of X.

m = harmmean(X,vecdim) returns the harmonic mean over the dimensions specified in the vector vecdim. Each element of vecdim represents a dimension of the input array X. The output m has length 1 in the specified operating dimensions. The other dimension lengths are the same for X and m. For example, if X is a 2-by-3-by-4 array, then harmmean(X,[1 2]) returns a 1-by-1-by-4 array. Each element of the output array is the harmonic mean of the elements on the corresponding page of X.

m = harmmean( ___ ,nanflag) specifies whether to exclude NaN values from the calculation, using any of the input argument combinations in previous syntaxes. By default, harmmean includes NaN values in the calculation (nanflag has the value 'includenan'). To exclude NaN values, set the value of nanflag to 'omitnan'.
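As a quick illustration of the definition (this example is not part of the original page), the harmonic mean is the reciprocal of the arithmetic mean of the reciprocals:

```matlab
x = [1 2 4];
m = harmmean(x);         % 3/(1/1 + 1/2 + 1/4) = 3/1.75, about 1.7143
check = 1/mean(1./x);    % identical value, computed from the definition
```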

Examples

Compare Harmonic and Arithmetic Mean

Set the random seed for reproducibility of the results.

rng('default')

Create a matrix of exponential random numbers with 5 rows and 4 columns.

X = exprnd(1,5,4)

X = 5×4

    0.2049    2.3275    1.8476    1.9527
    0.0989    1.2783    0.0298    0.8633
    2.0637    0.6035    0.0438    0.0880
    0.0906    0.0434    0.7228    0.2329
    0.4583    0.0357    0.2228    0.0414

Compute the harmonic and arithmetic means of the columns of X.

harmonic = harmmean(X)

harmonic = 1×4

    0.1743    0.0928    0.0797    0.1205

arithmetic = mean(X)

arithmetic = 1×4

    0.5833    0.8577    0.5734    0.6357

The arithmetic mean is greater than the harmonic mean for all the columns of X.

Harmonic Mean of All Values

Find the harmonic mean of all the values in an array.

Create a 3-by-5-by-2 array X.

X = reshape(1:30,[3 5 2])

X = 
X(:,:,1) =

     1     4     7    10    13
     2     5     8    11    14
     3     6     9    12    15

X(:,:,2) =

    16    19    22    25    28
    17    20    23    26    29
    18    21    24    27    30

Find the harmonic mean of the elements of X.

m = harmmean(X,'all')

m = 7.5094


Harmonic Mean Along Specified Dimensions

Find the harmonic mean along different operating dimensions and vectors of dimensions for a multidimensional array.

Create a 3-by-5-by-2 array X.

X = reshape(1:30,[3 5 2])

X = 
X(:,:,1) =

     1     4     7    10    13
     2     5     8    11    14
     3     6     9    12    15

X(:,:,2) =

    16    19    22    25    28
    17    20    23    26    29
    18    21    24    27    30

Find the harmonic mean of X along the default dimension.

hmean1 = harmmean(X)

hmean1 = 
hmean1(:,:,1) =

    1.6364    4.8649    7.9162   10.9392   13.9523

hmean1(:,:,2) =

   16.9607   19.9666   22.9710   25.9743   28.9770

By default, harmmean operates along the first dimension of X whose size does not equal 1. In this case, this dimension is the first dimension of X. Therefore, hmean1 is a 1-by-5-by-2 array.

Find the harmonic mean of X along the second dimension.

hmean2 = harmmean(X,2)

hmean2 = 
hmean2(:,:,1) =

    3.1852
    5.0641
    6.5693

hmean2(:,:,2) =

   21.1595
   22.1979
   23.2329

hmean2 is a 3-by-1-by-2 array.

Find the harmonic mean of X along the third dimension.

hmean3 = harmmean(X,3)

hmean3 = 3×5

    1.8824    6.6087   10.6207   14.2857   17.7561
    3.5789    8.0000   11.8710   15.4595   18.8837
    5.1429    9.3333   13.0909   16.6154   20.0000

hmean3 is a 3-by-5 array.

Find the harmonic mean of each page of X by specifying the first and second dimensions using the vecdim input argument.

mpage = harmmean(X,[1 2])

mpage = 
mpage(:,:,1) =

    4.5205

mpage(:,:,2) =

   22.1645

For example, mpage(1,1,2) is the harmonic mean of the elements in X(:,:,2).

Find the harmonic mean of the elements in each X(i,:,:) slice by specifying the second and third dimensions.

mrow = harmmean(X,[2 3])

mrow = 3×1

    5.5369
    8.2469
   10.2425

For example, mrow(3) is the harmonic mean of the elements in X(3,:,:) and is equivalent to specifying harmmean(X(3,:,:),'all').

Harmonic Mean Excluding NaN

Create a vector and compute its harmonic mean, excluding NaN values.

x = 1:10;
x(3) = nan; % Replace the third element of x with a NaN value
n = harmmean(x,'omitnan')

n = 3.4674

If you do not specify 'omitnan', then harmmean(x) returns NaN.

More About

Harmonic Mean

The harmonic mean of a sample X is

    m = n / ( ∑_{i=1}^{n} 1/x_i )

where n is the number of values in X.
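As a quick sketch of the definition above, the harmonic mean can be computed directly from the formula and compared with harmmean (variable names here are illustrative):

```matlab
% Compute the harmonic mean from its definition and compare with harmmean.
x = [1 2 4];
n = numel(x);
m_formula = n / sum(1./x);   % n divided by the sum of reciprocals
% m_formula is 3/1.75, about 1.7143; harmmean(x) returns the same value:
m_toolbox = harmmean(x);
```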

Tips

• When harmmean computes the harmonic mean of an array containing 0, the returned value is 0.
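The tip above follows from the reciprocal in the definition: a zero element makes the sum of reciprocals infinite, so n/sum(1./X) evaluates to 0. A minimal illustration:

```matlab
% A zero element drives sum(1./x) to Inf, so the harmonic mean
% n/sum(1./x) evaluates to 0.
x = [0 2 4];
m = harmmean(x);   % returns 0
```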

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.
This function fully supports tall arrays. For more information, see “Tall Arrays” (MATLAB).

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
Usage notes and limitations:
• These input arguments are not supported: 'all', vecdim, and nanflag.
• The dim input argument must be a compile-time constant.
• If you do not specify the dim input argument, the working (or operating) dimension can be different in the generated code. As a result, run-time errors can occur. For more details, see “Automatic dimension restriction” (MATLAB Coder).
For more information on code generation, see “Introduction to Code Generation” on page 31-2 and “General Code Generation Workflow” on page 31-5.

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
Usage notes and limitations:
• The 'all' and vecdim input arguments are not supported.
For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also geomean | mean | median | trimmean

Introduced before R2006a

hist3 Bivariate histogram plot

Syntax
hist3(X)
hist3(X,'Nbins',nbins)
hist3(X,'Ctrs',ctrs)
hist3(X,'Edges',edges)
hist3( ___ ,Name,Value)
hist3(ax, ___ )
N = hist3( ___ )
[N,c] = hist3( ___ )

Description

hist3(X) creates a bivariate histogram plot of X(:,1) and X(:,2) using 10-by-10 equally spaced bins. The hist3 function displays the bins as 3-D rectangular bars, and the height of each bar indicates the number of elements in the bin.

hist3(X,'Nbins',nbins) specifies the number of bins in each dimension of the histogram. This syntax is equivalent to hist3(X,nbins).

hist3(X,'Ctrs',ctrs) specifies the centers of the bins in each dimension of the histogram. This syntax is equivalent to hist3(X,ctrs).

hist3(X,'Edges',edges) specifies the edges of the bins in each dimension.

hist3( ___ ,Name,Value) specifies graphical properties using one or more name-value pair arguments in addition to the input arguments in the previous syntaxes. For example, 'FaceAlpha',0.5 creates a semitransparent histogram. For a list of properties, see Surface Properties.

hist3(ax, ___ ) plots into the axes specified by ax instead of the current axes (gca). The option ax can precede any of the input argument combinations in the previous syntaxes.

N = hist3( ___ ) returns the number of elements in X that fall in each bin. This syntax does not create a histogram.

[N,c] = hist3( ___ ) also returns the bin centers. This syntax does not create a histogram.

Examples

Histogram of Vectors

Load the sample data.

load carbig

Create a bivariate histogram with the default settings.

X = [MPG,Weight];
hist3(X)
xlabel('MPG')
ylabel('Weight')

Specify Centers of Histogram Bins

Create a bivariate histogram on the bins specified by the bin centers, and count the number of elements in each bin.

Load the sample data.

load carbig

Create a bivariate histogram. Specify the centers of the histogram bins using a two-element cell array.

X = [MPG,Weight];
hist3(X,'Ctrs',{0:10:50 2000:500:5000})
xlabel('MPG')
ylabel('Weight')


Count the number of elements in each bin.

N = hist3(X,'Ctrs',{0:10:50 2000:500:5000})

N = 6×7

     0     0     0     0     0     0     0
     0     0     2     3    16    26     6
     6    34    50    49    27    10     0
    70    49    11     3     0     0     0
    29     4     2     0     0     0     0
     1     0     0     0     0     0     0
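A count-only call can also use explicit bin edges rather than centers. The sketch below is illustrative (the edge vectors are not from the example above); per the edges input description, –Inf and Inf as outer edges include every non-NaN row of X.

```matlab
% Sketch: count-only variant of the example using bin edges.
% -Inf and Inf as the outer edges include all non-NaN rows of X.
load carbig
X = [MPG,Weight];
N = hist3(X,'Edges',{[-Inf 10:10:40 Inf] [-Inf 2000:1000:4000 Inf]});
% sum(N(:)) then equals the number of rows of X with no NaN coordinate.
```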

Color Histogram Bars by Height

Load the sample data.

load carbig

Create a bivariate histogram. Specify graphical properties to color the histogram bars by height representing the frequency of the observations.

X = [MPG,Weight];
hist3(X,'CDataMode','auto','FaceColor','interp')
xlabel('MPG')
ylabel('Weight')

Tiled Histogram View

Load the sample data.

load carbig

Create a bivariate tiled histogram. Specify graphical properties to color the top surface of the histogram bars by the frequency of the observations. Change the view to two-dimensional.

X = [MPG,Weight];
hist3(X,'CdataMode','auto')
xlabel('MPG')
ylabel('Weight')
colorbar
view(2)


Adjust Graphical Properties

Create a bivariate histogram and adjust its graphical properties by using the handle of the histogram surface object.

Load the sample data.

load carbig

Create a bivariate histogram with 7 bins in each dimension.

X = [MPG,Weight];
hist3(X,'Nbins',[7 7])
xlabel('MPG')
ylabel('Weight')


The hist3 function creates a bivariate histogram, which is a type of surface plot. Find the handle of the surface object and adjust the face transparency.

s = findobj(gca,'Type','Surface');
s.FaceAlpha = 0.65;


Plot Histogram with Intensity Map

Create a bivariate histogram and add the 2-D projected view of intensities to the histogram.

Load the seamount data set (a seamount is an underwater mountain). The data set consists of a set of longitude (x) and latitude (y) locations, and the corresponding seamount elevations (z) measured at those coordinates. This example uses x and y to draw a bivariate histogram.

load seamount

Draw a bivariate histogram.

hist3([x,y])
xlabel('Longitude')
ylabel('Latitude')
hold on

Count the number of elements in each bin.

N = hist3([x,y]);

Generate a grid to draw the 2-D projected view of intensities by using pcolor.

N_pcolor = N';
N_pcolor(size(N_pcolor,1)+1,size(N_pcolor,2)+1) = 0;
xl = linspace(min(x),max(x),size(N_pcolor,2)); % Columns of N_pcolor
yl = linspace(min(y),max(y),size(N_pcolor,1)); % Rows of N_pcolor

Draw the intensity map by using pcolor. Set the z-level of the intensity map to view the histogram and the intensity map together.

h = pcolor(xl,yl,N_pcolor);
colormap('hot') % Change color scheme
colorbar % Display colorbar
h.ZData = -max(N_pcolor(:))*ones(size(N_pcolor));
ax = gca;
ax.ZTick(ax.ZTick < 0) = [];
title('Seamount Location Histogram and Intensity Map');


Input Arguments

X — Data to distribute among bins
m-by-2 numeric matrix

Data to distribute among the bins, specified as an m-by-2 numeric matrix, where m is the number of data points. Corresponding elements in X(:,1) and X(:,2) specify the x and y coordinates of 2-D data points.

hist3 ignores all NaN values. Similarly, hist3 ignores Inf and –Inf values unless you explicitly specify Inf or –Inf as a bin edge by using the edges input argument.

Data Types: single | double

nbins — Number of bins
[10 10] (default) | two-element vector of positive integers

Number of bins in each dimension, specified as a two-element vector of positive integers. nbins(1) specifies the number of bins in the first dimension, and nbins(2) specifies the number of bins in the second dimension.

Example: [10 20]
Data Types: single | double


ctrs — Bin centers
two-element cell array of numeric vectors

Bin centers in each dimension, specified as a two-element cell array of numeric vectors with monotonically nondecreasing values. ctrs{1} and ctrs{2} are the positions of the bin centers in the first and second dimensions, respectively. hist3 assigns rows of X falling outside the range of the grid to the bins along the outer edges of the grid.

Example: {0:10:100 0:50:500}
Data Types: cell

edges — Bin edges
two-element cell array of numeric vectors

Bin edges in each dimension, specified as a two-element cell array of numeric vectors with monotonically nondecreasing values. edges{1} and edges{2} are the positions of the bin edges in the first and second dimensions, respectively.

The value X(k,:) is in the (i,j)th bin if edges{1}(i) ≤ X(k,1) < edges{1}(i+1) and edges{2}(j) ≤ X(k,2) < edges{2}(j+1).

The last bins in each dimension also include the last (outer) edge. For example, X(k,:) falls into the (I,j)th bin if edges{1}(I–1) ≤ X(k,1) ≤ edges{1}(I) and edges{2}(j) ≤ X(k,2) < edges{2}(j+1), where I is the length of edges{1}. Also, X(k,:) falls into the (i,J)th bin if edges{1}(i) ≤ X(k,1) < edges{1}(i+1) and edges{2}(J–1) ≤ X(k,2) ≤ edges{2}(J), where J is the length of edges{2}.

hist3 does not count rows of X falling outside the range of the grid. Use –Inf and Inf in edges to include all non-NaN values.

Example: {0:10:100 0:50:500}
Data Types: cell

ax — Target axes
current axes (gca) (default) | Axes object

Target axes, specified as an axes object. If you do not specify an Axes object, then the hist3 function uses the current axes (gca). For details, see Axes Properties.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: hist3(X,'FaceColor','interp','CDataMode','auto') colors the histogram bars according to the height of the bars.

The graphical properties listed here are only a subset. For a full list, see Surface Properties.

CDataMode — Selection mode for vertex colors
'manual' (default) | 'auto'


Selection mode for CData (vertex colors), specified as the comma-separated pair consisting of 'CDataMode' and one of these values:

• 'manual' — Use manually specified values in the CData property. The default color in CData is light steel blue corresponding to an RGB triple value of [0.75 0.85 0.95].
• 'auto' — Use the ZData values to set the colors. ZData contains the z-coordinate data for the eight corners of each bar.

Example: 'CDataMode','auto'

EdgeColor — Edge line color
[0 0 0] (default) | 'none' | 'flat' | 'interp' | RGB triplet | hexadecimal color code | color name | short name

Edge line color, specified as the comma-separated pair consisting of 'EdgeColor' and one of these values:

• 'none' — Do not draw the edges.
• 'flat' — Use a different color for each edge based on the values in the CData property.
• 'interp' — Use interpolated coloring for each edge based on the values in the CData property.
• RGB triplet, hexadecimal color code, color name, or short name — Use the specified color for all the edges. These values do not use the color values in the CData property.

The default color of [0 0 0] corresponds to black edges.

RGB triplets and hexadecimal color codes are useful for specifying custom colors.

• An RGB triplet is a three-element row vector whose elements specify the intensities of the red, green, and blue components of the color. The intensities must be in the range [0,1]; for example, [0.4 0.6 0.7].
• A hexadecimal color code is a character vector or a string scalar that starts with a hash symbol (#) followed by three or six hexadecimal digits, which can range from 0 to F. The values are not case sensitive. Thus, the color codes '#FF8800', '#ff8800', '#F80', and '#f80' are equivalent.

Alternatively, you can specify some common colors by name. This table lists the named color options, the equivalent RGB triplets, and hexadecimal color codes.

Color Name    Short Name    RGB Triplet    Hexadecimal Color Code
'red'         'r'           [1 0 0]        '#FF0000'
'green'       'g'           [0 1 0]        '#00FF00'
'blue'        'b'           [0 0 1]        '#0000FF'
'cyan'        'c'           [0 1 1]        '#00FFFF'
'magenta'     'm'           [1 0 1]        '#FF00FF'
'yellow'      'y'           [1 1 0]        '#FFFF00'
'black'       'k'           [0 0 0]        '#000000'
'white'       'w'           [1 1 1]        '#FFFFFF'

Here are the RGB triplets and hexadecimal color codes for the default colors MATLAB uses in many types of plots.

RGB Triplet               Hexadecimal Color Code
[0 0.4470 0.7410]         '#0072BD'
[0.8500 0.3250 0.0980]    '#D95319'
[0.9290 0.6940 0.1250]    '#EDB120'
[0.4940 0.1840 0.5560]    '#7E2F8E'
[0.4660 0.6740 0.1880]    '#77AC30'
[0.3010 0.7450 0.9330]    '#4DBEEE'
[0.6350 0.0780 0.1840]    '#A2142F'

Example: 'EdgeColor','blue'

FaceAlpha — Face transparency
1 (default) | scalar in the range [0,1] | 'flat' | 'interp' | 'texturemap'

Face transparency, specified as the comma-separated pair consisting of 'FaceAlpha' and one of these values:

• Scalar in the range [0,1] — Use uniform transparency across all the faces. A value of 1 is fully opaque and 0 is completely transparent. Values between 0 and 1 are semitransparent. This option does not use the transparency values in the AlphaData property.
• 'flat' — Use a different transparency for each face based on the values in the AlphaData property. The transparency value at the first vertex determines the transparency for the entire face. This value applies only when you specify the AlphaData property and set the FaceColor property to 'flat'.
• 'interp' — Use interpolated transparency for each face based on the values in the AlphaData property. The transparency varies across each face by interpolating the values at the vertices. This value applies only when you specify the AlphaData property and set the FaceColor property to 'interp'.
• 'texturemap' — Transform the data in AlphaData so that it conforms to the surface.

Example: 'FaceAlpha',0.5

FaceColor — Face color
'flat' (default) | 'interp' | 'none' | 'texturemap' | RGB triplet | hexadecimal color code | color name | short name

Face color, specified as the comma-separated pair consisting of 'FaceColor' and one of these values:

• 'flat' — Use a different color for each face based on the values in the CData property.
• 'interp' — Use interpolated coloring for each face based on the values in the CData property.
• 'none' — Do not draw the faces.
• 'texturemap' — Transform the color data in CData so that it conforms to the surface.
• RGB triplet, hexadecimal color code, color name, or short name — Use the specified color for all the faces. These values do not use the color values in the CData property.

RGB triplets and hexadecimal color codes are useful for specifying custom colors.

• An RGB triplet is a three-element row vector whose elements specify the intensities of the red, green, and blue components of the color. The intensities must be in the range [0,1]; for example, [0.4 0.6 0.7].
• A hexadecimal color code is a character vector or a string scalar that starts with a hash symbol (#) followed by three or six hexadecimal digits, which can range from 0 to F. The values are not case sensitive. Thus, the color codes '#FF8800', '#ff8800', '#F80', and '#f80' are equivalent.

Alternatively, you can specify some common colors by name. This table lists the named color options, the equivalent RGB triplets, and hexadecimal color codes.

Color Name    Short Name    RGB Triplet    Hexadecimal Color Code
'red'         'r'           [1 0 0]        '#FF0000'
'green'       'g'           [0 1 0]        '#00FF00'
'blue'        'b'           [0 0 1]        '#0000FF'
'cyan'        'c'           [0 1 1]        '#00FFFF'
'magenta'     'm'           [1 0 1]        '#FF00FF'
'yellow'      'y'           [1 1 0]        '#FFFF00'
'black'       'k'           [0 0 0]        '#000000'
'white'       'w'           [1 1 1]        '#FFFFFF'

Here are the RGB triplets and hexadecimal color codes for the default colors MATLAB uses in many types of plots.

RGB Triplet               Hexadecimal Color Code
[0 0.4470 0.7410]         '#0072BD'
[0.8500 0.3250 0.0980]    '#D95319'
[0.9290 0.6940 0.1250]    '#EDB120'
[0.4940 0.1840 0.5560]    '#7E2F8E'
[0.4660 0.6740 0.1880]    '#77AC30'
[0.3010 0.7450 0.9330]    '#4DBEEE'
[0.6350 0.0780 0.1840]    '#A2142F'

Example: 'FaceColor','interp'

LineStyle — Line style
'-' (default) | '--' | ':' | '-.' | 'none'

Line style, specified as the comma-separated pair consisting of 'LineStyle' and one of the options in this table.

Line Style    Description
'-'           Solid line
'--'          Dashed line
':'           Dotted line
'-.'          Dash-dotted line
'none'        No line

Example: 'LineStyle',':'

LineWidth — Line width
0.5 (default) | positive value

Line width, specified as the comma-separated pair consisting of 'LineWidth' and a positive value in points.

Example: 'LineWidth',0.75
Data Types: single | double

Output Arguments

N — Number of elements in each bin
numeric matrix

Number of elements in X that fall in each bin, returned as a numeric matrix.

c — Bin centers
two-element cell array of numeric vectors

Bin centers in each dimension, returned as a two-element cell array of numeric vectors. c{1} and c{2} are the positions of the bin centers in the first and second dimensions, respectively.

Tips

The hist3 function creates a bivariate histogram, which is a type of surface plot. You can specify surface properties using one or more name-value pair arguments. Also, you can change the appearance of the histogram by changing the surface property values after you create a histogram. Get the handle of the surface object by using s = findobj(gca,'Type','Surface'), and then use s to modify the surface properties. For an example, see “Adjust Graphical Properties” on page 32-2542. For a list of properties, see Surface Properties.

Alternative Functionality

The histogram2 function enables you to create a bivariate histogram using a Histogram2 object. You can use the name-value pair arguments of histogram2 to use normalization ('Normalization'), adjust the width of the bins in each dimension ('BinWidth'), and display the histogram as a rectangular array of tiles instead of 3-D bars ('DisplayStyle').
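As a sketch of this alternative, a tiled bivariate histogram similar to the hist3 examples can be built with histogram2; the bin settings below are illustrative, and the resulting bin boundaries may differ from the hist3 defaults.

```matlab
% Sketch: bivariate histogram built with histogram2 instead of hist3.
load carbig
histogram2(MPG,Weight,'NumBins',[10 10], ...
    'DisplayStyle','tile', ...        % flat tiles instead of 3-D bars
    'Normalization','count')
xlabel('MPG')
ylabel('Weight')
colorbar
```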

See Also accumarray | bar3 | binScatterPlot | histcounts2 | histogram2

Introduced before R2006a

histfit Histogram with a distribution fit

Syntax
histfit(data)
histfit(data,nbins)
histfit(data,nbins,dist)
histfit(ax, ___ )
h = histfit( ___ )

Description

histfit(data) plots a histogram of values in data using the number of bins equal to the square root of the number of elements in data and fits a normal density function.

histfit(data,nbins) plots a histogram using nbins bins and fits a normal density function.

histfit(data,nbins,dist) plots a histogram with nbins bins and fits a density function from the distribution specified by dist.

histfit(ax, ___ ) uses the plot axes specified by the Axes object ax. Specify ax as the first input argument followed by any of the input argument combinations in the previous syntaxes.

h = histfit( ___ ) returns a vector of handles h, where h(1) is the handle to the histogram and h(2) is the handle to the density curve.

Examples

Histogram with a Normal Distribution Fit

Generate a sample of size 100 from a normal distribution with mean 10 and variance 1.

rng default; % For reproducibility
r = normrnd(10,1,100,1);

Construct a histogram with a normal distribution fit.

histfit(r)

histfit uses fitdist to fit a distribution to data. Use fitdist to obtain parameters used in fitting.

pd = fitdist(r,'Normal')

pd = 
  NormalDistribution

  Normal distribution
       mu = 10.1231   [9.89244, 10.3537]
    sigma =  1.1624   [1.02059, 1.35033]

The intervals next to the parameter estimates are the 95% confidence intervals for the distribution parameters.

Histogram for a Given Number of Bins

Generate a sample of size 100 from a normal distribution with mean 10 and variance 1.

rng default; % For reproducibility
r = normrnd(10,1,100,1);

Construct a histogram using six bins with a normal distribution fit.

histfit(r,6)

Histogram with a Specified Distribution Fit

Generate a sample of size 100 from a beta distribution with parameters (3,10).

rng default; % For reproducibility
b = betarnd(3,10,100,1);

Construct a histogram using 10 bins with a beta distribution fit.

histfit(b,10,'beta')

Histogram with a Kernel Smoothing Function Fit

Generate a sample of size 100 from a beta distribution with parameters (3,10).

rng default; % For reproducibility
b = betarnd(3,10,[100,1]);

Construct a histogram using 10 bins with a smoothing function fit.

histfit(b,10,'kernel')

Specify Axes for Histogram with Distribution Fit

Generate a sample of size 100 from a normal distribution with mean 3 and variance 1.

rng('default') % For reproducibility
r = normrnd(3,1,100,1);

Create a figure with two subplots and return the Axes objects as ax1 and ax2. Create a histogram with a normal distribution fit in each set of axes by referring to the corresponding Axes object. In the left subplot, plot a histogram with 10 bins. In the right subplot, plot a histogram with 5 bins. Add a title to each plot by passing the corresponding Axes object to the title function.

ax1 = subplot(1,2,1); % Left subplot
histfit(ax1,r,10,'normal')
title(ax1,'Left Subplot')

ax2 = subplot(1,2,2); % Right subplot
histfit(ax2,r,5,'normal')
title(ax2,'Right Subplot')

Handle for a Histogram with a Distribution Fit

Generate a sample of size 100 from a normal distribution with mean 10 and variance 1.

rng default % for reproducibility
r = normrnd(10,1,100,1);

Construct a histogram with a normal distribution fit.

h = histfit(r,10,'normal')

h = 
  2x1 graphics array:

  Bar
  Line

Change the bar colors of the histogram.

h(1).FaceColor = [.8 .8 1];

Change the color of the density curve.

h(2).Color = [.2 .2 .2];

Input Arguments

data — Input data
vector

Input data, specified as a vector.

Example: data = [1.5 2.5 4.6 1.2 3.4]
Example: data = [1.5 2.5 4.6 1.2 3.4]'
Data Types: double | single

nbins — Number of bins
positive integer | [ ]

Number of bins for the histogram, specified as a positive integer. Default value is the square root of the number of elements in data, rounded up. Use [ ] for the default number of bins when fitting a distribution.

Example: y = histfit(x,8)
Example: y = histfit(x,10,'gamma')
Example: y = histfit(x,[ ],'weibull')
Data Types: double | single


dist — Distribution to fit
'normal' (default) | character vector | string scalar

Distribution to fit to the histogram, specified as a character vector or string scalar. The following table shows the supported distributions.

dist                                       Description
'beta'                                     Beta
'birnbaumsaunders'                         Birnbaum-Saunders
'burr'                                     Burr Type XII
'exponential'                              Exponential
'extreme value' or 'ev'                    Extreme value
'gamma'                                    Gamma
'generalized extreme value' or 'gev'       Generalized extreme value
'generalized pareto' or 'gp'               Generalized Pareto (threshold 0)
'inversegaussian'                          Inverse Gaussian
'logistic'                                 Logistic
'loglogistic'                              Loglogistic
'lognormal'                                Lognormal
'nakagami'                                 Nakagami
'negative binomial' or 'nbin'              Negative binomial
'normal'                                   Normal
'poisson'                                  Poisson
'rayleigh'                                 Rayleigh
'rician'                                   Rician
'tlocationscale'                           t location-scale
'weibull' or 'wbl'                         Weibull
'kernel'                                   Nonparametric kernel-smoothing distribution. The density is evaluated at 100 equally spaced points that cover the range of the data in data. It works best with continuously distributed samples.

ax — Axes for plot
Axes object

Axes for the plot, specified as an Axes object. If you do not specify ax, then histfit creates the plot using the current axes. For more information on creating an Axes object, see axes.

Output Arguments

h — Handles for the plot
plot handle

Handles for the plot, returned as a vector, where h(1) is the handle to the histogram, and h(2) is the handle to the density curve. histfit normalizes the density to match the total area under the curve with that of the histogram.
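The normalization described above can be reproduced by hand: the fitted density is scaled by the number of data points times the bin width, so its area matches the total area of the histogram bars. A sketch, assuming a normal fit (the variable names are illustrative):

```matlab
% Sketch: reproduce histfit-style scaling of a fitted density manually.
rng default
r = normrnd(10,1,100,1);
hg = histogram(r,10);          % plain histogram with 10 bins
pd = fitdist(r,'Normal');      % fit a normal distribution to the data
hold on
xg = linspace(min(r),max(r),100);
% Scale the pdf by (number of points) * (bin width) to match bar area.
plot(xg, pdf(pd,xg)*numel(r)*hg.BinWidth, 'LineWidth',1.5)
hold off
```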

Algorithms

histfit uses fitdist to fit a distribution to data. Use fitdist to obtain parameters used in fitting.

See Also distributionFitter | fitdist | histogram | normfit | paramci

Introduced before R2006a

hmmdecode Hidden Markov model posterior state probabilities

Syntax
PSTATES = hmmdecode(seq,TRANS,EMIS)
[PSTATES,logpseq] = hmmdecode(...)
[PSTATES,logpseq,FORWARD,BACKWARD,S] = hmmdecode(...)
hmmdecode(...,'Symbols',SYMBOLS)

Description

PSTATES = hmmdecode(seq,TRANS,EMIS) calculates the posterior state probabilities, PSTATES, of the sequence seq, from a hidden Markov model. The posterior state probabilities are the conditional probabilities of being at state k at step i, given the observed sequence of symbols, sym. You specify the model by a transition probability matrix, TRANS, and an emissions probability matrix, EMIS. TRANS(i,j) is the probability of transition from state i to state j. EMIS(k,seq) is the probability that symbol seq is emitted from state k.

PSTATES is an array with the same length as seq and one row for each state in the model. The (i, j)th element of PSTATES gives the probability that the model is in state i at the jth step, given the sequence seq.

Note The function hmmdecode begins with the model in state 1 at step 0, prior to the first emission. hmmdecode computes the probabilities in PSTATES based on the fact that the model begins in state 1.

[PSTATES,logpseq] = hmmdecode(...) returns logpseq, the logarithm of the probability of sequence seq, given transition matrix TRANS and emission matrix EMIS.

[PSTATES,logpseq,FORWARD,BACKWARD,S] = hmmdecode(...) returns the forward and backward probabilities of the sequence scaled by S.

hmmdecode(...,'Symbols',SYMBOLS) specifies the symbols that are emitted. SYMBOLS can be a numeric array, a string array, or a cell array of the names of the symbols. The default symbols are integers 1 through N, where N is the number of possible emissions.

Examples

trans = [0.95,0.05;
         0.10,0.90];
emis = [1/6 1/6 1/6 1/6 1/6 1/6;
        1/10 1/10 1/10 1/10 1/10 1/2];

[seq,states] = hmmgenerate(100,trans,emis);
pStates = hmmdecode(seq,trans,emis);

[seq,states] = hmmgenerate(100,trans,emis,...
    'Symbols',{'one','two','three','four','five','six'})
pStates = hmmdecode(seq,trans,emis,...
    'Symbols',{'one','two','three','four','five','six'});
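A common follow-up, sketched below rather than part of the example above, is to pick the most probable state at each step from the posterior probabilities. Note that this per-step choice can differ from the jointly most likely state path returned by hmmviterbi.

```matlab
% Sketch: per-step most probable state from posterior probabilities.
trans = [0.95,0.05; 0.10,0.90];
emis = [1/6 1/6 1/6 1/6 1/6 1/6; 1/10 1/10 1/10 1/10 1/10 1/2];
[seq,states] = hmmgenerate(100,trans,emis);
pStates = hmmdecode(seq,trans,emis);
[~,mapState] = max(pStates,[],1);   % column-wise argmax over states
% mapState(i) is the state with the highest posterior probability at step i.
```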


References

[1] Durbin, R., S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis. Cambridge, UK: Cambridge University Press, 1998.

See Also hmmestimate | hmmgenerate | hmmtrain | hmmviterbi

Introduced before R2006a


hmmestimate Hidden Markov model parameter estimates from emissions and states

Syntax
[TRANS,EMIS] = hmmestimate(seq,states)
hmmestimate(...,'Symbols',SYMBOLS)
hmmestimate(...,'Statenames',STATENAMES)
hmmestimate(...,'Pseudoemissions',PSEUDOE)
hmmestimate(...,'Pseudotransitions',PSEUDOTR)

Description

[TRANS,EMIS] = hmmestimate(seq,states) calculates the maximum likelihood estimate of the transition, TRANS, and emission, EMIS, probabilities of a hidden Markov model for sequence, seq, with known states, states.

hmmestimate(...,'Symbols',SYMBOLS) specifies the symbols that are emitted. SYMBOLS can be a numeric array, a string array, or a cell array of the names of the symbols. The default symbols are integers 1 through N, where N is the number of possible emissions.

hmmestimate(...,'Statenames',STATENAMES) specifies the names of the states. STATENAMES can be a numeric array, a string array, or a cell array of the names of the states. The default state names are 1 through M, where M is the number of states.

hmmestimate(...,'Pseudoemissions',PSEUDOE) specifies pseudocount emission values in the matrix PSEUDOE. Use this argument to avoid zero probability estimates for emissions with very low probability that might not be represented in the sample sequence. PSEUDOE should be a matrix of size m-by-n, where m is the number of states in the hidden Markov model and n is the number of possible emissions. If the i→k emission does not occur in seq, you can set PSEUDOE(i,k) to be a positive number representing an estimate of the expected number of such emissions in the sequence seq.

hmmestimate(...,'Pseudotransitions',PSEUDOTR) specifies pseudocount transition values. You can use this argument to avoid zero probability estimates for transitions with very low probability that might not be represented in the sample sequence. PSEUDOTR should be a matrix of size m-by-m, where m is the number of states in the hidden Markov model. If the i→j transition does not occur in states, you can set PSEUDOTR(i,j) to be a positive number representing an estimate of the expected number of such transitions in the sequence states.

Pseudotransitions and Pseudoemissions

If the probability of a specific transition or emission is very low, the transition might never occur in the sequence states, or the emission might never occur in the sequence seq. In either case, the algorithm returns a probability of 0 for the given transition or emission in TRANS or EMIS. You can compensate for the absence of transition with the 'Pseudotransitions' and 'Pseudoemissions' arguments. The simplest way to do this is to set the corresponding entry of PSEUDOE or PSEUDOTR to 1. For example, if the transition i→j does not occur in states, set PSEUDOTR(i,j) = 1. This forces TRANS(i,j) to be positive. If you have an estimate for the expected number of transitions i→j in a sequence of the same length as states, and the actual number of transitions i→j that occur in seq is substantially less than what you expect, you can set PSEUDOTR(i,j) to the expected number. This increases the value of TRANS(i,j). For transitions that do occur in states with the frequency you expect, set the corresponding entry of PSEUDOTR to 0, which does not increase the corresponding entry of TRANS.

If you do not know the sequence of states, use hmmtrain to estimate the model parameters.
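The pseudocount idea can be sketched with the two-state model used in the example below; the pseudocount matrix here is illustrative, not a recommended setting.

```matlab
% Sketch: avoid a zero estimate for the rarely taken 2->1 transition
% by adding a pseudocount of 1 for that entry.
trans = [0.95,0.05; 0.10,0.90];
emis = [1/6 1/6 1/6 1/6 1/6 1/6; 1/10 1/10 1/10 1/10 1/10 1/2];
[seq,states] = hmmgenerate(200,trans,emis);
pseudoTR = [0 0; 1 0];   % expected count for the 2->1 transition
[estTR,estE] = hmmestimate(seq,states,'Pseudotransitions',pseudoTR);
% estTR(2,1) is now guaranteed to be positive even if no 2->1
% transition appears in the generated states sequence.
```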

Examples

trans = [0.95,0.05; 0.10,0.90];
emis = [1/6 1/6 1/6 1/6 1/6 1/6; 1/10 1/10 1/10 1/10 1/10 1/2];
[seq,states] = hmmgenerate(1000,trans,emis);
[estimateTR,estimateE] = hmmestimate(seq,states);

References [1] Durbin, R., S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis. Cambridge, UK: Cambridge University Press, 1998.

See Also hmmdecode | hmmgenerate | hmmtrain | hmmviterbi Introduced before R2006a

hmmgenerate Hidden Markov model states and emissions

Syntax [seq,states] = hmmgenerate(len,TRANS,EMIS) hmmgenerate(...,'Symbols',SYMBOLS) hmmgenerate(...,'Statenames',STATENAMES)

Description [seq,states] = hmmgenerate(len,TRANS,EMIS) takes a known Markov model, specified by transition probability matrix TRANS and emission probability matrix EMIS, and uses it to generate
• A random sequence seq of emission symbols
• A random sequence states of states
The length of both seq and states is len. TRANS(i,j) is the probability of transition from state i to state j. EMIS(k,l) is the probability that symbol l is emitted from state k.
Note The function hmmgenerate begins with the model in state 1 at step 0, prior to the first emission. The model then makes a transition to state i1 with probability TRANS(1,i1), and generates an emission k1 with probability EMIS(i1,k1). hmmgenerate returns i1 as the first entry of states, and k1 as the first entry of seq.

hmmgenerate(...,'Symbols',SYMBOLS) specifies the symbols that are emitted. SYMBOLS can be specified as a numeric array, a string array, or a cell array of character vectors. The default symbols are integers 1 through N, where N is the number of possible emissions. hmmgenerate(...,'Statenames',STATENAMES) specifies the names of the states. STATENAMES can be specified as a numeric array, a string array, or a cell array of character vectors. The default state names are 1 through M, where M is the number of states. Since the model always begins at state 1, whose transition probabilities are in the first row of TRANS, in the following example, the first entry of the output states is 1 with probability 0.95 and 2 with probability 0.05.

Examples

trans = [0.95,0.05; 0.10,0.90];
emis = [1/6 1/6 1/6 1/6 1/6 1/6; 1/10 1/10 1/10 1/10 1/10 1/2];
[seq,states] = hmmgenerate(100,trans,emis)
[seq,states] = hmmgenerate(100,trans,emis,...
    'Symbols',{'one','two','three','four','five','six'},...
    'Statenames',{'fair';'loaded'})
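The generation procedure described in the Note above is easy to mirror outside MATLAB. This Python sketch is illustrative only (0-indexed, so MATLAB's state 1 is index 0 here); at each step it first draws a transition from the current state's row of the transition matrix, then draws an emission from the new state's row of the emission matrix.

```python
import random

def hmm_generate(length, trans, emis, rng=None):
    """Generate (seq, states); the model starts in state 0 at step 0,
    before the first emission, so the first transition uses trans[0]."""
    rng = rng or random.Random(0)
    def draw(probs):                      # sample an index from a discrete pmf
        r, acc = rng.random(), 0.0
        for i, p in enumerate(probs):
            acc += p
            if r < acc:
                return i
        return len(probs) - 1
    seq, states, state = [], [], 0
    for _ in range(length):
        state = draw(trans[state])        # transition first ...
        states.append(state)
        seq.append(draw(emis[state]))     # ... then emit from the new state
    return seq, states
```

With a degenerate model whose rows are deterministic, the output is fully predictable, which makes the start-in-state-1 convention easy to see.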

See Also hmmdecode | hmmestimate | hmmtrain | hmmviterbi Introduced before R2006a

hmmtrain Hidden Markov model parameter estimates from emissions

Syntax [ESTTR,ESTEMIT] = hmmtrain(seq,TRGUESS,EMITGUESS) hmmtrain(...,'Algorithm',algorithm) hmmtrain(...,'Symbols',SYMBOLS) hmmtrain(...,'Tolerance',tol) hmmtrain(...,'Maxiterations',maxiter) hmmtrain(...,'Verbose',true) hmmtrain(...,'Pseudoemissions',PSEUDOE) hmmtrain(...,'Pseudotransitions',PSEUDOTR)

Description [ESTTR,ESTEMIT] = hmmtrain(seq,TRGUESS,EMITGUESS) estimates the transition and emission probabilities for a hidden Markov model using the Baum-Welch algorithm. seq can be a row vector containing a single sequence, a matrix with one row per sequence, or a cell array with each cell containing a sequence. TRGUESS and EMITGUESS are initial estimates of the transition and emission probability matrices. TRGUESS(i,j) is the estimated probability of transition from state i to state j. EMITGUESS(i,k) is the estimated probability that symbol k is emitted from state i. hmmtrain(...,'Algorithm',algorithm) specifies the training algorithm. algorithm can be either 'BaumWelch' or 'Viterbi'. The default algorithm is 'BaumWelch'. hmmtrain(...,'Symbols',SYMBOLS) specifies the symbols that are emitted. SYMBOLS can be a numeric array, a string array, or a cell array of the names of the symbols. The default symbols are integers 1 through N, where N is the number of possible emissions. hmmtrain(...,'Tolerance',tol) specifies the tolerance used for testing convergence of the iterative estimation process. The default tolerance is 1e-6. hmmtrain(...,'Maxiterations',maxiter) specifies the maximum number of iterations for the estimation process. The default maximum is 100. hmmtrain(...,'Verbose',true) returns the status of the algorithm at each iteration. hmmtrain(...,'Pseudoemissions',PSEUDOE) specifies pseudocount emission values for the Viterbi training algorithm. Use this argument to avoid zero probability estimates for emissions with very low probability that might not be represented in the sample sequence. PSEUDOE should be a matrix of size m-by-n, where m is the number of states in the hidden Markov model and n is the number of possible emissions. If the i→k emission does not occur in seq, you can set PSEUDOE(i,k) to be a positive number representing an estimate of the expected number of such emissions in the sequence seq.
hmmtrain(...,'Pseudotransitions',PSEUDOTR) specifies pseudocount transition values for the Viterbi training algorithm. Use this argument to avoid zero probability estimates for transitions with very low probability that might not be represented in the sample sequence. PSEUDOTR should be a matrix of size m-by-m, where m is the number of states in the hidden Markov model. If the i→j transition does not occur in states, you can set PSEUDOTR(i,j) to be a positive number representing an estimate of the expected number of such transitions in the sequence states. If you know the states corresponding to the sequences, use hmmestimate to estimate the model parameters.
Tolerance The input argument 'Tolerance' controls how many steps the hmmtrain algorithm executes before the function returns an answer. The algorithm terminates when all of the following three quantities are less than the value that you specify for the tolerance:
• The change in the log likelihood that the input sequence seq is generated by the currently estimated values of the transition and emission matrices
• The change in the norm of the transition matrix, normalized by the size of the matrix
• The change in the norm of the emission matrix, normalized by the size of the matrix
The default value of 'Tolerance' is 1e-6. Increasing the tolerance decreases the number of steps the hmmtrain algorithm executes before it terminates.
Maxiterations The maximum number of iterations, 'Maxiterations', controls the maximum number of steps the algorithm executes before it terminates. If the algorithm executes maxiter iterations before reaching the specified tolerance, the algorithm terminates and the function returns a warning. If this occurs, you can increase the value of 'Maxiterations' to make the algorithm reach the desired tolerance before terminating.
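The three-part stopping rule can be written down directly. The sketch below is an illustration of the rule as stated, not the toolbox's internal code; in particular, dividing the Frobenius norm of the change by the number of matrix entries is one reasonable reading of "normalized by the size of the matrix".

```python
def norm_change(old, new):
    """Frobenius norm of (new - old), divided by the number of entries."""
    sq = sum((n - o) ** 2 for row_o, row_n in zip(old, new)
             for o, n in zip(row_o, row_n))
    return sq ** 0.5 / (len(old) * len(old[0]))

def has_converged(loglik_change, tr_old, tr_new, e_old, e_new, tol=1e-6):
    """All three quantities must drop below tol before the
    hmmtrain-style iteration stops."""
    return (abs(loglik_change) < tol
            and norm_change(tr_old, tr_new) < tol
            and norm_change(e_old, e_new) < tol)
```

A training loop would call a test like this once per iteration and also stop, with a warning, after the maximum iteration count is reached.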

Examples

trans = [0.95,0.05; 0.10,0.90];
emis = [1/6, 1/6, 1/6, 1/6, 1/6, 1/6; 1/10, 1/10, 1/10, 1/10, 1/10, 1/2];
seq1 = hmmgenerate(100,trans,emis);
seq2 = hmmgenerate(200,trans,emis);
seqs = {seq1,seq2};
[estTR,estE] = hmmtrain(seqs,trans,emis);

References [1] Durbin, R., S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis. Cambridge, UK: Cambridge University Press, 1998.

See Also hmmdecode | hmmestimate | hmmgenerate | hmmviterbi Introduced before R2006a

hmmviterbi Hidden Markov model most probable state path

Syntax STATES = hmmviterbi(seq,TRANS,EMIS) hmmviterbi(...,'Symbols',SYMBOLS) hmmviterbi(...,'Statenames',STATENAMES)

Description STATES = hmmviterbi(seq,TRANS,EMIS) given a sequence, seq, calculates the most likely path through the hidden Markov model specified by transition probability matrix, TRANS, and emission probability matrix EMIS. TRANS(i,j) is the probability of transition from state i to state j. EMIS(i,k) is the probability that symbol k is emitted from state i. Note The function hmmviterbi begins with the model in state 1 at step 0, prior to the first emission. hmmviterbi computes the most likely path based on the fact that the model begins in state 1. hmmviterbi(...,'Symbols',SYMBOLS) specifies the symbols that are emitted. SYMBOLS can be a numeric array, a string array, or a cell array of the names of the symbols. The default symbols are integers 1 through N, where N is the number of possible emissions. hmmviterbi(...,'Statenames',STATENAMES) specifies the names of the states. STATENAMES can be a numeric array, a string array, or a cell array of the names of the states. The default state names are 1 through M, where M is the number of states.
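The dynamic program behind hmmviterbi is compact enough to sketch. The Python below is illustrative only: it is 0-indexed (MATLAB's state 1 is index 0), and it works with raw probabilities, whereas a production implementation would work in log space to avoid underflow on long sequences. Like the toolbox function, it assumes the model starts in state 0 at step 0, before the first emission.

```python
def viterbi(seq, trans, emis):
    """Most probable state path for an observed symbol sequence."""
    n = len(trans)
    # First step: the only way into state j is the transition 0 -> j.
    delta = [trans[0][j] * emis[j][seq[0]] for j in range(n)]
    back = []
    for obs in seq[1:]:
        prev, delta, ptr = delta, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] * trans[i][j])
            ptr.append(best)
            delta.append(prev[best] * trans[best][j] * emis[j][obs])
        back.append(ptr)
    # Trace the best final state back to the start.
    path = [max(range(n), key=lambda j: delta[j])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

With a sticky two-state model whose states prefer different symbols, the decoded path tracks the observed symbols, as expected.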

Examples

trans = [0.95,0.05; 0.10,0.90];
emis = [1/6 1/6 1/6 1/6 1/6 1/6; 1/10 1/10 1/10 1/10 1/10 1/2];
[seq,states] = hmmgenerate(100,trans,emis);
estimatedStates = hmmviterbi(seq,trans,emis);
[seq,states] = hmmgenerate(100,trans,emis,...
    'Statenames',{'fair';'loaded'});
estimatedStates = hmmviterbi(seq,trans,emis,...
    'Statenames',{'fair';'loaded'});

References [1] Durbin, R., S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis. Cambridge, UK: Cambridge University Press, 1998.

See Also hmmdecode | hmmestimate | hmmgenerate | hmmtrain Introduced before R2006a

horzcat Class: dataset (Not Recommended) Horizontal concatenation for dataset arrays Note The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.

Syntax ds = horzcat(ds1, ds2, ...)

Description ds = horzcat(ds1, ds2, ...) horizontally concatenates the dataset arrays ds1, ds2, ... . You can concatenate dataset arrays that have duplicate variable names; however, the variables must contain identical data, and horzcat includes only one copy of the variable in the output dataset. Observation names for all dataset arrays that have them must be identical except for order. horzcat concatenates by matching observation names when present, or by position for datasets that do not have observation names.

See Also cat | vertcat

hougen Hougen-Watson model

Syntax yhat = hougen(beta,x)

Description yhat = hougen(beta,x) returns the predicted values of the reaction rate, yhat, as a function of the vector of parameters, beta, and the matrix of data, x. beta must have 5 elements and x must have three columns. hougen is a utility function for rsmdemo. The model form is:

yhat = (β1·x2 − x3/β5) / (1 + β2·x1 + β3·x2 + β4·x3)
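The model is a single rational expression, so it can be evaluated directly. A minimal Python sketch of the same formula (illustrative only, with b1…b5 standing in for β1…β5 and x1…x3 for the three data columns):

```python
def hougen(beta, x):
    """Hougen-Watson reaction-rate model:
    y = (b1*x2 - x3/b5) / (1 + b2*x1 + b3*x2 + b4*x3)."""
    b1, b2, b3, b4, b5 = beta
    x1, x2, x3 = x
    return (b1 * x2 - x3 / b5) / (1 + b2 * x1 + b3 * x2 + b4 * x3)
```

The numerator vanishes whenever b1·x2 equals x3/b5, which is why nonlinear fits of this model are sensitive to the ratio of those two terms.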

References [1] Bates, D. M., and D. G. Watts. Nonlinear Regression Analysis and Its Applications. Hoboken, NJ: John Wiley & Sons, Inc., 1988.

See Also rsmdemo Introduced before R2006a

hygecdf Hypergeometric cumulative distribution function

Syntax hygecdf(x,M,K,N) hygecdf(x,M,K,N,'upper')

Description hygecdf(x,M,K,N) computes the hypergeometric cdf at each of the values in x using the corresponding size of the population, M, number of items with the desired characteristic in the population, K, and number of samples drawn, N. Vector or matrix inputs for x, M, K, and N must all have the same size. A scalar input is expanded to a constant matrix with the same dimensions as the other inputs. hygecdf(x,M,K,N,'upper') returns the complement of the hypergeometric cdf at each value in x, using an algorithm that more accurately computes the extreme upper tail probabilities. The hypergeometric cdf is

p = F(x | M,K,N) = Σ (from i = 0 to x) of C(K,i)·C(M−K,N−i) / C(M,N)

where C(a,b) denotes the binomial coefficient a choose b. The result, p, is the probability of drawing up to x of a possible K items in N drawings without replacement from a group of M objects.

Examples

Compute Hypergeometric Distribution CDF

Suppose you have a lot of 100 floppy disks and you know that 20 of them are defective. What is the probability of drawing zero to two defective floppies if you select 10 at random?

p = hygecdf(2,100,20,10)

p = 0.6812
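Because the cdf is a finite sum of binomial-coefficient ratios, the example value is easy to check independently. A minimal Python sketch of the same sum (illustrative; it assumes scalar inputs with 0 ≤ x ≤ N):

```python
from math import comb

def hyge_cdf(x, M, K, N):
    """P(at most x of the K marked items appear in N draws without
    replacement from a population of M)."""
    return sum(comb(K, i) * comb(M - K, N - i)
               for i in range(x + 1)) / comb(M, N)
```

Evaluating hyge_cdf(2, 100, 20, 10) reproduces the 0.6812 value shown above, and summing the full support gives exactly 1 by the Vandermonde identity.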

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also cdf | hygeinv | hygepdf | hygernd | hygestat Introduced before R2006a

hygeinv Hypergeometric inverse cumulative distribution function

Syntax hygeinv(P,M,K,N)

Description hygeinv(P,M,K,N) returns the smallest integer X such that the hypergeometric cdf evaluated at X equals or exceeds P. You can think of P as the probability of observing X defective items in N drawings without replacement from a group of M items where K are defective.

Examples

Suppose you are the Quality Assurance manager for a floppy disk manufacturer. The production line turns out floppy disks in batches of 1,000. You want to sample 50 disks from each batch to see if they have defects. You want to accept 99% of the batches if there are no more than 10 defective disks in the batch. What is the maximum number of defective disks you should allow in your sample of 50?

x = hygeinv(0.99,1000,10,50)

x = 3

What is the median number of defective floppy disks in samples of 50 disks from batches with 10 defective disks?

x = hygeinv(0.50,1000,10,50)

x = 0
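Because the distribution is discrete, the inverse cdf is just a search for the smallest x whose cumulative probability reaches P. An illustrative Python sketch of that search (scalar inputs only, not the toolbox code):

```python
from math import comb

def hyge_inv(p, M, K, N):
    """Smallest integer x with hypergeometric cdf(x) >= p."""
    total = comb(M, N)
    cdf = 0.0
    for x in range(N + 1):
        cdf += comb(K, x) * comb(M - K, N - x) / total
        if cdf >= p:
            return x
    return N
```

Both example answers above follow from this search: the cdf first reaches 0.99 at x = 3, and it already exceeds 0.50 at x = 0.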

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also hygecdf | hygepdf | hygernd | hygestat | icdf Introduced before R2006a

hygepdf Hypergeometric probability density function

Syntax Y = hygepdf(X,M,K,N)

Description Y = hygepdf(X,M,K,N) computes the hypergeometric pdf at each of the values in X using the corresponding size of the population, M, number of items with the desired characteristic in the population, K, and number of samples drawn, N. X, M, K, and N can be vectors, matrices, or multidimensional arrays that all have the same size. A scalar input is expanded to a constant array with the same dimensions as the other inputs. The parameters in M, K, and N must all be positive integers, with N ≤ M. The values in X must be less than or equal to all the parameter values. The hypergeometric pdf is

y = f(x | M,K,N) = C(K,x)·C(M−K,N−x) / C(M,N)

where C(a,b) denotes the binomial coefficient a choose b. The result, y, is the probability of drawing exactly x of a possible K items in N drawings without replacement from a group of M objects.

Examples

Suppose you have a lot of 100 floppy disks and you know that 20 of them are defective. What is the probability of drawing 0 through 5 defective floppy disks if you select 10 at random?

p = hygepdf(0:5,100,20,10)

p =
    0.0951    0.2679    0.3182    0.2092    0.0841    0.0215
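Each entry in the example output is a single ratio of binomial coefficients, which makes it easy to verify independently. An illustrative Python sketch of the pdf formula (scalar inputs only):

```python
from math import comb

def hyge_pdf(x, M, K, N):
    """P(exactly x of the K marked items appear in N draws without
    replacement from a population of M)."""
    return comb(K, x) * comb(M - K, N - x) / comb(M, N)
```

Evaluating hyge_pdf(2, 100, 20, 10) reproduces the 0.3182 entry above, and the probabilities over the whole support x = 0…10 sum to 1.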

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also hygecdf | hygeinv | hygernd | hygestat | pdf Introduced before R2006a

hygernd Hypergeometric random numbers

Syntax R = hygernd(M,K,N) R = hygernd(M,K,N,m,n,...) R = hygernd(M,K,N,[m,n,...])

Description R = hygernd(M,K,N) generates random numbers from the hypergeometric distribution with corresponding size of the population, M, number of items with the desired characteristic in the population, K, and number of samples drawn, N. M, K, and N can be vectors, matrices, or multidimensional arrays that all have the same size, which is also the size of R. A scalar input for M, K, or N is expanded to a constant array with the same dimensions as the other inputs. R = hygernd(M,K,N,m,n,...) or R = hygernd(M,K,N,[m,n,...]) generates an m-by-n-by-... array. The M, K, N parameters can each be scalars or arrays of the same size as R.

Examples

numbers = hygernd(1000,40,50)

numbers = 1
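One standard way to draw hypergeometric random numbers is inverse-transform sampling: draw a uniform variate and walk the cdf until it is exceeded. The Python below is an illustrative sketch of that idea (scalar parameters, one draw per call), not the toolbox's generator, so it will not reproduce MATLAB's random stream.

```python
import random
from math import comb

def hyge_rnd(M, K, N, rng):
    """One hypergeometric draw by inverse-transform sampling."""
    u = rng.random()
    total = comb(M, N)
    cdf = 0.0
    for x in range(N + 1):
        cdf += comb(K, x) * comb(M - K, N - x) / total
        if u <= cdf:
            return x
    return N

rng = random.Random(42)
samples = [hyge_rnd(1000, 40, 50, rng) for _ in range(200)]
```

Every draw lies between 0 and min(K,N), and the sample mean hovers near the theoretical mean N·K/M = 2 for these parameters.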

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
Usage notes and limitations: The generated code can return a different sequence of numbers than MATLAB if either of the following is true:
• The output is nonscalar.
• An input parameter is invalid for the distribution.
For more information on code generation, see “Introduction to Code Generation” on page 31-2 and “General Code Generation Workflow” on page 31-5.

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also hygecdf | hygeinv | hygepdf | hygestat | random Introduced before R2006a

hygestat Hypergeometric mean and variance

Syntax [MN,V] = hygestat(M,K,N)

Description [MN,V] = hygestat(M,K,N) returns the mean and variance of the hypergeometric distribution with corresponding size of the population, M, number of items with the desired characteristic in the population, K, and number of samples drawn, N. Vector or matrix inputs for M, K, and N must have the same size, which is also the size of MN and V. A scalar input for M, K, or N is expanded to a constant matrix with the same dimensions as the other inputs. The mean of the hypergeometric distribution with parameters M, K, and N is NK/M, and the variance is NK(M−K)(M−N)/[M^2(M−1)].

Examples

The hypergeometric distribution approaches the binomial distribution, where p = K/M, as M goes to infinity.

[m,v] = hygestat(10.^(1:4),10.^(0:3),9)

m =
    0.9000    0.9000    0.9000    0.9000
v =
    0.0900    0.7445    0.8035    0.8094

[m,v] = binostat(9,0.1)

m = 0.9000
v = 0.8100
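The closed-form mean and variance make the example values above easy to recompute. An illustrative Python sketch of the two formulas:

```python
def hyge_stat(M, K, N):
    """Mean N*K/M and variance N*K*(M-K)*(M-N) / (M^2 * (M-1))
    of the hypergeometric distribution."""
    mean = N * K / M
    var = N * K * (M - K) * (M - N) / (M ** 2 * (M - 1))
    return mean, var
```

For M = 10^4, K = 10^3, N = 9 this gives a mean of 0.9 and a variance of about 0.8094, the last column of the example, which is already close to the binomial variance 0.81.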

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

See Also hygecdf | hygeinv | hygepdf | hygernd Introduced before R2006a

hyperparameters Variable descriptions for optimizing a fit function

Syntax

VariableDescriptions = hyperparameters(FitFcnName,predictors,response)
VariableDescriptions = hyperparameters(FitFcnName,predictors,response,LearnerType)

Description VariableDescriptions = hyperparameters(FitFcnName,predictors,response) returns the default variables for the given fit function. These are the variables that apply when you set the OptimizeHyperparameters name-value pair to 'auto'. VariableDescriptions = hyperparameters(FitFcnName,predictors,response,LearnerType) returns the variables for an ensemble fit with specified learner type. This syntax applies when FitFcnName is 'fitcecoc', 'fitcensemble', or 'fitrensemble'.

Examples Obtain Default Hyperparameters Obtain the default hyperparameters for the fitcsvm classifier. Load the ionosphere data. load ionosphere

Obtain the hyperparameters. VariableDescriptions = hyperparameters('fitcsvm',X,Y);

Examine all the hyperparameters.

for ii = 1:length(VariableDescriptions)
    disp(ii),disp(VariableDescriptions(ii))
end

1
  optimizableVariable with properties:

         Name: 'BoxConstraint'
        Range: [1.0000e-03 1000]
         Type: 'real'
    Transform: 'log'
     Optimize: 1

2
  optimizableVariable with properties:

         Name: 'KernelScale'
        Range: [1.0000e-03 1000]
         Type: 'real'
    Transform: 'log'
     Optimize: 1

3
  optimizableVariable with properties:

         Name: 'KernelFunction'
        Range: {'gaussian'  'linear'  'polynomial'}
         Type: 'categorical'
    Transform: 'none'
     Optimize: 0

4
  optimizableVariable with properties:

         Name: 'PolynomialOrder'
        Range: [2 4]
         Type: 'integer'
    Transform: 'none'
     Optimize: 0

5
  optimizableVariable with properties:

         Name: 'Standardize'
        Range: {'true'  'false'}
         Type: 'categorical'
    Transform: 'none'
     Optimize: 0

Change the PolynomialOrder hyperparameter to have a wider range and to be used in an optimization.

VariableDescriptions(4).Range = [2,5];
VariableDescriptions(4).Optimize = true;
disp(VariableDescriptions(4))

  optimizableVariable with properties:

         Name: 'PolynomialOrder'
        Range: [2 5]
         Type: 'integer'
    Transform: 'none'
     Optimize: 1

Obtain Ensemble Hyperparameters

Obtain the default hyperparameters for the fitrensemble ensemble regression function.

Load the carsmall data. load carsmall

Use Horsepower and Weight as predictor variables, and MPG as the response variable. X = [Horsepower Weight]; Y = MPG;

Obtain the default hyperparameters for a Tree learner. VariableDescriptions = hyperparameters('fitrensemble',X,Y,'Tree');

Examine all the hyperparameters.

for ii = 1:length(VariableDescriptions)
    disp(ii),disp(VariableDescriptions(ii))
end

1
  optimizableVariable with properties:

         Name: 'Method'
        Range: {'Bag'  'LSBoost'}
         Type: 'categorical'
    Transform: 'none'
     Optimize: 1

2
  optimizableVariable with properties:

         Name: 'NumLearningCycles'
        Range: [10 500]
         Type: 'integer'
    Transform: 'log'
     Optimize: 1

3
  optimizableVariable with properties:

         Name: 'LearnRate'
        Range: [1.0000e-03 1]
         Type: 'real'
    Transform: 'log'
     Optimize: 1

4
  optimizableVariable with properties:

         Name: 'MinLeafSize'
        Range: [1 50]
         Type: 'integer'
    Transform: 'log'
     Optimize: 1

5
  optimizableVariable with properties:

         Name: 'MaxNumSplits'
        Range: [1 99]
         Type: 'integer'
    Transform: 'log'
     Optimize: 0

6
  optimizableVariable with properties:

         Name: 'NumVariablesToSample'
        Range: [1 2]
         Type: 'integer'
    Transform: 'none'
     Optimize: 0

Change the MaxNumSplits hyperparameter to have a wider range and to be used in an optimization.

VariableDescriptions(5).Range = [1,200];
VariableDescriptions(5).Optimize = true;
disp(VariableDescriptions(5))

  optimizableVariable with properties:

         Name: 'MaxNumSplits'
        Range: [1 200]
         Type: 'integer'
    Transform: 'log'
     Optimize: 1

Input Arguments

FitFcnName — Name of fitting function
'fitcdiscr' | 'fitcecoc' | 'fitcensemble' | 'fitckernel' | 'fitcknn' | 'fitclinear' | 'fitcnb' | 'fitcsvm' | 'fitctree' | 'fitrensemble' | 'fitrgp' | 'fitrkernel' | 'fitrlinear' | 'fitrsvm' | 'fitrtree'
Name of fitting function, specified as one of the listed classification or regression fit function names. If FitFcnName is 'fitcecoc', 'fitcensemble', or 'fitrensemble', then also specify the learner type in the LearnerType argument.
Example: 'fitctree'

predictors — Predictor data
matrix with D predictor columns | table with D predictor columns
Predictor data, specified as a matrix with D predictor columns or as a table with D predictor columns, where D is the number of predictors.
Example: X
Data Types: double | logical | char | string | table | cell | categorical | datetime

response — Class labels or numeric response
grouping variable | scalar
Class labels or numeric response, specified as a grouping variable (see “Grouping Variables” on page 2-45) or as a scalar.
Example: Y
Data Types: single | double | logical | char | string | cell

LearnerType — Learner type for ensemble fit
'Discriminant' | 'Kernel' | 'KNN' | 'Linear' | 'SVM' | 'Tree' | template of a listed learner
Learner type for ensemble fit, specified as 'Discriminant', 'Kernel', 'KNN', 'Linear', 'SVM', 'Tree', or a template of a listed learner. Use this argument when FitFcnName is 'fitcecoc', 'fitcensemble', or 'fitrensemble'. For 'fitcensemble' you can use only 'Discriminant', 'KNN', 'Tree', or an associated template. For 'fitrensemble', you can use only 'Tree' or templateTree.
Example: 'Tree'

Output Arguments

VariableDescriptions — Variable descriptions
vector of optimizableVariable objects
Variable descriptions, returned as a vector of optimizableVariable objects. The variables have their default parameters set, such as range and variable type. All eligible variables exist in the descriptions, but the variables unused in the 'auto' setting have their Optimize property set to false. You can update the variables by using dot notation, as shown in “Examples”.

See Also fitcdiscr | fitcecoc | fitcensemble | fitckernel | fitcknn | fitclinear | fitcnb | fitcsvm | fitctree | fitrensemble | fitrgp | fitrkernel | fitrlinear | fitrsvm | fitrtree Introduced in R2016b

icdf Package: prob Inverse cumulative distribution function

Syntax

x = icdf('name',p,A)
x = icdf('name',p,A,B)
x = icdf('name',p,A,B,C)
x = icdf('name',p,A,B,C,D)
x = icdf(pd,p)

Description x = icdf('name',p,A) returns the inverse cumulative distribution function (icdf) for the oneparameter distribution family specified by 'name' and the distribution parameter A, evaluated at the probability values in p. x = icdf('name',p,A,B) returns the icdf for the two-parameter distribution family specified by 'name' and the distribution parameters A and B, evaluated at the probability values in p. x = icdf('name',p,A,B,C) returns the icdf for the three-parameter distribution family specified by 'name' and the distribution parameters A, B, and C, evaluated at the probability values in p. x = icdf('name',p,A,B,C,D) returns the icdf for the four-parameter distribution family specified by 'name' and the distribution parameters A, B, C, and D, evaluated at the probability values in p. x = icdf(pd,p) returns the icdf function of the probability distribution object pd, evaluated at the probability values in p.

Examples Compute the Normal Distribution icdf Create a standard normal distribution object with the mean, μ, equal to 0 and the standard deviation, σ, equal to 1. mu = 0; sigma = 1; pd = makedist('Normal','mu',mu,'sigma',sigma);

Define the input vector p to contain the probability values at which to calculate the icdf. p = [0.1,0.25,0.5,0.75,0.9];

Compute the icdf values for the standard normal distribution at the values in p. x = icdf(pd,p)

x = 1×5

   -1.2816   -0.6745         0    0.6745    1.2816

Each value in x corresponds to a value in the input vector p. For example, at the value p equal to 0.9, the corresponding icdf value x is equal to 1.2816.

Alternatively, you can compute the same icdf values without creating a probability distribution object. Use the icdf function and specify a standard normal distribution using the same parameter values for μ and σ.

x2 = icdf('Normal',p,mu,sigma)

x2 = 1×5

   -1.2816   -0.6745         0    0.6745    1.2816

The icdf values are the same as those computed using the probability distribution object.

Compute the Poisson Distribution icdf Create a Poisson distribution object with the rate parameter, λ, equal to 2. lambda = 2; pd = makedist('Poisson','lambda',lambda);

Define the input vector p to contain the probability values at which to calculate the icdf. p = [0.1,0.25,0.5,0.75,0.9];

Compute the icdf values for the Poisson distribution at the values in p.

x = icdf(pd,p)

x = 1×5

     0     1     2     3     4

Each value in x corresponds to a value in the input vector p. For example, at the value p equal to 0.9, the corresponding icdf value x is equal to 4.

Alternatively, you can compute the same icdf values without creating a probability distribution object. Use the icdf function and specify a Poisson distribution using the same value for the rate parameter λ.

x2 = icdf('Poisson',p,lambda)

x2 = 1×5

     0     1     2     3     4

The icdf values are the same as those computed using the probability distribution object.
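For a discrete distribution, the inverse cdf returns the smallest x whose cumulative probability reaches p, so the Poisson values above can be reproduced with a short loop. An illustrative Python sketch for the Poisson case (not the toolbox code):

```python
from math import exp, factorial

def poisson_icdf(p, lam):
    """Smallest nonnegative integer x with Poisson cdf(x) >= p."""
    x = 0
    cdf = exp(-lam)                 # P(X = 0)
    while cdf < p:
        x += 1
        cdf += exp(-lam) * lam ** x / factorial(x)
    return x
```

For lambda = 2 and p = 0.1, 0.25, 0.5, 0.75, 0.9 this returns 0, 1, 2, 3, 4, matching the example output.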

Compute Standard Normal Critical Values

Create a standard normal distribution object.

pd = makedist('Normal')

pd =
  NormalDistribution

  Normal distribution
       mu = 0
    sigma = 1

Determine the critical values at the 5% significance level for a test statistic with a standard normal distribution, by computing the upper and lower 2.5% values.

x = icdf(pd,[.025,.975])

x = 1×2

   -1.9600    1.9600

Plot the cdf and shade the critical regions.

p = normspec(x,0,1,'outside')

p = 0.0500

Input Arguments

'name' — Probability distribution name
character vector or string scalar of probability distribution name
Probability distribution name, specified as one of the probability distribution names in the following list. Each entry gives the distribution parameters that correspond, in order, to the input arguments A, B, C, and D.

'Beta' — “Beta Distribution” on page B-6. A: a first shape parameter. B: b second shape parameter.
'Binomial' — “Binomial Distribution” on page B-10. A: n number of trials. B: p probability of success for each trial.
'BirnbaumSaunders' — “Birnbaum-Saunders Distribution” on page B-18. A: β scale parameter. B: γ shape parameter.
'Burr' — “Burr Type XII Distribution” on page B-19. A: α scale parameter. B: c first shape parameter. C: k second shape parameter.
'Chisquare' — “Chi-Square Distribution” on page B-28. A: ν degrees of freedom.
'Exponential' — “Exponential Distribution” on page B-33. A: μ mean.
'Extreme Value' — “Extreme Value Distribution” on page B-40. A: μ location parameter. B: σ scale parameter.
'F' — “F Distribution” on page B-45. A: ν1 numerator degrees of freedom. B: ν2 denominator degrees of freedom.
'Gamma' — “Gamma Distribution” on page B-47. A: a shape parameter. B: b scale parameter.
'Generalized Extreme Value' — “Generalized Extreme Value Distribution” on page B-55. A: k shape parameter. B: σ scale parameter. C: μ location parameter.
'Generalized Pareto' — “Generalized Pareto Distribution” on page B-59. A: k tail index (shape) parameter. B: σ scale parameter. C: μ threshold (location) parameter.
'Geometric' — “Geometric Distribution” on page B-63. A: p probability parameter.
'HalfNormal' — “Half-Normal Distribution” on page B-68. A: μ location parameter. B: σ scale parameter.
'Hypergeometric' — “Hypergeometric Distribution” on page B-73. A: m size of the population. B: k number of items with the desired characteristic in the population. C: n number of samples drawn.
'InverseGaussian' — “Inverse Gaussian Distribution” on page B-75. A: μ scale parameter. B: λ shape parameter.
'Logistic' — “Logistic Distribution” on page B-85. A: μ mean. B: σ scale parameter.
'LogLogistic' — “Loglogistic Distribution” on page B-86. A: μ mean of logarithmic values. B: σ scale parameter of logarithmic values.
'Lognormal' — “Lognormal Distribution” on page B-88. A: μ mean of logarithmic values. B: σ standard deviation of logarithmic values.
'Nakagami' — “Nakagami Distribution” on page B-108. A: μ shape parameter. B: ω scale parameter.
'Negative Binomial' — “Negative Binomial Distribution” on page B-109. A: r number of successes. B: p probability of success in a single trial.
'Noncentral F' — “Noncentral F Distribution” on page B-115. A: ν1 numerator degrees of freedom. B: ν2 denominator degrees of freedom. C: δ noncentrality parameter.
'Noncentral t' — “Noncentral t Distribution” on page B-117. A: ν degrees of freedom. B: δ noncentrality parameter.
'Noncentral Chi-square' — “Noncentral Chi-Square Distribution” on page B-113. A: ν degrees of freedom. B: δ noncentrality parameter.
'Normal' — “Normal Distribution” on page B-119. A: μ mean. B: σ standard deviation.
'Poisson' — “Poisson Distribution” on page B-131. A: λ mean.
'Rayleigh' — “Rayleigh Distribution” on page B-137. A: b scale parameter.
'Rician' — “Rician Distribution” on page B-139. A: s noncentrality parameter. B: σ scale parameter.
'Stable' — “Stable Distribution” on page B-140. A: α first shape parameter. B: β second shape parameter. C: γ scale parameter. D: δ location parameter.
'T' — “Student's t Distribution” on page B-149. A: ν degrees of freedom.
'tLocationScale' — “t Location-Scale Distribution” on page B-156. A: μ location parameter. B: σ scale parameter. C: ν shape parameter.
'Uniform' — “Uniform Distribution (Continuous)” on page B-163. A: a lower endpoint (minimum). B: b upper endpoint (maximum).
'Discrete Uniform' — “Uniform Distribution (Discrete)” on page B-168. A: n maximum observable value.
'Weibull' — “Weibull Distribution” on page B-170. A: a scale parameter. B: b shape parameter.

Example: 'Normal'

p — Probability values at which to evaluate icdf
scalar value | array of scalar values
Probability values at which to evaluate the icdf, specified as a scalar value, or an array of scalar values in the range [0,1].

If one or more of the input arguments p, A, B, C, and D are arrays, then the array sizes must be the same. In this case, icdf expands each scalar input into a constant array of the same size as the array inputs. See 'name' for the definitions of A, B, C, and D for each distribution.
Example: [0.1,0.25,0.5,0.75,0.9]
Data Types: single | double

A — First probability distribution parameter
scalar value | array of scalar values
First probability distribution parameter, specified as a scalar value or an array of scalar values. If one or more of the input arguments p, A, B, C, and D are arrays, then the array sizes must be the same. In this case, icdf expands each scalar input into a constant array of the same size as the array inputs. See 'name' for the definitions of A, B, C, and D for each distribution.
Data Types: single | double

B — Second probability distribution parameter
scalar value | array of scalar values
Second probability distribution parameter, specified as a scalar value or an array of scalar values. If one or more of the input arguments p, A, B, C, and D are arrays, then the array sizes must be the same. In this case, icdf expands each scalar input into a constant array of the same size as the array inputs. See 'name' for the definitions of A, B, C, and D for each distribution.
Data Types: single | double

C — Third probability distribution parameter
scalar value | array of scalar values
Third probability distribution parameter, specified as a scalar value or an array of scalar values. If one or more of the input arguments p, A, B, C, and D are arrays, then the array sizes must be the same. In this case, icdf expands each scalar input into a constant array of the same size as the array inputs. See 'name' for the definitions of A, B, C, and D for each distribution.
Data Types: single | double

D — Fourth probability distribution parameter
scalar value | array of scalar values
Fourth probability distribution parameter, specified as a scalar value or an array of scalar values. If one or more of the input arguments p, A, B, C, and D are arrays, then the array sizes must be the same. In this case, icdf expands each scalar input into a constant array of the same size as the array inputs. See 'name' for the definitions of A, B, C, and D for each distribution.
Data Types: single | double

pd — Probability distribution
probability distribution object
Probability distribution, specified as a probability distribution object created with a function or app in this table.

Function or App       Description
makedist              Create a probability distribution object using specified parameter values.
fitdist               Fit a probability distribution object to sample data.
Distribution Fitter   Fit a probability distribution to sample data using the interactive Distribution Fitter app and export the fitted object to the workspace.
paretotails           Create a piecewise distribution object that has generalized Pareto distributions in the tails.

Output Arguments

x — icdf values
scalar value | array of scalar values

icdf values, returned as a scalar value or an array of scalar values. x is the same size as p after any necessary scalar expansion. Each element in x is the icdf value of the distribution, specified by the corresponding elements in the distribution parameters (A, B, C, and D) or specified by the probability distribution object (pd), evaluated at the corresponding element in p.

Alternative Functionality

icdf is a generic function that accepts either a distribution by its name 'name' or a probability distribution object pd. It is faster to use a distribution-specific function, such as norminv for the normal distribution and binoinv for the binomial distribution. For a list of distribution-specific functions, see “Supported Distributions” on page 5-13.
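As a quick illustration of this equivalence, the generic call and the distribution-specific function return the same values; this sketch assumes standard normal parameters mu = 0 and sigma = 1:

```matlab
% Evaluate the inverse cdf of a standard normal distribution at several
% probabilities using both the generic and the distribution-specific call.
p = [0.1 0.25 0.5 0.75 0.9];
x1 = icdf('Normal',p,0,1);   % generic call; 'name' selects the distribution
x2 = norminv(p,0,1);         % distribution-specific call (typically faster)
isequal(x1,x2)               % the two calls return identical values
```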

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
Usage notes and limitations:
• The input argument 'name' must be a compile-time constant. For example, to use the normal distribution, include coder.Constant('Normal') in the -args value of codegen.
• The input argument pd can be a fitted probability distribution object for beta, exponential, extreme value, lognormal, normal, and Weibull distributions. Create pd by fitting a probability distribution to sample data from the fitdist function. For an example, see “Code Generation for Probability Distribution Objects” on page 31-70.
For more information on code generation, see “Introduction to Code Generation” on page 31-2 and “General Code Generation Workflow” on page 31-5.
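The compile-time-constant requirement can be sketched as follows; the wrapper function name myicdf and the parameter values here are hypothetical, not part of the toolbox:

```matlab
% myicdf.m -- wrapper whose distribution name is an input argument.
% For code generation, the 'name' input must be a compile-time constant,
% so it is passed via coder.Constant in the -args list of codegen:
%
%   codegen myicdf -args {coder.Constant('Normal'),0.5,0,1}
%
function x = myicdf(name,p,mu,sigma) %#codegen
x = icdf(name,p,mu,sigma);
end
```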

See Also
Distribution Fitter | cdf | fitdist | makedist | mle | paretotails | pdf | random

Topics
“Working with Probability Distributions” on page 5-2
“Supported Distributions” on page 5-13

Introduced before R2006a

inconsistent Inconsistency coefficient

Syntax Y = inconsistent(Z) Y = inconsistent(Z,d)

Description Y = inconsistent(Z) returns the inconsistency coefficient for each link of the hierarchical cluster tree Z generated by the linkage function. inconsistent calculates the inconsistency coefficient for each link by comparing its height with the average height of other links at the same level of the hierarchy. The larger the coefficient, the greater the difference between the objects connected by the link. For more information, see “Algorithms” on page 32-2601. Y = inconsistent(Z,d) returns the inconsistency coefficient for each link in the tree Z by searching to a depth d below each link.

Examples Inconsistency Coefficient Calculation Examine an inconsistency coefficient calculation for a hierarchical cluster tree. Load the examgrades data set. load examgrades

Create a hierarchical cluster tree. Z = linkage(grades);

Create a matrix of inconsistency coefficient information using inconsistent. Examine the information for the 84th link.
Y = inconsistent(Z);
Y(84,:)
ans = 1×4
    7.2741    0.3624    3.0000    0.5774

The fourth column of Y contains the inconsistency coefficient, which is computed using the mean in the first column of Y and the standard deviation in the second column of Y. Because the rows of Y correspond to the rows of Z, examine the 84th link in Z.
Z(84,:)
ans = 1×3
  190.0000  203.0000    7.4833

The 84th link connects the 190th and 203rd clusters in the tree and has a height of 7.4833. The 190th cluster corresponds to the link of index 190 − 120 = 70, where 120 is the number of observations. The 203rd cluster corresponds to the 83rd link. By default, inconsistent uses two levels of the tree to compute Y. Therefore, it uses only the 70th, 83rd, and 84th links to compute the inconsistency coefficient for the 84th link. Compare the values in Y(84,:) with the corresponding computations by using the link heights in Z.
mean84 = mean([Z(70,3) Z(83,3) Z(84,3)])
mean84 = 7.2741
std84 = std([Z(70,3) Z(83,3) Z(84,3)])
std84 = 0.3624
inconsistent84 = (Z(84,3)-mean84)/std84
inconsistent84 = 0.5774

Compute Inconsistency Coefficient Create the sample data. X = gallery('uniformdata',[10 2],12); Y = pdist(X);

Generate the hierarchical cluster tree. Z = linkage(Y,'single');

Generate a dendrogram plot of the hierarchical cluster tree. dendrogram(Z)


Compute the inconsistency coefficient for each link in the cluster tree Z to depth 3.
W = inconsistent(Z,3)
W = 9×4
    0.1313         0    1.0000         0
    0.1386         0    1.0000         0
    0.1463    0.0109    2.0000    0.7071
    0.2391         0    1.0000         0
    0.1951    0.0568    4.0000    0.9425
    0.2308    0.0543    4.0000    0.9320
    0.2395    0.0748    4.0000    0.7636
    0.2654    0.0945    4.0000    0.9203
    0.3769    0.0950    3.0000    1.1040

Input Arguments

Z — Agglomerative hierarchical cluster tree
numeric matrix

Agglomerative hierarchical cluster tree, specified as a numeric matrix returned by linkage. Z is an (m – 1)-by-3 matrix, where m is the number of observations. Columns 1 and 2 of Z contain cluster indices linked in pairs to form a binary tree. Z(I,3) contains the linkage distances between the two clusters merged in row Z(I,:).
Data Types: single | double

d — Depth
2 (default) | positive integer scalar

Depth, specified as a positive integer scalar. For each link k, inconsistent calculates the corresponding inconsistency coefficient using all the links in the tree within d levels below k.
Data Types: single | double

Output Arguments

Y — Inconsistency coefficient information
numeric matrix

Inconsistency coefficient information, returned as an (m – 1)-by-4 matrix, where the (m – 1) rows correspond to the rows of Z. This table describes the columns of Y.

Column   Description
1        Mean of the heights of all the links included in the calculation
2        Standard deviation of the heights of all the links included in the calculation
3        Number of links included in the calculation
4        Inconsistency coefficient

Data Types: double

Algorithms For each link k, the inconsistency coefficient is calculated as Y(k, 4) = (Z(k, 3) − Y(k, 1))/Y(k, 2), where Y is the inconsistency coefficient information for links in the hierarchical cluster tree Z. For links that have no further links below them, the inconsistency coefficient is set to 0.
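A minimal check of this formula against the output of inconsistent, using the cluster tree from the earlier example (the link index k = 84 is just an illustration):

```matlab
% Verify that column 4 of the inconsistent output matches the formula
% Y(k,4) = (Z(k,3) - Y(k,1)) / Y(k,2) for a link with links below it.
load examgrades
Z = linkage(grades);   % agglomerative hierarchical cluster tree
Y = inconsistent(Z);   % inconsistency coefficient information
k = 84;                % an example link that has links below it
abs(Y(k,4) - (Z(k,3)-Y(k,1))/Y(k,2)) < 1e-10   % true
```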

References [1] Jain, A., and R. Dubes. Algorithms for Clustering Data. Upper Saddle River, NJ: Prentice-Hall, 1988. [2] Zahn, C. T. “Graph-theoretical methods for detecting and describing Gestalt clusters.” IEEE Transactions on Computers. Vol. C-20, Issue 1, 1971, pp. 68–86.

See Also
cluster | clusterdata | cophenet | dendrogram | linkage | pdist | squareform

Topics
“Hierarchical Clustering” on page 16-6

Introduced before R2006a


increaseB Class: clustering.evaluation.GapEvaluation Package: clustering.evaluation Increase reference data sets

Syntax eva_out = increaseB(eva,nref)

Description eva_out = increaseB(eva,nref) returns a gap criterion clustering evaluation object eva_out that uses the same evaluation criteria as the input object eva and an additional number of reference data sets as specified by nref.

Input Arguments eva — Clustering evaluation data clustering evaluation object Clustering evaluation data, specified as a clustering evaluation object. Create a clustering evaluation object using evalclusters. nref — Number of additional reference data sets positive integer value Number of additional reference data sets, specified as a positive integer value.

Output Arguments

eva_out — Updated clustering evaluation data
clustering evaluation object

Updated clustering evaluation data, returned as a gap criterion clustering evaluation object. eva_out contains evaluation data obtained using the reference data sets from the input object eva plus the number of additional reference data sets specified in nref.

increaseB updates the B property of the input object eva to reflect the increase in the number of reference data sets used to compute the gap criterion values, and updates the CriterionValues property with gap criterion values computed using the total number of reference data sets. increaseB might also update the OptimalK and OptimalY properties to reflect the optimal number of clusters and optimal clustering solution determined using the total number of reference data sets, as well as the LogW, ExpectedLogW, StdLogW, and SE properties.

Examples


Evaluate Clustering Solutions Using Additional Reference Data Create a gap clustering evaluation object using evalclusters, then use increaseB to increase the number of reference data sets used to compute the gap criterion values. Load the sample data. load fisheriris

The data contains length and width measurements from the sepals and petals of three species of iris flowers.
Cluster the flower measurement data using kmeans, and use the gap criterion to evaluate proposed solutions of one through five clusters. Use 50 reference data sets.
rng('default') % For reproducibility
eva = evalclusters(meas,'kmeans','gap','klist',1:5,'B',50)
eva = 
  GapEvaluation with properties:

    NumObservations: 150
         InspectedK: [1 2 3 4 5]
    CriterionValues: [0.0870 0.5822 0.8766 1.0007 1.0465]
           OptimalK: 4

The clustering evaluation object eva contains data on each proposed clustering solution. The returned results indicate that the optimal number of clusters is four. The value of the B property of eva shows 50 reference data sets.
eva.B
ans = 50

Increase the number of reference data sets by 100, for a total of 150 sets.
eva = increaseB(eva,100)
eva = 
  GapEvaluation with properties:

    NumObservations: 150
         InspectedK: [1 2 3 4 5]
    CriterionValues: [0.0794 0.5850 0.8738 1.0034 1.0508]
           OptimalK: 5

The returned results now indicate that the optimal number of clusters is five. The value of the B property of eva now shows 150 reference data sets. eva.B ans = 150


See Also evalclusters


interactionplot Interaction plot for grouped data

Syntax interactionplot(Y,GROUP) interactionplot(Y,GROUP,'varnames',VARNAMES) [h,AX,bigax] = interactionplot(...)

Description

interactionplot(Y,GROUP) displays the two-factor interaction plot for the group means of matrix Y with groups defined by entries in GROUP, which can be a cell array or a matrix. Y is a numeric matrix or vector. If Y is a matrix, the rows represent different observations and the columns represent replications of each observation. If Y is a vector, the rows give the means of each entry in GROUP.

If GROUP is a cell array, then each cell of GROUP must contain a grouping variable that is a categorical variable, numeric vector, character matrix, string array, or single-column cell array of character vectors. If GROUP is a matrix, then its columns represent different grouping variables. Each grouping variable must have the same number of rows as Y. The number of grouping variables must be greater than 1.

The interaction plot is a matrix plot, with the number of rows and columns both equal to the number of grouping variables. The grouping variable names are printed on the diagonal of the plot matrix. The plot at off-diagonal position (i,j) is the interaction of the two variables whose names are given at row diagonal (i,i) and column diagonal (j,j), respectively.

interactionplot(Y,GROUP,'varnames',VARNAMES) displays the interaction plot with user-specified grouping variable names VARNAMES. VARNAMES is a character matrix, a string array, or a cell array of character vectors, one per grouping variable. Default names are 'X1', 'X2', ... .

[h,AX,bigax] = interactionplot(...) returns a handle h to the figure window, a matrix AX of handles to the subplot axes, and a handle bigax to the big (invisible) axes framing the subplots.

Examples Display Interaction Plots Randomly generate data for a response variable y . rng default; % For reproducibility y = randn(1000,1);

Randomly generate data for four three-level factors. group = ceil(3*rand(1000,4));

Display the interaction plots for the factors and name the factors 'A', 'B', 'C', 'D'. interactionplot(y,group,'varnames',{'A','B','C','D'})


See Also
maineffectsplot | multivarichart

Introduced in R2006b


intersect Class: dataset (Not Recommended) Set intersection for dataset array observations Note The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.

Syntax C = intersect(A,B) C = intersect(A,B,vars) C = intersect(A,B,vars,setOrder) [C,iA,iB] = intersect( ___ )

Description C = intersect(A,B) for dataset arrays A and B returns the common set of observations from the two arrays, with repetitions removed. The observations in the dataset array C are in sorted order. C = intersect(A,B,vars) returns the set of common observations from the two arrays, considering only the variables specified in vars, with repetitions removed. The observations in the dataset array C are sorted by those variables. The values for variables not specified in vars for each observation in C are taken from the corresponding observations in A. If there are multiple observations in A that correspond to an observation in C, then those values are taken from the first occurrence. C = intersect(A,B,vars,setOrder) returns the observations in C in the order specified by setOrder. [C,iA,iB] = intersect( ___ ) also returns index vectors iA and iB such that C = A(iA,:) and C = B(iB,:). If there are repeated observations in A or B, then intersect returns the index of the first occurrence. You can use any of the previous input arguments.

Input Arguments

A,B
Input dataset arrays.

vars
String array or cell array of character vectors containing variable names, or a vector of integers containing variable column numbers. vars indicates the variables in A and B that intersect considers. Specify vars as [] to use its default value of all variables.

setOrder
Flag indicating the sorting order for the observations in C. The possible values of setOrder are:

'sorted'    Observations in C are in sorted order (default).
'stable'    Observations in C are in the same order that they appear in A.

Output Arguments

C
Dataset array with the common set of observations in A and B, with repetitions removed. C is in sorted order (by default), or the order specified by setOrder.

iA
Index vector, indicating the observations in A that are common to B. The vector iA contains the index to the first occurrence of any repeated observations in A.

iB
Index vector, indicating the observations in B that are common to A. The vector iB contains the index to the first occurrence of any repeated observations in B.

Examples Intersection of Two Dataset Arrays Load sample data.

A = dataset('XLSFile',fullfile(matlabroot,'help/toolbox/stats/examples','hospitalSmall.xlsx')); B = dataset('XLSFile',fullfile(matlabroot,'help/toolbox/stats/examples','hospitalSmall.xlsx'),'Sh

Return the intersection and index vectors.
[C,iA,iB] = intersect(A,B);
C = 
    id           name       sex    age    wgt    smoke
    'TRW-072'    'WHITE'    'm'    39     202    1

There is one observation in common between A and B. Find the observation in the original dataset arrays.
A(iA,:)
ans = 
    id           name       sex    age    wgt    smoke
    'TRW-072'    'WHITE'    'm'    39     202    1

B(iB,:)
ans = 
    id           name       sex    age    wgt    smoke
    'TRW-072'    'WHITE'    'm'    39     202    1
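The vars and setOrder inputs can be sketched with a small constructed dataset array; the variable names x and label here are hypothetical, chosen only for illustration:

```matlab
% Intersect two dataset arrays on a subset of variables, keeping A's order.
A = dataset({[1;2;3],'x'},{{'a';'b';'c'},'label'});
B = dataset({[3;1],'x'},{{'c';'a'},'label'});
% Consider only variable 'x', and return observations in the order they
% appear in A ('stable') rather than sorted order.
C = intersect(A,B,{'x'},'stable')
```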

See Also dataset | ismember | setdiff | setxor | sortrows | union | unique Topics “Merge Dataset Arrays” on page 2-85 “Dataset Arrays” on page 2-112


invpred Inverse prediction

Syntax X0 = invpred(X,Y,Y0) [X0,DXLO,DXUP] = invpred(X,Y,Y0) [X0,DXLO,DXUP] = invpred(X,Y,Y0,name1,val1,name2,val2,...)

Description

X0 = invpred(X,Y,Y0) accepts vectors X and Y of the same length, fits a simple regression, and returns the estimated value X0 for which the height of the line is equal to Y0. The output, X0, has the same size as Y0, and Y0 can be an array of any size.

[X0,DXLO,DXUP] = invpred(X,Y,Y0) also computes 95% inverse prediction intervals. DXLO and DXUP define intervals with lower bound X0–DXLO and upper bound X0+DXUP. Both DXLO and DXUP have the same size as Y0. The intervals are not simultaneous and are not necessarily finite. Some intervals may extend from a finite value to -Inf or +Inf, and some may extend over the entire real line.

[X0,DXLO,DXUP] = invpred(X,Y,Y0,name1,val1,name2,val2,...) specifies optional argument name/value pairs chosen from the following list. Argument names are case insensitive and partial matches are allowed.

Name         Value
'alpha'      A value between 0 and 1 specifying a confidence level of 100*(1-alpha)%. Default is alpha=0.05 for 95% confidence.
'predopt'    Either 'observation' (the default) to compute the intervals for the X0 at which a new observation could equal Y0, or 'curve' to compute intervals for the X0 value at which the curve is equal to Y0.

Examples Inverse Prediction Generate sample data. x = 4*rand(25,1); y = 10 + 5*x + randn(size(x));

Make a scatterplot of the data. scatter(x,y)


Predict the x value for a given y value of 20. x0 = invpred(x,y,20) x0 = 1.9967
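The interval outputs and name/value pairs can be exercised on the same data; a sketch (the numeric results depend on the random sample generated above):

```matlab
% 99% inverse prediction intervals for the point where the fitted line
% itself (rather than a new observation) equals y0 = 20.
[x0,dxlo,dxup] = invpred(x,y,20,'alpha',0.01,'predopt','curve');
lower = x0 - dxlo;   % lower bound of the inverse prediction interval
upper = x0 + dxup;   % upper bound of the inverse prediction interval
```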

See Also polyconf | polyfit | polytool | polyval Introduced before R2006a


iqr Package: prob Interquartile range

Syntax
r = iqr(x)
r = iqr(x,'all')
r = iqr(x,dim)
r = iqr(x,vecdim)
r = iqr(pd)

Description r = iqr(x) returns the interquartile range of the values in x. • If x is a vector, then r is the difference between the 75th and the 25th percentiles of the data contained in x. • If x is a matrix, then r is a row vector containing the difference between the 75th and the 25th percentiles of the sample data in each column of x. • If x is a multidimensional array, then iqr operates along the first nonsingleton dimension of x. The size of this dimension becomes 1 while the sizes of all other dimensions remain the same. r = iqr(x,'all') returns the interquartile range of all the values in x. r = iqr(x,dim) returns the interquartile range along the dimension of x specified by dim. r = iqr(x,vecdim) returns the interquartile range over the dimensions specified by vecdim. For example, if x is a matrix, then iqr(x,[1 2]) is the interquartile range of all the elements of x because every element of a matrix is contained in the array slice defined by dimensions 1 and 2. r = iqr(pd) returns the interquartile range of the probability distribution pd.

Examples

Compute the Interquartile Range
Generate a 4-by-4 matrix of random data from a normal distribution with parameter values μ equal to 10 and σ equal to 1.
rng default % For reproducibility
x = normrnd(10,1,4)
x = 4×4
   10.5377   10.3188   13.5784   10.7254
   11.8339    8.6923   12.7694    9.9369
    7.7412    9.5664    8.6501   10.7147
   10.8622   10.3426   13.0349    9.7950

Compute the interquartile range for each column of data.
r = iqr(x)
r = 1×4
    2.2086    1.2013    2.5969    0.8541

Compute the interquartile range for each row of data.
r2 = iqr(x,2)
r2 = 4×1
    1.7237
    2.9870
    1.9449
    1.8797

Compute Interquartile Range of Multidimensional Array
Compute the interquartile range of a multidimensional array over multiple dimensions by specifying the 'all' and vecdim input arguments.
Create a 3-by-4-by-2 array X.
X = reshape(1:24,[3 4 2])
X = 
X(:,:,1) =
     1     4     7    10
     2     5     8    11
     3     6     9    12
X(:,:,2) =
    13    16    19    22
    14    17    20    23
    15    18    21    24

Compute the interquartile range of all the values in X.
rall = iqr(X,'all')
rall = 12

Compute the interquartile range of each page of X. Specify the first and second dimensions as the operating dimensions along which the interquartile range is calculated.
rpage = iqr(X,[1 2])
rpage = 
rpage(:,:,1) =
     6
rpage(:,:,2) =
     6

For example, rpage(1,1,1) is the interquartile range of all the elements in X(:,:,1). Compute the interquartile range of the elements in each X(i,:,:) slice by specifying the second and third dimensions as the operating dimensions.
rrow = iqr(X,[2 3])
rrow = 3×1
    12
    12
    12

For example, rrow(3) is the interquartile range of all the elements in X(3,:,:).

Compute the Normal Distribution Interquartile Range Create a standard normal distribution object with the mean, μ, equal to 0 and the standard deviation, σ, equal to 1. pd = makedist('Normal','mu',0,'sigma',1);

Compute the interquartile range of the standard normal distribution. r = iqr(pd) r = 1.3490

The returned value is the difference between the 75th and the 25th percentile values for the distribution. This is equivalent to computing the difference between the inverse cumulative distribution function (icdf) values at the probabilities y equal to 0.75 and 0.25. r2 = icdf(pd,0.75) - icdf(pd,0.25) r2 = 1.3490

Interquartile Range of a Fitted Distribution
Load the sample data. Create a vector containing the first column of students’ exam grade data.
load examgrades;
x = grades(:,1);

Create a normal distribution object by fitting it to the data.
pd = fitdist(x,'Normal')
pd = 
  NormalDistribution

  Normal distribution
       mu = 75.0083   [73.4321, 76.5846]
    sigma =  8.7202   [7.7391, 9.9884]

Compute the interquartile range of the fitted distribution.
r = iqr(pd)
r = 11.7634

The returned result indicates that the difference between the 75th and 25th percentile of the students’ grades is 11.7634.
Use icdf to determine the 75th and 25th percentiles of the students’ grades.
y = icdf(pd,[0.25,0.75])
y = 1×2
   69.1266   80.8900

Calculate the difference between the 75th and 25th percentiles. This yields the same result as iqr. y(2)-y(1) ans = 11.7634

Use boxplot to visualize the interquartile range. boxplot(x)


The top line of the box shows the 75th percentile, and the bottom line shows the 25th percentile. The center line shows the median, which is the 50th percentile.

Input Arguments

x — Input array
vector | matrix | multidimensional array

Input array, specified as a vector, matrix, or multidimensional array.
Data Types: single | double

dim — Dimension
positive integer value

Dimension along which the interquartile range is calculated, specified as a positive integer. For example, for a matrix x, when dim is equal to 1, iqr returns the interquartile range for the columns of x. When dim is equal to 2, iqr returns the interquartile range for the rows of x. For n-dimensional arrays, iqr operates along the first nonsingleton dimension of x.
Data Types: single | double

vecdim — Vector of dimensions
positive integer vector


Vector of dimensions, specified as a positive integer vector. Each element of vecdim represents a dimension of the input array x. The output r has length 1 in the specified operating dimensions. The other dimension lengths are the same for x and r. For example, if x is a 2-by-3-by-3 array, then iqr(x,[1 2]) returns a 1-by-1-by-3 array. Each element of the output array is the interquartile range of the elements on the corresponding page of x.
Data Types: single | double

pd — Probability distribution
probability distribution object

Probability distribution, specified as a probability distribution object created using one of the following.

Function or App       Description
makedist              Create a probability distribution object using specified parameter values.
fitdist               Fit a probability distribution object to sample data.
Distribution Fitter   Fit a probability distribution to sample data using the interactive Distribution Fitter app and export the fitted object to the workspace.

Output Arguments r — Interquartile range values scalar | vector | matrix | multidimensional array Interquartile range values, returned as a scalar, vector, matrix, or multidimensional array. • If you input an array x, then the dimensions of r depend on whether the 'all', dim, or vecdim input arguments are specified. Each interquartile range value in r is the difference between the 75th and the 25th percentiles of the specified data contained in x. • If you input a probability distribution pd, then the scalar value of r is the difference between the values of the 75th and 25th percentiles of the probability distribution.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
Usage notes and limitations:

• These input arguments are not supported: 'all' and vecdim. • The dim input argument must be a compile-time constant. • If you do not specify the dim input argument, the working (or operating) dimension can be different in the generated code. As a result, run-time errors can occur. For more details, see “Automatic dimension restriction” (MATLAB Coder). • The input argument pd can be a fitted probability distribution object for beta, exponential, extreme value, lognormal, normal, and Weibull distributions. Create pd by fitting a probability distribution to sample data from the fitdist function. For an example, see “Code Generation for Probability Distribution Objects” on page 31-70. For more information on code generation, see “Introduction to Code Generation” on page 31-2 and “General Code Generation Workflow” on page 31-5. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. Usage notes and limitations: • These input arguments are not supported: 'all', vecdim, and pd. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also Distribution Fitter | boxplot | fitdist | icdf | mad | makedist | range | std Topics “Working with Probability Distributions” on page 5-2 “Supported Distributions” on page 5-13 Introduced before R2006a


isempty Class: dataset (Not Recommended) True for empty dataset array Note The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.

Syntax tf = isempty(A)

Description
tf = isempty(A) returns true (1) if A is an empty dataset and false (0) otherwise. An empty array has no elements; that is, prod(size(A))==0.
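A minimal sketch; the variable name x is hypothetical:

```matlab
% An empty dataset array has prod(size(A)) == 0.
ds = dataset([1;2],'VarNames',{'x'});
isempty(ds)         % false: 2 observations, 1 variable
isempty(ds([],:))   % true: 0 observations
```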

See Also size


islevel (Not Recommended) Determine if levels are in nominal or ordinal array Note The nominal and ordinal array data types are not recommended. To represent ordered and unordered discrete, nonnumeric data, use the “Categorical Arrays” (MATLAB) data type instead.

Syntax tf = islevel(levels,A)

Description tf = islevel(levels,A) returns a logical array indicating which of the levels in levels correspond to a level in the nominal or ordinal array A.

Input Arguments A — Nominal or ordinal array nominal array | ordinal array Nominal or ordinal array, specified as a nominal or ordinal array object created with nominal or ordinal. levels — Levels to test character vector | string array | cell array of character vectors | 2-D character matrix Levels to test, specified as a character vector, string array, cell array of character vectors, or 2-D character matrix. Data Types: char | string | cell

Output Arguments tf — Level indicator logical array Level indicator, returned as a logical array of the same size as levels. tf has the value 1 (true) when the corresponding element of levels is the label of a level in the nominal or ordinal array A, even if the level contains no elements. Otherwise, tf has the value 0 (false).
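A short sketch, assuming a small nominal array (the level names here are chosen only for illustration):

```matlab
% Create a nominal array and add a level that no element uses.
colors = nominal({'red';'blue';'red'});
colors = addlevels(colors,{'green'});
% 'green' is a defined level even though it is empty;
% 'yellow' is not a level at all.
tf = islevel({'red','green','yellow'},colors)   % returns [1 1 0]
```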

See Also isequal | ismember | nominal | ordinal Introduced in R2007a


ismember Class: dataset (Not Recommended) Dataset array elements that are members of set Note The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.

Syntax LiA = ismember(A,B) LiA = ismember(A,B,vars) [LiA,LocB] = ismember( ___ )

Description LiA = ismember(A,B) for dataset arrays A and B returns a vector of logical values the same length as A. The output vector, LiA, has value 1 (true) in the elements that correspond to observations in A that are also present in B, and 0 (false) otherwise. LiA = ismember(A,B,vars) returns a vector of logical values the same length as A. The output vector, LiA, has value 1 (true) in the elements that correspond to observations in A that are also present in B for the variables specified in vars only, and 0 (false) otherwise. [LiA,LocB] = ismember( ___ ) also returns a vector the same length as A containing the index to the first observation in B that corresponds to each observation in A, or 0 if there is no such observation. You can use any of the previous input arguments.

Input Arguments

A
Query dataset array, containing the observations to be found in B.

B
Set dataset array. When an observation in A is found in B, for all variables or only those variables specified in vars, the corresponding element of LiA is 1.

vars
String array or cell array of character vectors containing variable names, or a vector of integers containing variable column numbers. vars indicates which variables to match observations on in A and B.


Output Arguments

LiA
Vector of logical values the same length as A. LiA has value 1 (true) when the corresponding observation in A is present in B. Otherwise, LiA has value 0 (false). If you specify vars, LiA has value 1 when the corresponding observation in A is present in B for the variables in vars only.

LocB
Vector the same length as A containing the index to the first observation in B that corresponds to each observation in A, for all variables or only those variables specified in vars.

Examples

Find Observations That Are Members of a Dataset Array
Load sample data.
load('hospital')
B = hospital(1:50,1:5);

This set dataset array, B, has 50 observations on 5 variables. Specify a query dataset array.
rng('default')
rIx = randsample(100,10);
A = hospital(rIx,1:5)
A = 
               LastName         Sex       Age    Weight    Smoker
    YLN-495    {'COLEMAN'  }    Male      39     188       false
    LQW-768    {'TAYLOR'   }    Female    31     132       false
    DGC-290    {'BUTLER'   }    Male      38     184       true
    DAU-529    {'REED'     }    Male      50     186       true
    REV-997    {'ALEXANDER'}    Male      25     171       true
    QEQ-082    {'COX'      }    Female    28     111       false
    AGR-528    {'SIMMONS'  }    Male      45     181       false
    PUE-347    {'YOUNG'    }    Female    25     114       false
    HVR-372    {'RUSSELL'  }    Male      44     188       true
    XUE-826    {'JACKSON'  }    Male      25     174       false

Check which observations in A are present in B.
LiA = ismember(A,B)
LiA = 10x1 logical array
   0
   1
   0
   0
   0
   0
   0
   1
   0
   1

Display the observations in A that are present in B.
A(LiA,:)
ans = 
               LastName       Sex       Age    Weight    Smoker
    LQW-768    {'TAYLOR' }    Female    31     132       false
    PUE-347    {'YOUNG'  }    Female    25     114       false
    XUE-826    {'JACKSON'}    Male      25     174       false

Find the location of the observations in B.
[~,LocB] = ismember(A,B)
LocB = 10×1
     0
    10
     0
     0
     0
     0
     0
    28
     0
    13

Display the observations in B that match observations in A.
B(LocB(LocB>0),:)
ans = 
               LastName       Sex       Age    Weight    Smoker
    LQW-768    {'TAYLOR' }    Female    31     132       false
    PUE-347    {'YOUNG'  }    Female    25     114       false
    XUE-826    {'JACKSON'}    Male      25     174       false

See Also
dataset | intersect | setdiff | setxor | sortrows | union | unique

Topics
“Dataset Arrays” on page 2-112


ismissing
Class: dataset

(Not Recommended) Find dataset array elements with missing values

Note: The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See the MATLAB table documentation for more information.

Syntax

I = ismissing(ds)
I = ismissing(ds,Name,Value)

Description

I = ismissing(ds) returns a logical array that indicates which elements in the dataset array, ds, contain a missing value. By default, ismissing recognizes NaN as a missing value in numeric variables, '' as a missing value in character variables, and <undefined> as a missing value in categorical arrays.

• ds2 = ds(~any(I,2),:) creates a new dataset array containing only the complete observations in ds.
• ds2 = ds(:,~any(I,1)) creates a new dataset array containing only the variables from ds with no missing values.

I = ismissing(ds,Name,Value) returns missing value indices with additional options specified by one or more Name,Value pair arguments.
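The behavior above can be sketched on a small dataset array (a hypothetical two-variable example; the dataset type requires this toolbox):

```matlab
% Build a small dataset array with one numeric and one character variable
ds = dataset({[1;NaN;3],'x'},{{'a';'';'c'},'name'});

% Locate missing elements: NaN in numeric variables, '' in character variables
I = ismissing(ds);

% Keep only the complete observations (rows with no missing values)
ds2 = ds(~any(I,2),:);
```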

Input Arguments

ds — dataset array

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

NumericTreatAsMissing — Vector of numeric values to treat as missing value indicators in floating point ds variables. ismissing always treats a NaN value as a missing value.
Default:


StringTreatAsMissing — Character vector, string array, or cell array of character vectors to treat as missing value indicators in character ds variables. ismissing always treats '' as a missing value.

Output Arguments

I — Logical array indicating which elements in ds contain a missing value. I is the same size as ds, with value 1 for elements that contain a missing value.

See Also
dataset | isempty | isnan | isundefined | replaceWithMissing

Topics
“Clean Messy and Missing Data” on page 2-97
“Dataset Arrays” on page 2-112


isvalid
Class: qrandstream

Test handle validity

Syntax

tf = isvalid(h)

Description

tf = isvalid(h) performs an element-wise check for validity on the handle elements of h. The result is a logical array of the same dimensions as h, where each element is the element-wise validity result. A handle is invalid if it has been deleted or if it is an element of a handle array and has not yet been initialized.
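A minimal sketch of the validity check, using a Halton quasi-random stream (the stream type and dimension are illustrative):

```matlab
% Create a 3-dimensional Halton quasi-random stream
q = qrandstream('halton',3);
tf = isvalid(q);   % true while the handle is live

% Deleting the stream invalidates the handle
delete(q);
tf = isvalid(q);   % now false
```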

See Also
delete | qrandstream


iwishrnd
Inverse Wishart random numbers

Syntax

W = iwishrnd(Tau,df)
W = iwishrnd(Tau,df,DI)
[W,DI] = iwishrnd(Tau,df)

Description

W = iwishrnd(Tau,df) generates a random matrix W from the inverse Wishart distribution with parameters Tau and df. The inverse of W has the Wishart distribution with covariance matrix Sigma = inv(Tau) and with df degrees of freedom. Tau is a symmetric and positive definite matrix.

W = iwishrnd(Tau,df,DI) expects DI to be the transpose of the inverse of the Cholesky factor of Tau, so that DI'*DI = inv(Tau), where inv is the MATLAB inverse function. DI is lower triangular and the same size as Tau. If you call iwishrnd multiple times using the same value of Tau, it is more efficient to supply DI instead of computing it each time.

[W,DI] = iwishrnd(Tau,df) returns DI so you can use it as an input in future calls to iwishrnd.

Note that different sources use different parametrizations for the inverse Wishart distribution. This function defines the parameter Tau so that the mean of the output matrix is Tau/(df-d-1), where d is the dimension of Tau.
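A short sketch of reusing the returned factor DI across repeated draws (the scale matrix and degrees of freedom are illustrative):

```matlab
% Illustrative 2-by-2 scale matrix (symmetric positive definite) and df
Tau = [1 0.5; 0.5 2];
df  = 10;

% The first call returns DI along with the random matrix
[W1,DI] = iwishrnd(Tau,df);

% Later calls with the same Tau can reuse DI, skipping the factorization
W2 = iwishrnd(Tau,df,DI);
```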

See Also
wishrnd

Topics
“Inverse Wishart Distribution” on page B-76

Introduced before R2006a


jackknife
Jackknife sampling

Syntax

jackstat = jackknife(jackfun,X)
jackstat = jackknife(jackfun,X,Y,...)
jackstat = jackknife(jackfun,...,'Options',option)

Description

jackstat = jackknife(jackfun,X) draws jackknife data samples from the n-by-p data array X, computes statistics on each sample using the function jackfun, and returns the results in the matrix jackstat. jackknife regards each row of X as one data sample, so there are n data samples. Each of the n rows of jackstat contains the results of applying jackfun to one jackknife sample. jackfun is a function handle specified with @. Row i of jackstat contains the results for the sample consisting of X with the ith row omitted:

s = X;
s(i,:) = [];
jackstat(i,:) = jackfun(s);

If jackfun returns a matrix or array, then this output is converted to a row vector for storage in jackstat. If X is a row vector, it is converted to a column vector.

jackstat = jackknife(jackfun,X,Y,...) accepts additional arguments to be supplied as inputs to jackfun. They may be scalars, column vectors, or matrices. Scalar data are passed to jackfun unchanged. Nonscalar arguments must have the same number of rows, and each jackknife sample omits the same row from these arguments.

jackstat = jackknife(jackfun,...,'Options',option) provides an option to perform jackknife iterations in parallel, if the Parallel Computing Toolbox is available. Set 'Options' as a structure you create with statset. jackknife uses the following field in the structure:

'UseParallel' — If true, use multiple processors to compute jackknife iterations. If the Parallel Computing Toolbox is not installed, then computation occurs in serial mode. Default is false, meaning serial computation.

Examples

Estimate the bias of the MLE variance estimator of random samples taken from the vector y using jackknife. The bias has a known formula in this problem, so you can compare the jackknife value to this formula.

sigma = 5;
y = normrnd(0,sigma,100,1);
m = jackknife(@var,y,1);
n = length(y);


bias = -sigma^2/n                   % known bias formula
jbias = (n-1)*(mean(m)-var(y,1))    % jackknife bias estimate

bias =
   -0.2500

jbias =
   -0.3378

Extended Capabilities

Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, set the 'UseParallel' option to true. Set the 'UseParallel' field of the options structure to true using statset and specify the 'Options' name-value pair argument in the call to this function. For example: 'Options',statset('UseParallel',true).

For more information, see the 'Options' name-value pair argument. For more general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).

See Also
bootstrp | histogram | ksdensity | random | randsample

Topics
“Jackknife Resampling” on page 3-16

Introduced in R2006a


jbtest
Jarque-Bera test

Syntax

h = jbtest(x)
h = jbtest(x,alpha)
h = jbtest(x,alpha,mctol)
[h,p] = jbtest( ___ )
[h,p,jbstat,critval] = jbtest( ___ )

Description

h = jbtest(x) returns a test decision for the null hypothesis that the data in vector x comes from a normal distribution with an unknown mean and variance, using the Jarque-Bera test on page 32-2634. The alternative hypothesis is that it does not come from such a distribution. The result h is 1 if the test rejects the null hypothesis at the 5% significance level, and 0 otherwise.

h = jbtest(x,alpha) returns a test decision for the null hypothesis at the significance level specified by alpha.

h = jbtest(x,alpha,mctol) returns a test decision based on a p-value computed using a Monte Carlo simulation with a maximum Monte Carlo standard error on page 32-2634 less than or equal to mctol.

[h,p] = jbtest( ___ ) also returns the p-value p of the hypothesis test, using any of the input arguments from the previous syntaxes.

[h,p,jbstat,critval] = jbtest( ___ ) also returns the test statistic jbstat and the critical value critval for the test.

Examples

Test for a Normal Distribution

Load the data set.

load carbig

Test the null hypothesis that car mileage, in miles per gallon (MPG), follows a normal distribution across different makes of cars.

h = jbtest(MPG)

h = 1

The returned value of h = 1 indicates that jbtest rejects the null hypothesis at the default 5% significance level.


Test the Hypothesis at a Different Significance Level

Load the data set.

load carbig

Test the null hypothesis that car mileage in miles per gallon (MPG) follows a normal distribution across different makes of cars at the 1% significance level.

[h,p] = jbtest(MPG,0.01)

h = 1
p = 0.0022

The returned value of h = 1, and the returned p-value less than α = 0.01 indicate that jbtest rejects the null hypothesis.

Test for a Normal Distribution Using Monte Carlo Simulation

Load the data set.

load carbig

Test the null hypothesis that car mileage, in miles per gallon (MPG), follows a normal distribution across different makes of cars. Use a Monte Carlo simulation to obtain an exact p-value.

[h,p,jbstat,critval] = jbtest(MPG,[],0.0001)

h = 1
p = 0.0022
jbstat = 18.2275
critval = 5.8461

The returned value of h = 1 indicates that jbtest rejects the null hypothesis at the default 5% significance level. Additionally, the test statistic, jbstat, is larger than the critical value, critval, which indicates rejection of the null hypothesis.

Input Arguments

x — Sample data
vector

Sample data for the hypothesis test, specified as a vector. jbtest treats NaN values in x as missing values and ignores them.
Data Types: single | double

alpha — Significance level
0.05 (default) | scalar value in the range (0,1)


Significance level of the hypothesis test, specified as a scalar value in the range (0,1). If alpha is in the range [0.001,0.50], and if the sample size is less than or equal to 2000, jbtest looks up the critical value for the test in a table of precomputed values. To conduct the test at a significance level outside of these specifications, use mctol.
Example: 0.01
Data Types: single | double

mctol — Maximum Monte Carlo standard error
nonnegative scalar value

Maximum Monte Carlo standard error on page 32-2634 for the p-value, p, specified as a nonnegative scalar value. If you specify a value for mctol, jbtest computes a Monte Carlo approximation for p directly, rather than interpolating into a table of precomputed values. jbtest chooses the number of Monte Carlo replications large enough to make the Monte Carlo standard error for p less than mctol.

If you specify a value for mctol, you must also specify a value for alpha. You can specify alpha as [] to use the default value of 0.05.
Example: 0.0001
Data Types: single | double

Output Arguments

h — Hypothesis test result
1 | 0

Hypothesis test result, returned as 1 or 0.
• If h = 1, this indicates the rejection of the null hypothesis at the alpha significance level.
• If h = 0, this indicates a failure to reject the null hypothesis at the alpha significance level.

p — p-value
scalar value in the range (0,1)

p-value of the test, returned as a scalar value in the range (0,1). p is the probability of observing a test statistic as extreme as, or more extreme than, the observed value under the null hypothesis. Small values of p cast doubt on the validity of the null hypothesis. jbtest warns when p is not found within the tabulated range of [0.001,0.50], and returns either the smallest or largest tabulated value. In this case, you can use mctol to compute a more accurate p-value.

jbstat — Test statistic
nonnegative scalar value

Test statistic for the Jarque-Bera test, returned as a nonnegative scalar value.

critval — Critical value
nonnegative scalar value

Critical value for the Jarque-Bera test at the alpha significance level, returned as a nonnegative scalar value. If alpha is in the range [0.001,0.50], and if the sample size is less than or equal to 2000, jbtest looks up the critical value for the test in a table of precomputed values. If you use mctol, jbtest determines the critical value of the test using a Monte Carlo simulation. The null hypothesis is rejected when jbstat > critval.

More About

Jarque-Bera Test

The Jarque-Bera test is a two-sided goodness-of-fit test suitable when a fully specified null distribution is unknown and its parameters must be estimated. The test is specifically designed for alternatives in the Pearson system of distributions. The test statistic is

    JB = (n/6) * ( s^2 + (k - 3)^2 / 4 ),
where n is the sample size, s is the sample skewness, and k is the sample kurtosis. For large sample sizes, the test statistic has a chi-square distribution with two degrees of freedom.

Monte Carlo Standard Error

The Monte Carlo standard error is the error due to simulating the p-value. The Monte Carlo standard error is calculated as

    SE = sqrt( p*(1 - p) / mcreps ),
where p is the estimated p-value of the hypothesis test, and mcreps is the number of Monte Carlo replications performed. jbtest chooses the number of Monte Carlo replications, mcreps, large enough to make the Monte Carlo standard error for p less than the value specified for mctol.
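The two quantities above can be sketched directly from sample moments (the data, p-value, and replication count below are illustrative):

```matlab
% Jarque-Bera statistic from sample skewness and kurtosis
% (skewness and kurtosis are Statistics and Machine Learning Toolbox functions)
x  = randn(500,1);                 % illustrative sample
n  = numel(x);
s  = skewness(x);                  % sample skewness
k  = kurtosis(x);                  % sample kurtosis
JB = n/6 * (s^2 + (k-3)^2/4);      % test statistic

% Monte Carlo standard error for an estimated p-value
p      = 0.05;                     % illustrative estimated p-value
mcreps = 1e5;                      % illustrative number of replications
SE     = sqrt(p*(1-p)/mcreps);
```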

Algorithms

Jarque-Bera tests often use the chi-square distribution to estimate critical values for large samples, deferring to the Lilliefors test (see lillietest) for small samples. jbtest, by contrast, uses a table of critical values computed using Monte Carlo simulation for sample sizes less than 2000 and significance levels from 0.001 to 0.50. Critical values for a test are computed by interpolating into the table, using the analytic chi-square approximation only when extrapolating for larger sample sizes.

References

[1] Jarque, C. M., and A. K. Bera. “A Test for Normality of Observations and Regression Residuals.” International Statistical Review. Vol. 55, No. 2, 1987, pp. 163–172.

[2] Deb, P., and M. Sefton. “The Distribution of a Lagrange Multiplier Test of Normality.” Economics Letters. Vol. 51, 1996, pp. 123–130. This paper proposed a Monte Carlo simulation for determining the distribution of the test statistic. The results of this function are based on an independent Monte Carlo simulation, not the results in this paper.


See Also
adtest | kstest | lillietest

Introduced before R2006a


johnsrnd
Johnson system random numbers

Syntax

r = johnsrnd(quantiles,m,n)
r = johnsrnd(quantiles)
[r,type] = johnsrnd(...)
[r,type,coefs] = johnsrnd(...)

Description

r = johnsrnd(quantiles,m,n) returns an m-by-n matrix of random numbers drawn from the distribution in the Johnson system that satisfies the quantile specification given by quantiles. quantiles is a four-element vector of quantiles for the desired distribution that correspond to the standard normal quantiles [–1.5 –0.5 0.5 1.5]. In other words, you specify a distribution from which to draw random values by designating quantiles that correspond to the cumulative probabilities [0.067 0.309 0.691 0.933]. quantiles may also be a 2-by-4 matrix whose first row contains four standard normal quantiles, and whose second row contains the corresponding quantiles of the desired distribution. The standard normal quantiles must be spaced evenly.

Note: Because r is a random sample, its sample quantiles typically differ somewhat from the specified distribution quantiles.

r = johnsrnd(quantiles) returns a scalar value.

r = johnsrnd(quantiles,m,n,...) or r = johnsrnd(quantiles,[m,n,...]) returns an m-by-n-by-... array.

[r,type] = johnsrnd(...) returns the type of the specified distribution within the Johnson system. type is 'SN', 'SL', 'SB', or 'SU'. Set m and n to zero to identify the distribution type without generating any random values. The four distribution types in the Johnson system correspond to the following transformations of a normal random variate:

• 'SN' — Identity transformation (normal distribution on page B-119)
• 'SL' — Exponential transformation (lognormal distribution on page B-88)
• 'SB' — Logistic transformation (bounded)
• 'SU' — Hyperbolic sine transformation (unbounded)

[r,type,coefs] = johnsrnd(...) returns coefficients coefs of the transformation that defines the distribution. coefs is [gamma, eta, epsilon, lambda]. If z is a standard normal random variable and h is one of the transformations defined above, r = lambda*h((z-gamma)/eta) + epsilon is a random variate from the distribution type corresponding to h.
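The coefficient transformation can be sketched for the 'SU' case, where h is the hyperbolic sine (the coefficient values below are illustrative, not from a real fit):

```matlab
% Illustrative 'SU' coefficients [gamma, eta, epsilon, lambda]
gamma = 1.09; eta = 0.58; epsilon = 18.44; lambda = 1.45;

% Transform standard normal variates into Johnson 'SU' variates
z = randn(1000,1);
r = lambda*sinh((z - gamma)./eta) + epsilon;
```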


Examples

Generate Random Samples Using the Johnson System

This example shows several different approaches to using the Johnson system of flexible distribution families to generate random numbers and fit a distribution to sample data.

Generate random values with longer tails than a standard normal.

rng default;  % For reproducibility
r = johnsrnd([-1.7 -.5 .5 1.7],1000,1);
figure;
qqplot(r);

Generate random values skewed to the right.

r = johnsrnd([-1.3 -.5 .5 1.7],1000,1);
figure;
qqplot(r);


Generate random values that match some sample data well in the right-hand tail.

load carbig;
qnorm = [.5 1 1.5 2];
q = quantile(Acceleration, normcdf(qnorm));
r = johnsrnd([qnorm;q],1000,1);
[q;quantile(r,normcdf(qnorm))]

ans = 2×4
   16.7000   18.2086   19.5376   21.7263
   16.6986   18.2220   19.9078   22.0918

Determine the distribution type and the coefficients.

[r,type,coefs] = johnsrnd([qnorm;q],0)

r =
     []
type =
    'SU'
coefs = 1×4
    1.0920    0.5829   18.4382    1.4494

See Also
pearsrnd | random

Topics
“Generating Data Using Flexible Families of Distributions” on page 7-21

Introduced in R2006a


join
Class: dataset

(Not Recommended) Merge observations

Note: The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See the MATLAB table documentation for more information.

Syntax

C = join(A,B)
C = join(A,B,keys)
C = join(A,B,param1,val1,param2,val2,...)
[C,IB] = join(...)
C = join(A,B,'Type',TYPE,...)
C = join(A,B,'Type',TYPE,'MergeKeys',true,...)
[C,IA,IB] = join(A,B,'Type',TYPE,...)

Description

C = join(A,B) creates a dataset array C by merging observations from the two dataset arrays A and B. join performs the merge by first finding key variables, that is, pairs of dataset variables, one in A and one in B, that share the same name. Each observation in B must contain a unique combination of values in the key variables, and must contain all combinations of values that are present in the keys from A. join then uses these key variables to define a many-to-one correspondence between observations in A and those in B. join uses this correspondence to replicate the observations in B and combine them with the observations in A to create C.

C = join(A,B,keys) performs the merge using the variables specified by keys as the key variables in both A and B. keys is a positive integer, a vector of positive integers, a character vector, a string array, a cell array of character vectors, or a logical vector. C contains one observation for each observation in A. Variables in C include all of the variables from A, as well as one variable corresponding to each variable in B (except for the keys from B). If A and B contain variables with identical names, join adds the suffixes '_left' and '_right' to the corresponding variables in C.

C = join(A,B,param1,val1,param2,val2,...) specifies optional parameter name/value pairs to control how the dataset variables in A and B are used in the merge. Parameters are:

• 'Keys' — Specifies the variables to use as keys in both A and B.
• 'LeftKeys' — Specifies the variables to use as keys in A.
• 'RightKeys' — Specifies the variables to use as keys in B.

You may provide either the 'Keys' parameter, or both the 'LeftKeys' and 'RightKeys' parameters. The value for these parameters is a positive integer, a vector of positive integers, a character vector, a string array, a cell array of character vectors, or a logical vector. 'LeftKeys' and 'RightKeys' must both specify the same number of key variables, and join pairs the left and right keys in the order specified.


• 'LeftVars' — Specifies which variables from A to include in C. By default, join includes all variables from A.
• 'RightVars' — Specifies which variables from B to include in C. By default, join includes all variables from B except the key variables.

You can use 'LeftVars' or 'RightVars' to include or exclude key variables as well as data variables. The value for these parameters is a positive integer, a vector of positive integers, a character vector, a string array, a cell array of character vectors, or a logical vector.

[C,IB] = join(...) returns an index vector IB, where join constructs C by horizontally concatenating A(:,LeftVars) and B(IB,RightVars).

join can also perform more complicated inner and outer join operations that allow a many-to-many correspondence between A and B, and allow unmatched observations in either A or B.

C = join(A,B,'Type',TYPE,...) performs the join operation specified by TYPE. TYPE is one of 'inner', 'leftouter', 'rightouter', 'fullouter', or 'outer' (which is a synonym for 'fullouter'). For an inner join, C contains only observations corresponding to a combination of key values that occurred in both A and B. For a left (or right) outer join, C also contains observations corresponding to keys in A (or B) that did not match any in B (or A). Variables in C taken from A (or B) contain null values in those observations. A full outer join is equivalent to a left and right outer join. For inner and outer joins, C contains variables corresponding to the key variables from both A and B by default, as well as all the remaining variables, and join sorts the observations in the result C by the key values.

C = join(A,B,'Type',TYPE,'MergeKeys',true,...) includes a single variable in C for each key variable pair from A and B, rather than including two separate variables. For outer joins, join creates the single variable by merging the key values from A and B, taking values from A where a corresponding observation exists in A, and from B otherwise. Setting the 'MergeKeys' parameter to true overrides inclusion or exclusion of any key variables specified via the 'LeftVars' or 'RightVars' parameter. Setting the 'MergeKeys' parameter to false is equivalent to not passing in the 'MergeKeys' parameter.

[C,IA,IB] = join(A,B,'Type',TYPE,...) returns index vectors IA and IB indicating the correspondence between observations in C and those in A and B. For an inner join, join constructs C by horizontally concatenating A(IA,LeftVars) and B(IB,RightVars). For an outer join, IA or IB may also contain zeros, indicating the observations in C that do not correspond to observations in A or B, respectively.

Examples

Create a dataset array from Fisher's iris data:

load fisheriris
NumObs = size(meas,1);
NameObs = strcat({'Obs'},num2str((1:NumObs)','%-d'));
iris = dataset({nominal(species),'species'},...
    {meas,'SL','SW','PL','PW'},...
    'ObsNames',NameObs);

Create a separate dataset array with the diploid chromosome counts for each species of iris:


snames = nominal({'setosa';'versicolor';'virginica'});
CC = dataset({snames,'species'},{[38;108;70],'cc'})

CC = 
    species       cc 
    setosa         38
    versicolor    108
    virginica      70

Broadcast the data in CC to the rows of iris using the key variable species in each dataset:

iris2 = join(iris,CC);
iris2([1 2 51 52 101 102],:)

ans = 
              species       SL     SW     PL     PW     cc 
    Obs1      setosa        5.1    3.5    1.4    0.2     38
    Obs2      setosa        4.9    3      1.4    0.2     38
    Obs51     versicolor    7      3.2    4.7    1.4    108
    Obs52     versicolor    6.4    3.2    4.5    1.5    108
    Obs101    virginica     6.3    3.3    6      2.5     70
    Obs102    virginica     5.8    2.7    5.1    1.9     70

Create two datasets and join them using the 'MergeKeys' flag:

% Create two data sets that both contain the key variable 'Key1'.
% The two arrays contain observations with common values of Key1,
% but each array also contains observations with values of Key1
% not present in the other.
a = dataset({'a' 'b' 'c' 'e' 'h'}',[1 2 3 11 17]',...
    'VarNames',{'Key1' 'Var1'})
b = dataset({'a' 'b' 'd' 'e'}',[4 5 6 7]',...
    'VarNames',{'Key1' 'Var2'})

% Combine a and b with an outer join, which matches up
% observations with common key values, but also retains
% observations whose key values don't have a match.
% Keep the key values as separate variables in the result.
couter = join(a,b,'key','Key1','Type','outer')

% Join a and b, merging the key values as a single variable
% in the result.
coutermerge = join(a,b,'key','Key1','Type','outer',...
    'MergeKeys',true)

% Join a and b, retaining only observations whose key
% values match.
cinner = join(a,b,'key','Key1','Type','inner',...
    'MergeKeys',true)

a = 
    Key1    Var1
    'a'       1
    'b'       2
    'c'       3
    'e'      11
    'h'      17

b = 
    Key1    Var2
    'a'       4
    'b'       5
    'd'       6
    'e'       7

couter = 
    Key1_left    Var1    Key1_right    Var2
    'a'            1     'a'             4
    'b'            2     'b'             5
    'c'            3     ''            NaN
    ''           NaN     'd'             6
    'e'           11     'e'             7
    'h'           17     ''            NaN

coutermerge = 
    Key1    Var1    Var2
    'a'       1       4
    'b'       2       5
    'c'       3     NaN
    'd'     NaN       6
    'e'      11       7
    'h'      17     NaN

cinner = 
    Key1    Var1    Var2
    'a'       1       4
    'b'       2       5
    'e'      11       7
See Also
sortrows


KDTreeSearcher
Create Kd-tree nearest neighbor searcher

Description

KDTreeSearcher model objects store the results of a nearest neighbor search that uses the Kd-tree algorithm. Results include the training data, the distance metric and its parameters, and the maximum number of data points in each leaf node (that is, the bucket size). The Kd-tree algorithm partitions an n-by-K data set by recursively splitting n points in K-dimensional space into a binary tree.

Once you create a KDTreeSearcher model object, you can search the stored tree to find all neighboring points to the query data by performing a nearest neighbor search using knnsearch or a radius search using rangesearch. The Kd-tree algorithm is more efficient than the exhaustive search algorithm when K is small (that is, K ≤ 10), the training and query sets are not sparse, and the training and query sets have many observations.

Creation

Use either the createns function or the KDTreeSearcher function (described here) to create a KDTreeSearcher model object. Both functions use the same syntax, except that the createns function has the 'NSMethod' name-value pair argument, which you use to choose the nearest neighbor search method. The createns function can also create an ExhaustiveSearcher object. Specify 'NSMethod','kdtree' to create a KDTreeSearcher object. The default is 'kdtree' if K ≤ 10, the training data is not sparse, and the distance metric is Euclidean, city block, Chebychev, or Minkowski.

Syntax

Mdl = KDTreeSearcher(X)
Mdl = KDTreeSearcher(X,Name,Value)

Description

Mdl = KDTreeSearcher(X) grows a default Kd-tree (Mdl) using the n-by-K numeric matrix of training data (X).

Mdl = KDTreeSearcher(X,Name,Value) specifies additional options using one or more name-value pair arguments. You can specify the maximum number of data points in each leaf node (that is, the bucket size) and the distance metric, and set the distance metric parameter (DistParameter) property. For example, KDTreeSearcher(X,'Distance','minkowski','BucketSize',10) specifies to use the Minkowski distance when searching for nearest neighbors and to use 10 for the bucket size. To specify DistParameter, use the P name-value pair argument.

Input Arguments

X — Training data
numeric matrix


Training data that grows the Kd-tree, specified as a numeric matrix. X has n rows, each corresponding to an observation (that is, an instance or example), and K columns, each corresponding to a predictor (that is, a feature).
Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'Distance','minkowski','P',3,'BucketSize',10 specifies to use the following when searching for nearest neighbors: the Minkowski distance, 3 for the Minkowski distance metric exponent, and 10 for the bucket size.

Distance — Distance metric
'euclidean' (default) | 'chebychev' | 'cityblock' | 'minkowski'

Distance metric used when you call knnsearch or rangesearch to find nearest neighbors for future query points, specified as the comma-separated pair consisting of 'Distance' and one of these values.

    Value          Description
    'chebychev'    Chebychev distance (maximum coordinate difference).
    'cityblock'    City block distance.
    'euclidean'    Euclidean distance.
    'minkowski'    Minkowski distance. The default exponent is 2. To specify a
                   different exponent, use the 'P' name-value pair argument.

For more details, see “Distance Metrics” on page 18-11.

The software does not use the distance metric for creating a KDTreeSearcher model object, so you can alter the distance metric by using dot notation after creating the object.
Example: 'Distance','minkowski'

P — Exponent for Minkowski distance metric
2 (default) | positive scalar

Exponent for the Minkowski distance metric, specified as the comma-separated pair consisting of 'P' and a positive scalar. This argument is valid only if 'Distance' is 'minkowski'.
Example: 'P',3
Data Types: single | double

BucketSize — Maximum number of data points in each leaf node
50 (default) | positive integer

Maximum number of data points in each leaf node of the Kd-tree, specified as the comma-separated pair consisting of 'BucketSize' and a positive integer.
Example: 'BucketSize',10
Data Types: single | double


Properties

X — Training data
numeric matrix

This property is read-only.

Training data that grows the Kd-tree, specified as a numeric matrix. X has n rows, each corresponding to an observation (that is, an instance or example), and K columns, each corresponding to a predictor (that is, a feature). The input argument X of createns or KDTreeSearcher sets this property.
Data Types: single | double

Distance — Distance metric
'chebychev' | 'cityblock' | 'euclidean' | 'minkowski'

Distance metric used when you call knnsearch or rangesearch to find nearest neighbors for future query points, specified as 'chebychev', 'cityblock', 'euclidean', or 'minkowski'. The 'Distance' name-value pair argument of createns or KDTreeSearcher sets this property.

The software does not use the distance metric for creating a KDTreeSearcher model object, so you can alter it by using dot notation.

DistParameter — Distance metric parameter values
[] | positive scalar

Distance metric parameter values, specified as empty ([]) or a positive scalar. If Distance is 'minkowski', then DistParameter is the exponent in the Minkowski distance formula. Otherwise, DistParameter is [], indicating that the specified distance metric formula has no parameters. The 'P' name-value pair argument of createns or KDTreeSearcher sets this property.

You can alter DistParameter by using dot notation, for example, Mdl.DistParameter = PNew, where PNew is a positive scalar.
Data Types: single | double

BucketSize — Maximum number of data points in each leaf node
positive integer

This property is read-only.

Maximum number of data points in each leaf node of the Kd-tree, specified as a positive integer. The 'BucketSize' name-value pair argument of createns or KDTreeSearcher sets this property.
Data Types: single | double

Object Functions

knnsearch      Find k-nearest neighbors using searcher object
rangesearch    Find all neighbors within specified distance using searcher object

Examples

Grow Default Kd-Tree

Grow a four-dimensional Kd-tree that uses the Euclidean distance.

Load Fisher's iris data set.

load fisheriris
X = meas;
[n,k] = size(X)

n = 150
k = 4

X has 150 observations and 4 predictors.

Grow a four-dimensional Kd-tree using the entire data set as training data.

Mdl1 = KDTreeSearcher(X)

Mdl1 = 
  KDTreeSearcher with properties:

       BucketSize: 50
         Distance: 'euclidean'
    DistParameter: []
                X: [150x4 double]

Mdl1 is a KDTreeSearcher model object, and its properties appear in the Command Window. The object contains information about the grown four-dimensional Kd-tree, such as the distance metric. You can alter property values using dot notation.

Alternatively, you can grow a Kd-tree by using createns.

Mdl2 = createns(X)

Mdl2 = 
  KDTreeSearcher with properties:

       BucketSize: 50
         Distance: 'euclidean'
    DistParameter: []
                X: [150x4 double]

Mdl2 is also a KDTreeSearcher model object, and it is equivalent to Mdl1. Because X has four columns and the default distance metric is Euclidean, createns creates a KDTreeSearcher model by default.

To find the nearest neighbors in X to a batch of query data, pass the KDTreeSearcher model object and the query data to knnsearch or rangesearch.


Specify the Minkowski Distance for Nearest Neighbor Search

Load Fisher's iris data. Focus on the petal dimensions.

load fisheriris
X = meas(:,[3 4]); % Predictors

Grow a two-dimensional Kd-tree using createns and the training data. Specify the Minkowski distance metric.

Mdl = createns(X,'Distance','Minkowski')

Mdl = 
  KDTreeSearcher with properties:
       BucketSize: 50
         Distance: 'minkowski'
    DistParameter: 2
                X: [150x2 double]

Because X has two columns and the distance metric is Minkowski, createns creates a KDTreeSearcher model object by default.

Access properties of Mdl by using dot notation. For example, use Mdl.DistParameter to access the Minkowski distance exponent.

Mdl.DistParameter

ans = 2

You can pass query data and Mdl to:

• knnsearch to find indices and distances of nearest neighbors
• rangesearch to find indices of all nearest neighbors within a distance that you specify
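Continuing this example, here is a brief sketch of both searches using the Mdl grown above. The query points Q are hypothetical values, not from the original text.

```matlab
Q = [5 1.5; 1.4 0.2]; % Hypothetical query points: [petal length, petal width]

% Indices and distances of the 3 nearest neighbors of each query point
[idx,d] = knnsearch(Mdl,Q,'K',3);

% Indices of all neighbors within a Minkowski distance of 0.5
% (rangesearch returns cell arrays, one cell per query point)
[idxR,dR] = rangesearch(Mdl,Q,0.5);
```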

Alter Properties of KDTreeSearcher Model

Create a KDTreeSearcher model object and alter the Distance property by using dot notation.

Load Fisher's iris data set.

load fisheriris
X = meas;

Grow a default four-dimensional Kd-tree using the entire data set as training data.

Mdl = KDTreeSearcher(X)

Mdl = 
  KDTreeSearcher with properties:
       BucketSize: 50
         Distance: 'euclidean'
    DistParameter: []
                X: [150x4 double]

Specify that the neighbor searcher use the Minkowski metric to compute the distances between the training and query data.

Mdl.Distance = 'minkowski'

Mdl = 
  KDTreeSearcher with properties:
       BucketSize: 50
         Distance: 'minkowski'
    DistParameter: 2
                X: [150x4 double]

You can pass Mdl and the query data to either knnsearch or rangesearch to find the nearest neighbors to the points in the query data based on the Minkowski distance.

Search for Nearest Neighbors of Query Data Using Minkowski Distance

Grow a Kd-tree nearest neighbor searcher object by using the createns function. Pass the object and query data to the knnsearch function to find k-nearest neighbors.

Load Fisher's iris data set.

load fisheriris

Remove five irises randomly from the predictor data to use as a query set.

rng(1);                     % For reproducibility
n = size(meas,1);           % Sample size
qIdx = randsample(n,5);     % Indices of query data
tIdx = ~ismember(1:n,qIdx); % Indices of training data
Q = meas(qIdx,:);
X = meas(tIdx,:);

Grow a four-dimensional Kd-tree using the training data. Specify the Minkowski distance for finding nearest neighbors.

Mdl = createns(X,'Distance','minkowski')

Mdl = 
  KDTreeSearcher with properties:
       BucketSize: 50
         Distance: 'minkowski'
    DistParameter: 2
                X: [145x4 double]

Because X has four columns and the distance metric is Minkowski, createns creates a KDTreeSearcher model object by default. The Minkowski distance exponent is 2 by default.

Find the indices of the training data (Mdl.X) that are the two nearest neighbors of each point in the query data (Q).

IdxNN = knnsearch(Mdl,Q,'K',2)

IdxNN = 5×2

     17      4
      6      2
      1     12
     89     66
    124    100

Each row of IdxNN corresponds to a query data observation, and the column order corresponds to the order of the nearest neighbors, with respect to ascending distance. For example, based on the Minkowski distance, the second nearest neighbor of Q(3,:) is X(12,:).

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: The knnsearch and rangesearch functions support code generation. For more information, see “Introduction to Code Generation” on page 31-2 and “Code Generation for Nearest Neighbor Searcher” on page 31-13.

See Also ExhaustiveSearcher | createns Topics “k-Nearest Neighbor Search and Radius Search” on page 18-13 “Distance Metrics” on page 18-11 Introduced in R2010a


kfoldEdge Package: classreg.learning.partition Classification edge for cross-validated ECOC model

Syntax edge = kfoldEdge(CVMdl) edge = kfoldEdge(CVMdl,Name,Value)

Description

edge = kfoldEdge(CVMdl) returns the classification edge on page 32-2656 obtained by the cross-validated ECOC model (ClassificationPartitionedECOC) CVMdl. For every fold, kfoldEdge computes the classification edge for validation-fold observations using an ECOC model trained on training-fold observations. CVMdl.X contains both sets of observations.

edge = kfoldEdge(CVMdl,Name,Value) returns the classification edge with additional options specified by one or more name-value pair arguments. For example, specify the number of folds, decoding scheme, or verbosity level.

Examples

Estimate k-Fold Cross-Validation Edge

Load Fisher's iris data set. Specify the predictor data X, the response data Y, and the order of the classes in Y.

load fisheriris
X = meas;
Y = categorical(species);
classOrder = unique(Y);
rng(1); % For reproducibility

Train and cross-validate an ECOC model using support vector machine (SVM) binary classifiers. Standardize the predictor data using an SVM template, and specify the class order.

t = templateSVM('Standardize',1);
CVMdl = fitcecoc(X,Y,'CrossVal','on','Learners',t,'ClassNames',classOrder);

CVMdl is a ClassificationPartitionedECOC model. By default, the software implements 10-fold cross-validation. You can specify a different number of folds using the 'KFold' name-value pair argument.

Estimate the average of the edges.

edge = kfoldEdge(CVMdl)

edge = 0.4825


Alternatively, you can obtain the per-fold edges by specifying the name-value pair 'Mode','individual' in kfoldEdge.

Display Individual Edges for Each Cross-Validation Fold

The classification edge is a relative measure of classifier quality. To determine which folds perform poorly, display the edges for each fold.

Load Fisher's iris data set. Specify the predictor data X, the response data Y, and the order of the classes in Y.

load fisheriris
X = meas;
Y = categorical(species);
classOrder = unique(Y);
rng(1); % For reproducibility

Train an ECOC model using SVM binary classifiers. Use 8-fold cross-validation, standardize the predictors using an SVM template, and specify the class order.

t = templateSVM('Standardize',1);
CVMdl = fitcecoc(X,Y,'KFold',8,'Learners',t,'ClassNames',classOrder);

Estimate the classification edge for each fold.

edges = kfoldEdge(CVMdl,'Mode','individual')

edges = 8×1

    0.4791
    0.4872
    0.4260
    0.5302
    0.5064
    0.4575
    0.4860
    0.4687

The edges have similar magnitudes across folds. Folds that perform poorly have small edges relative to the other folds. To return the average classification edge across the folds that perform well, specify the 'Folds' name-value pair argument.

Select ECOC Model Features by Comparing Cross-Validation Edges

The classifier edge measures the average of the classifier margins. One way to perform feature selection is to compare cross-validation edges from multiple models. Based solely on this criterion, the classifier with the greatest edge is the best classifier.


Load Fisher's iris data set. Specify the predictor data X, the response data Y, and the order of the classes in Y.

load fisheriris
X = meas;
Y = categorical(species);
classOrder = unique(Y); % Class order
rng(1); % For reproducibility

Define the following two data sets.

• fullX contains all the predictors.
• partX contains the petal dimensions.

fullX = X;
partX = X(:,3:4);

For each predictor set, train and cross-validate an ECOC model using SVM binary classifiers. Standardize the predictors using an SVM template, and specify the class order.

t = templateSVM('Standardize',1);
CVMdl = fitcecoc(fullX,Y,'CrossVal','on','Learners',t,...
    'ClassNames',classOrder);
PCVMdl = fitcecoc(partX,Y,'CrossVal','on','Learners',t,...
    'ClassNames',classOrder);

CVMdl and PCVMdl are ClassificationPartitionedECOC models. By default, the software implements 10-fold cross-validation.

Estimate the edge for each classifier.

fullEdge = kfoldEdge(CVMdl)

fullEdge = 0.4825

partEdge = kfoldEdge(PCVMdl)

partEdge = 0.4951

The two models have comparable edges.

Input Arguments

CVMdl — Cross-validated ECOC model
ClassificationPartitionedECOC model

Cross-validated ECOC model, specified as a ClassificationPartitionedECOC model. You can create a ClassificationPartitionedECOC model in two ways:

• Pass a trained ECOC model (ClassificationECOC) to crossval.
• Train an ECOC model using fitcecoc and specify any one of these cross-validation name-value pair arguments: 'CrossVal', 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.


Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: kfoldEdge(CVMdl,'BinaryLoss','hinge') specifies 'hinge' as the binary learner loss function.

BinaryLoss — Binary learner loss function
'hamming' | 'linear' | 'logit' | 'exponential' | 'binodeviance' | 'hinge' | 'quadratic' | function handle

Binary learner loss function, specified as the comma-separated pair consisting of 'BinaryLoss' and a built-in loss function name or function handle.

• This table describes the built-in functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj) is the binary loss formula.

  Value            Description         Score Domain       g(yj,sj)
  'binodeviance'   Binomial deviance   (–∞,∞)             log[1 + exp(–2yjsj)]/[2log(2)]
  'exponential'    Exponential         (–∞,∞)             exp(–yjsj)/2
  'hamming'        Hamming             [0,1] or (–∞,∞)    [1 – sign(yjsj)]/2
  'hinge'          Hinge               (–∞,∞)             max(0,1 – yjsj)/2
  'linear'         Linear              (–∞,∞)             (1 – yjsj)/2
  'logit'          Logistic            (–∞,∞)             log[1 + exp(–yjsj)]/[2log(2)]
  'quadratic'      Quadratic           [0,1]              [1 – yj(2sj – 1)]²/2

The software normalizes binary losses so that the loss is 0.5 when yj = 0. Also, the software calculates the mean binary loss for each class.

• For a custom binary loss function, for example customFunction, specify its function handle 'BinaryLoss',@customFunction. customFunction has this form:

  bLoss = customFunction(M,s)

  where:

  • M is the K-by-L coding matrix stored in Mdl.CodingMatrix.
  • s is the 1-by-L row vector of classification scores.
  • bLoss is the classification loss. This scalar aggregates the binary losses for every learner in a particular class. For example, you can use the mean binary loss to aggregate the loss over the learners for each class.
  • K is the number of classes.
  • L is the number of binary learners.

For an example of passing a custom binary loss function, see “Predict Test-Sample Labels of ECOC Model Using Custom Binary Loss Function” on page 32-4169.
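The required signature can be made concrete with a small sketch (not from the original page; the choice of linear loss and the model CVMdl are illustrative assumptions):

```matlab
% Linear binary loss g(y,s) = (1 - y*s)/2, averaged over the binary
% learners for each class. M is K-by-L; s is 1-by-L.
customBL = @(M,s) mean((1 - bsxfun(@times,M,s))/2, 2);

% Assumes CVMdl is a cross-validated ECOC model, as in the earlier examples.
edgeCustom = kfoldEdge(CVMdl,'BinaryLoss',customBL);
```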


The default BinaryLoss value depends on the score ranges returned by the binary learners. This table describes some default BinaryLoss values based on the given assumptions.

  Assumption                                                                Default Value
  All binary learners are SVMs or linear or kernel classification
  models of SVM learners.                                                   'hinge'
  All binary learners are ensembles trained by AdaboostM1 or
  GentleBoost.                                                              'exponential'
  All binary learners are ensembles trained by LogitBoost.                  'binodeviance'
  All binary learners are linear or kernel classification models of
  logistic regression learners, or you specify to predict class
  posterior probabilities by setting 'FitPosterior',true in fitcecoc.       'quadratic'

To check the default value, use dot notation to display the BinaryLoss property of the trained model at the command line.

Example: 'BinaryLoss','binodeviance'
Data Types: char | string | function_handle

Decoding — Decoding scheme
'lossweighted' (default) | 'lossbased'

Decoding scheme that aggregates the binary losses, specified as the comma-separated pair consisting of 'Decoding' and 'lossweighted' or 'lossbased'. For more information, see “Binary Loss” on page 32-2824.

Example: 'Decoding','lossbased'

Folds — Fold indices for prediction
1:CVMdl.KFold (default) | numeric vector of positive integers

Fold indices for prediction, specified as the comma-separated pair consisting of 'Folds' and a numeric vector of positive integers. The elements of Folds must be within the range from 1 to CVMdl.KFold. The software uses only the folds specified in Folds for prediction.

Example: 'Folds',[1 4 10]
Data Types: single | double

Mode — Aggregation level for output
'average' (default) | 'individual'

Aggregation level for the output, specified as the comma-separated pair consisting of 'Mode' and 'average' or 'individual'. This table describes the values.

  Value           Description
  'average'       The output is a scalar average over all folds.
  'individual'    The output is a vector of length k containing one value per fold, where k is the number of folds.

Example: 'Mode','individual'

Options — Estimation options
[] (default) | structure array returned by statset

Estimation options, specified as the comma-separated pair consisting of 'Options' and a structure array returned by statset.

To invoke parallel computing:

• You need a Parallel Computing Toolbox license.
• Specify 'Options',statset('UseParallel',true).

Verbose — Verbosity level
0 (default) | 1

Verbosity level, specified as the comma-separated pair consisting of 'Verbose' and 0 or 1. Verbose controls the number of diagnostic messages that the software displays in the Command Window. If Verbose is 0, then the software does not display diagnostic messages. Otherwise, the software displays diagnostic messages.

Example: 'Verbose',1
Data Types: single | double
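As a short sketch of the parallel option (assuming CVMdl is a cross-validated ECOC model and a Parallel Computing Toolbox license is available):

```matlab
opts = statset('UseParallel',true);     % Request parallel execution
edge = kfoldEdge(CVMdl,'Options',opts); % Edge computation runs in parallel
```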

Output Arguments

edge — Classification edge
numeric scalar | numeric column vector

Classification edge on page 32-2656, returned as a numeric scalar or numeric column vector.

If Mode is 'average', then edge is the average classification edge over all folds. Otherwise, edge is a k-by-1 numeric column vector containing the classification edge for each fold, where k is the number of folds.

More About

Classification Edge

The classification edge is the weighted mean of the classification margins.

One way to choose among multiple classifiers, for example to perform feature selection, is to choose the classifier that yields the greatest edge.

Classification Margin

The classification margin is, for each observation, the difference between the negative loss for the true class and the maximal negative loss among the false classes. If the margins are on the same scale, then they serve as a classification confidence measure. Among multiple classifiers, those that yield greater margins are better.

Binary Loss

A binary loss is a function of the class and classification score that determines how well a binary learner classifies an observation into the class.

Suppose the following:

• mkj is element (k,j) of the coding design matrix M (that is, the code corresponding to class k of binary learner j).
• sj is the score of binary learner j for an observation.
• g is the binary loss function.
• k̂ is the predicted class for the observation.

In loss-based decoding [Escalera et al.] on page 18-189, the class producing the minimum sum of the binary losses over binary learners determines the predicted class of an observation, that is,

    k̂ = argmin_k Σ_{j=1}^{L} |m_kj| g(m_kj, s_j).

In loss-weighted decoding [Escalera et al.] on page 18-189, the class producing the minimum average of the binary losses over binary learners determines the predicted class of an observation, that is,

    k̂ = argmin_k [ Σ_{j=1}^{L} |m_kj| g(m_kj, s_j) ] / [ Σ_{j=1}^{L} |m_kj| ].

Allwein et al. on page 18-189 suggest that loss-weighted decoding improves classification accuracy by keeping loss values for all classes in the same dynamic range.

This table summarizes the supported loss functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj) is the binary loss formula.

  Value            Description         Score Domain       g(yj,sj)
  'binodeviance'   Binomial deviance   (–∞,∞)             log[1 + exp(–2yjsj)]/[2log(2)]
  'exponential'    Exponential         (–∞,∞)             exp(–yjsj)/2
  'hamming'        Hamming             [0,1] or (–∞,∞)    [1 – sign(yjsj)]/2
  'hinge'          Hinge               (–∞,∞)             max(0,1 – yjsj)/2
  'linear'         Linear              (–∞,∞)             (1 – yjsj)/2
  'logit'          Logistic            (–∞,∞)             log[1 + exp(–yjsj)]/[2log(2)]
  'quadratic'      Quadratic           [0,1]              [1 – yj(2sj – 1)]²/2

The software normalizes binary losses such that the loss is 0.5 when yj = 0, and aggregates using the average of the binary learners [Allwein et al.] on page 18-189.
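To make the two decoding rules above concrete, here is an illustrative sketch (the coding matrix, scores, and choice of hinge loss are hypothetical, not from the original page):

```matlab
g = @(y,s) max(0, 1 - y.*s)/2;  % Hinge binary loss (0.5 when y = 0)
M = [1 -1 0; -1 0 1; 0 1 -1];   % Hypothetical ternary coding matrix (K-by-L)
s = [0.7 -0.2 0.4];             % Hypothetical binary-learner scores (1-by-L)

% Per-learner binary losses; |m_kj| zeroes out learners not trained on class k
losses = abs(M) .* g(M, repmat(s,size(M,1),1));

[~,kLossBased]    = min(sum(losses,2));                  % Loss-based decoding
[~,kLossWeighted] = min(sum(losses,2) ./ sum(abs(M),2)); % Loss-weighted decoding
```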


Do not confuse the binary loss with the overall classification loss (specified by the 'LossFun' name-value pair argument of the loss and predict object functions), which measures how well an ECOC classifier performs as a whole.

References

[1] Allwein, E., R. Schapire, and Y. Singer. “Reducing multiclass to binary: A unifying approach for margin classifiers.” Journal of Machine Learning Research. Vol. 1, 2000, pp. 113–141.

[2] Escalera, S., O. Pujol, and P. Radeva. “On the decoding process in ternary error-correcting output codes.” IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 32, Issue 7, 2010, pp. 120–134.

[3] Escalera, S., O. Pujol, and P. Radeva. “Separability of ternary codes for sparse designs of error-correcting output codes.” Pattern Recognition Letters. Vol. 30, Issue 3, 2009, pp. 285–297.

Extended Capabilities Automatic Parallel Support Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™. To run in parallel, set the 'UseParallel' option to true. Set the 'UseParallel' field of the options structure to true using statset and specify the 'Options' name-value pair argument in the call to this function. For example: 'Options',statset('UseParallel',true) For more information, see the 'Options' name-value pair argument. For more general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).

See Also ClassificationECOC | ClassificationPartitionedECOC | edge | fitcecoc | kfoldMargin | kfoldPredict | statset Topics “Quick Start Parallel Computing for Statistics and Machine Learning Toolbox” on page 30-2 “Reproducibility in Parallel Statistical Computations” on page 30-13 “Concepts of Parallel Computing in Statistics and Machine Learning Toolbox” on page 30-8 Introduced in R2014b


kfoldEdge Classification edge for observations not used for training

Syntax E = kfoldEdge(obj) E = kfoldEdge(obj,Name,Value)

Description

E = kfoldEdge(obj) returns the classification edge (average classification margin) obtained by the cross-validated classification ensemble obj. For every fold, this method computes the classification edge for in-fold observations using an ensemble trained on out-of-fold observations.

E = kfoldEdge(obj,Name,Value) calculates the edge with additional options specified by one or more Name,Value pair arguments. You can specify several name-value pair arguments in any order as Name1,Value1,…,NameN,ValueN.

Input Arguments

obj

Object of class ClassificationPartitionedEnsemble. Create obj with fitcensemble along with one of the cross-validation options: 'crossval', 'kfold', 'holdout', 'leaveout', or 'cvpartition'. Alternatively, create obj from a classification ensemble with crossval.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

folds

Indices of folds ranging from 1 to obj.KFold. Use only these folds for predictions.

Default: 1:obj.KFold

mode

Character vector or string scalar representing the meaning of the output edge:

• 'average' — edge is a scalar value, the average over all folds.
• 'individual' — edge is a vector of length obj.KFold with one element per fold.
• 'cumulative' — edge is a vector of length min(obj.NTrainedPerFold) in which element J is obtained by averaging values across all folds for weak learners 1:J in each fold.

Default: 'average'


Output Arguments

E

The average classification margin. E is a scalar or vector, depending on the setting of the mode name-value pair.

Examples

Compute K-Fold Edge of Held-Out Observations

Compute the k-fold edge for an ensemble trained on the Fisher iris data.

Load the sample data set.

load fisheriris

Train an ensemble of 100 boosted classification trees.

t = templateTree('MaxNumSplits',1); % Weak learner template tree object
ens = fitcensemble(meas,species,'Learners',t);

Create a cross-validated ensemble from ens and find the classification edge.

rng(10,'twister') % For reproducibility
cvens = crossval(ens);
E = kfoldEdge(cvens)

E = 3.2052
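Continuing this example, a sketch of the 'cumulative' mode (using the cvens object above) shows how the edge evolves as each fold uses more weak learners:

```matlab
% One edge value per number of weak learners, averaged across folds
Ecum = kfoldEdge(cvens,'mode','cumulative');
plot(Ecum)
xlabel('Number of weak learners')
ylabel('Cumulative k-fold edge')
```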

More About

Edge

The edge is the weighted mean value of the classification margin. The weights are the class probabilities in obj.Prior.

Margin

The classification margin is the difference between the classification score for the true class and the maximal classification score for the false classes. Margin is a column vector with the same number of rows as the matrix obj.X.

Score (ensemble)

For ensembles, a classification score represents the confidence of a classification into a class. The higher the score, the higher the confidence.

Different ensemble algorithms have different definitions for their scores. Furthermore, the range of scores depends on ensemble type. For example:

• AdaBoostM1 scores range from –∞ to ∞.
• Bag scores range from 0 to 1.


See Also crossval | kfoldLoss | kfoldMargin | kfoldPredict | kfoldfun


kfoldEdge Package: classreg.learning.partition Classification edge for cross-validated kernel classification model

Syntax edge = kfoldEdge(CVMdl) edge = kfoldEdge(CVMdl,Name,Value)

Description

edge = kfoldEdge(CVMdl) returns the classification edge on page 32-2665 obtained by the cross-validated, binary kernel model (ClassificationPartitionedKernel) CVMdl. For every fold, kfoldEdge computes the classification edge for validation-fold observations using a model trained on training-fold observations.

edge = kfoldEdge(CVMdl,Name,Value) returns the classification edge with additional options specified by one or more name-value pair arguments. For example, specify the number of folds or the aggregation level.

Examples

Estimate k-Fold Cross-Validation Edge

Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, which are labeled either bad ('b') or good ('g').

Examples Estimate k-Fold Cross-Validation Edge Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, which are labeled either bad ('b') or good ('g'). load ionosphere

Cross-validate a binary kernel classification model using the data.

CVMdl = fitckernel(X,Y,'Crossval','on')

CVMdl = 
  classreg.learning.partition.ClassificationPartitionedKernel
    CrossValidatedModel: 'Kernel'
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'

  Properties, Methods

CVMdl is a ClassificationPartitionedKernel model. By default, the software implements 10-fold cross-validation. To specify a different number of folds, use the 'KFold' name-value pair argument instead of 'Crossval'.


Estimate the cross-validated classification edge.

edge = kfoldEdge(CVMdl)

edge = 1.5585

Alternatively, you can obtain the per-fold edges by specifying the name-value pair 'Mode','individual' in kfoldEdge.

Feature Selection Using k-Fold Edges

Perform feature selection by comparing k-fold edges from multiple models. Based solely on this criterion, the classifier with the greatest edge is the best classifier.

Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, which are labeled either bad ('b') or good ('g').

load ionosphere

Randomly choose half of the predictor variables.

rng(1); % For reproducibility
p = size(X,2); % Number of predictors
idxPart = randsample(p,ceil(0.5*p));

Cross-validate two binary kernel classification models: one that uses all of the predictors, and one that uses half of the predictors.

CVMdl = fitckernel(X,Y,'CrossVal','on');
PCVMdl = fitckernel(X(:,idxPart),Y,'CrossVal','on');

CVMdl and PCVMdl are ClassificationPartitionedKernel models. By default, the software implements 10-fold cross-validation. To specify a different number of folds, use the 'KFold' name-value pair argument instead of 'Crossval'.

Estimate the k-fold edge for each classifier.

fullEdge = kfoldEdge(CVMdl)

fullEdge = 1.5142

partEdge = kfoldEdge(PCVMdl)

partEdge = 1.8910

Based on the k-fold edges, the classifier that uses half of the predictors is the better model.

Input Arguments

CVMdl — Cross-validated, binary kernel classification model
ClassificationPartitionedKernel model object

Cross-validated, binary kernel classification model, specified as a ClassificationPartitionedKernel model object. You can create a ClassificationPartitionedKernel model by using fitckernel and specifying any one of the cross-validation name-value pair arguments.

To obtain estimates, kfoldEdge applies the same data used to cross-validate the kernel classification model (X and Y).

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: kfoldEdge(CVMdl,'Mode','individual') returns the classification edge for each fold.

Folds — Fold indices for prediction
1:CVMdl.KFold (default) | numeric vector of positive integers

Fold indices for prediction, specified as the comma-separated pair consisting of 'Folds' and a numeric vector of positive integers. The elements of Folds must be within the range from 1 to CVMdl.KFold. The software uses only the folds specified in Folds for prediction.

Example: 'Folds',[1 4 10]
Data Types: single | double

Mode — Aggregation level for output
'average' (default) | 'individual'

Aggregation level for the output, specified as the comma-separated pair consisting of 'Mode' and 'average' or 'individual'. This table describes the values.

  Value           Description
  'average'       The output is a scalar average over all folds.
  'individual'    The output is a vector of length k containing one value per fold, where k is the number of folds.

Example: 'Mode','individual'

Output Arguments edge — Classification edge numeric scalar | numeric column vector Classification edge on page 32-2665, returned as a numeric scalar or numeric column vector. If Mode is 'average', then edge is the average classification edge over all folds. Otherwise, edge is a k-by-1 numeric column vector containing the classification edge for each fold, where k is the number of folds.


More About

Classification Edge

The classification edge is the weighted mean of the classification margins.

One way to choose among multiple classifiers, for example to perform feature selection, is to choose the classifier that yields the greatest edge.

Classification Margin

The classification margin for binary classification is, for each observation, the difference between the classification score for the true class and the classification score for the false class.

The software defines the classification margin for binary classification as

    m = 2yf(x).

x is an observation. If the true label of x is the positive class, then y is 1, and –1 otherwise. f(x) is the positive-class classification score for the observation x. The classification margin is commonly defined as m = yf(x).

If the margins are on the same scale, then they serve as a classification confidence measure. Among multiple classifiers, those that yield greater margins are better.

Classification Score

For kernel classification models, the raw classification score for classifying the observation x, a row vector, into the positive class is defined by

    f(x) = T(x)β + b.

• T(·) is a transformation of an observation for feature expansion.
• β is the estimated column vector of coefficients.
• b is the estimated scalar bias.

The raw classification score for classifying x into the negative class is –f(x). The software classifies observations into the class that yields a positive score.

If the kernel classification model consists of logistic regression learners, then the software applies the 'logit' score transformation to the raw classification scores (see ScoreTransform).
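The relationship between the edge and the margins can be checked numerically. This is a sketch (assuming CVMdl is the cross-validated kernel model from the first example and the observation weights are uniform):

```matlab
% With uniform observation weights, the edge is the plain mean of
% the cross-validated margins.
m = kfoldMargin(CVMdl);    % One margin per observation
edgeFromMargins = mean(m); % Should match kfoldEdge(CVMdl) in this case
```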

See Also ClassificationPartitionedKernel | fitckernel Introduced in R2018b


kfoldEdge Package: classreg.learning.partition Classification edge for cross-validated kernel ECOC model

Syntax edge = kfoldEdge(CVMdl) edge = kfoldEdge(CVMdl,Name,Value)

Description

edge = kfoldEdge(CVMdl) returns the classification edge on page 32-2670 obtained by the cross-validated kernel ECOC model (ClassificationPartitionedKernelECOC) CVMdl. For every fold, kfoldEdge computes the classification edge for validation-fold observations using a model trained on training-fold observations.

edge = kfoldEdge(CVMdl,Name,Value) returns the classification edge with additional options specified by one or more name-value pair arguments. For example, specify the number of folds, decoding scheme, or verbosity level.

Examples

Estimate k-Fold Cross-Validation Edge

Load Fisher's iris data set. X contains flower measurements, and Y contains the names of flower species.

load fisheriris
X = meas;
Y = species;

Cross-validate an ECOC model composed of kernel binary learners.

CVMdl = fitcecoc(X,Y,'Learners','kernel','CrossVal','on')

CVMdl = 
  classreg.learning.partition.ClassificationPartitionedKernelECOC
    CrossValidatedModel: 'KernelECOC'
           ResponseName: 'Y'
        NumObservations: 150
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'setosa'  'versicolor'  'virginica'}
         ScoreTransform: 'none'

  Properties, Methods


CVMdl is a ClassificationPartitionedKernelECOC model. By default, the software implements 10-fold cross-validation. To specify a different number of folds, use the 'KFold' name-value pair argument instead of 'Crossval'.

Estimate the cross-validated classification edge.

edge = kfoldEdge(CVMdl)

edge = 0.4145

Alternatively, you can obtain the per-fold edges by specifying the name-value pair 'Mode','individual' in kfoldEdge.

Feature Selection Using k-Fold Edges

Perform feature selection by comparing k-fold edges from multiple models. Based solely on this criterion, the classifier with the greatest edge is the best classifier.

Load Fisher's iris data set. X contains flower measurements, and Y contains the names of flower species.

load fisheriris
X = meas;
Y = species;

Randomly choose half of the predictor variables.

rng(1); % For reproducibility
p = size(X,2); % Number of predictors
idxPart = randsample(p,ceil(0.5*p));

Cross-validate two ECOC models composed of kernel classification models: one that uses all of the predictors, and one that uses half of the predictors.

CVMdl = fitcecoc(X,Y,'Learners','kernel','CrossVal','on');
PCVMdl = fitcecoc(X(:,idxPart),Y,'Learners','kernel','CrossVal','on');

CVMdl and PCVMdl are ClassificationPartitionedKernelECOC models. By default, the software implements 10-fold cross-validation. To specify a different number of folds, use the 'KFold' name-value pair argument instead of 'Crossval'.

Estimate the k-fold edge for each classifier.

fullEdge = kfoldEdge(CVMdl)

fullEdge = 0.4092

partEdge = kfoldEdge(PCVMdl)

partEdge = 0.4161

Based on the k-fold edges, the two classifiers are comparable.


32

Functions

Input Arguments

CVMdl — Cross-validated kernel ECOC model
ClassificationPartitionedKernelECOC model

Cross-validated kernel ECOC model, specified as a ClassificationPartitionedKernelECOC model. You can create a ClassificationPartitionedKernelECOC model by training an ECOC model using fitcecoc and specifying these name-value pair arguments:

• 'Learners' – Set the value to 'kernel', a template object returned by templateKernel, or a cell array of such template objects.
• One of the arguments 'CrossVal', 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: kfoldEdge(CVMdl,'BinaryLoss','hinge') specifies 'hinge' as the binary learner loss function.

BinaryLoss — Binary learner loss function
'hamming' | 'linear' | 'logit' | 'exponential' | 'binodeviance' | 'hinge' | 'quadratic' | function handle

Binary learner loss function, specified as the comma-separated pair consisting of 'BinaryLoss' and a built-in loss function name or function handle.

• This table contains names and descriptions of the built-in functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj) is the binary loss formula.

  Value            Description         Score Domain       g(yj,sj)
  'binodeviance'   Binomial deviance   (–∞,∞)             log[1 + exp(–2yjsj)]/[2log(2)]
  'exponential'    Exponential         (–∞,∞)             exp(–yjsj)/2
  'hamming'        Hamming             [0,1] or (–∞,∞)    [1 – sign(yjsj)]/2
  'hinge'          Hinge               (–∞,∞)             max(0,1 – yjsj)/2
  'linear'         Linear              (–∞,∞)             (1 – yjsj)/2
  'logit'          Logistic            (–∞,∞)             log[1 + exp(–yjsj)]/[2log(2)]
  'quadratic'      Quadratic           [0,1]              [1 – yj(2sj – 1)]²/2

The software normalizes binary losses such that the loss is 0.5 when yj = 0. Also, the software calculates the mean binary loss for each class. • For a custom binary loss function, for example, customFunction, specify its function handle 'BinaryLoss',@customFunction. customFunction has this form: bLoss = customFunction(M,s)


where:

• M is the K-by-L coding matrix stored in Mdl.CodingMatrix.
• s is the 1-by-L row vector of classification scores.
• bLoss is the classification loss. This scalar aggregates the binary losses for every learner in a particular class. For example, you can use the mean binary loss to aggregate the loss over the learners for each class.
• K is the number of classes.
• L is the number of binary learners.

By default, if all binary learners are kernel classification models using SVM, then BinaryLoss is 'hinge'. If all binary learners are kernel classification models using logistic regression, then BinaryLoss is 'quadratic'.

Example: 'BinaryLoss','binodeviance'
Data Types: char | string | function_handle

Decoding — Decoding scheme
'lossweighted' (default) | 'lossbased'

Decoding scheme that aggregates the binary losses, specified as the comma-separated pair consisting of 'Decoding' and 'lossweighted' or 'lossbased'. For more information, see “Binary Loss” on page 32-2670.

Example: 'Decoding','lossbased'

Folds — Fold indices for prediction
1:CVMdl.KFold (default) | numeric vector of positive integers

Fold indices for prediction, specified as the comma-separated pair consisting of 'Folds' and a numeric vector of positive integers. The elements of Folds must be within the range from 1 to CVMdl.KFold. The software uses only the folds specified in Folds for prediction.

Example: 'Folds',[1 4 10]
Data Types: single | double

Mode — Aggregation level for output
'average' (default) | 'individual'

Aggregation level for the output, specified as the comma-separated pair consisting of 'Mode' and 'average' or 'individual'. This table describes the values.

  Value          Description
  'average'      The output is a scalar average over all folds.
  'individual'   The output is a vector of length k containing one value per fold, where k is the number of folds.

Example: 'Mode','individual'


Options — Estimation options
[] (default) | structure array returned by statset

Estimation options, specified as the comma-separated pair consisting of 'Options' and a structure array returned by statset.

To invoke parallel computing:

• You need a Parallel Computing Toolbox license.
• Specify 'Options',statset('UseParallel',true).

Verbose — Verbosity level
0 (default) | 1

Verbosity level, specified as the comma-separated pair consisting of 'Verbose' and 0 or 1. Verbose controls the number of diagnostic messages that the software displays in the Command Window. If Verbose is 0, then the software does not display diagnostic messages. Otherwise, the software displays diagnostic messages.

Example: 'Verbose',1
Data Types: single | double
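As an illustration of the custom binary loss form accepted by 'BinaryLoss' above, the following sketch defines a hypothetical handle that averages the linear loss over the binary learners for each class. The name customLinearLoss is illustrative only, and the handling of zero-coded learners is simplified here:

```matlab
% Hypothetical custom binary loss: mean linear loss per class.
% M is the K-by-L coding matrix and s is a 1-by-L score vector, so the
% result has one aggregated loss per class. 'omitnan' skips any
% NaN-coded entries; zero-coded learners are treated naively here.
customLinearLoss = @(M,s) mean((1 - M.*s)/2, 2, 'omitnan');

% Pass the handle to kfoldEdge:
edge = kfoldEdge(CVMdl,'BinaryLoss',customLinearLoss);
```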

Output Arguments

edge — Classification edge
numeric scalar | numeric column vector

Classification edge on page 32-2670, returned as a numeric scalar or numeric column vector.

If Mode is 'average', then edge is the average classification edge over all folds. Otherwise, edge is a k-by-1 numeric column vector containing the classification edge for each fold, where k is the number of folds.

More About

Classification Edge

The classification edge is the weighted mean of the classification margins. One way to choose among multiple classifiers, for example to perform feature selection, is to choose the classifier that yields the greatest edge.

Classification Margin

The classification margin is, for each observation, the difference between the negative loss for the true class and the maximal negative loss among the false classes. If the margins are on the same scale, then they serve as a classification confidence measure. Among multiple classifiers, those that yield greater margins are better.

Binary Loss

A binary loss is a function of the class and classification score that determines how well a binary learner classifies an observation into the class.
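Because the edge is the weighted mean of the margins, you can recover it from kfoldMargin; a sketch, assuming the default uniform observation weights:

```matlab
% The edge is the (weighted) mean of the cross-validated margins.
m = kfoldMargin(CVMdl);      % one margin per observation
edgeFromMargins = mean(m);   % matches kfoldEdge(CVMdl) for uniform weights
```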


Suppose the following:

• mkj is element (k,j) of the coding design matrix M (that is, the code corresponding to class k of binary learner j).
• sj is the score of binary learner j for an observation.
• g is the binary loss function.
• k̂ is the predicted class for the observation.

In loss-based decoding [Escalera et al.] on page 18-189, the class producing the minimum sum of the binary losses over binary learners determines the predicted class of an observation, that is,

    k̂ = argmin_k Σ_{j=1}^{L} |m_kj| g(m_kj, s_j).

In loss-weighted decoding [Escalera et al.] on page 18-189, the class producing the minimum average of the binary losses over binary learners determines the predicted class of an observation, that is,

    k̂ = argmin_k ( Σ_{j=1}^{L} |m_kj| g(m_kj, s_j) ) / ( Σ_{j=1}^{L} |m_kj| ).
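The two decoding rules can be worked through by hand for a small case; the coding matrix and scores below are made up for illustration:

```matlab
% One-vs-one coding for K = 3 classes and L = 3 binary learners.
M = [ 1  1  0
     -1  0  1
      0 -1 -1];
s = [0.7 -0.2 0.4];             % hypothetical binary learner scores
g = @(y,s) max(0, 1 - y.*s)/2;  % hinge binary loss

sumLoss = sum(abs(M) .* g(M,s), 2);   % loss-based decoding: per-class sums
avgLoss = sumLoss ./ sum(abs(M), 2);  % loss-weighted decoding: per-class averages
[~,khat] = min(avgLoss);              % index of the predicted class
```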

Allwein et al. on page 18-189 suggest that loss-weighted decoding improves classification accuracy by keeping loss values for all classes in the same dynamic range.

This table summarizes the supported loss functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj) is the binary loss formula.

  Value            Description         Score Domain       g(yj,sj)
  'binodeviance'   Binomial deviance   (–∞,∞)             log[1 + exp(–2yjsj)]/[2log(2)]
  'exponential'    Exponential         (–∞,∞)             exp(–yjsj)/2
  'hamming'        Hamming             [0,1] or (–∞,∞)    [1 – sign(yjsj)]/2
  'hinge'          Hinge               (–∞,∞)             max(0,1 – yjsj)/2
  'linear'         Linear              (–∞,∞)             (1 – yjsj)/2
  'logit'          Logistic            (–∞,∞)             log[1 + exp(–yjsj)]/[2log(2)]
  'quadratic'      Quadratic           [0,1]              [1 – yj(2sj – 1)]²/2

The software normalizes binary losses such that the loss is 0.5 when yj = 0, and aggregates using the average of the binary learners [Allwein et al.] on page 18-189.

Do not confuse the binary loss with the overall classification loss (specified by the 'LossFun' name-value pair argument of the loss and predict object functions), which measures how well an ECOC classifier performs as a whole.


References

[1] Allwein, E., R. Schapire, and Y. Singer. “Reducing multiclass to binary: A unifying approach for margin classifiers.” Journal of Machine Learning Research. Vol. 1, 2000, pp. 113–141.

[2] Escalera, S., O. Pujol, and P. Radeva. “On the decoding process in ternary error-correcting output codes.” IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 32, Issue 7, 2010, pp. 120–134.

[3] Escalera, S., O. Pujol, and P. Radeva. “Separability of ternary codes for sparse designs of error-correcting output codes.” Pattern Recognition Letters. Vol. 30, Issue 3, 2009, pp. 285–297.

See Also
ClassificationPartitionedKernelECOC | fitcecoc

Introduced in R2018b


kfoldEdge
Classification edge for observations not used for training

Syntax

e = kfoldEdge(CVMdl)
e = kfoldEdge(CVMdl,Name,Value)

Description

e = kfoldEdge(CVMdl) returns the cross-validated classification edges on page 32-2678 obtained by the cross-validated, binary, linear classification model CVMdl. That is, for every fold, kfoldEdge estimates the classification edge for observations that it holds out when it trains using all other observations. e contains a classification edge for each regularization strength in the linear classification models that comprise CVMdl.

e = kfoldEdge(CVMdl,Name,Value) uses additional options specified by one or more Name,Value pair arguments. For example, indicate which folds to use for the edge calculation.

Input Arguments

CVMdl — Cross-validated, binary, linear classification model
ClassificationPartitionedLinear model object

Cross-validated, binary, linear classification model, specified as a ClassificationPartitionedLinear model object. You can create a ClassificationPartitionedLinear model using fitclinear and specifying any one of the cross-validation name-value pair arguments, for example, CrossVal.

To obtain estimates, kfoldEdge applies the same data used to cross-validate the linear classification model (X and Y).

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Folds — Fold indices to use for classification-score prediction
1:CVMdl.KFold (default) | numeric vector of positive integers

Fold indices to use for classification-score prediction, specified as the comma-separated pair consisting of 'Folds' and a numeric vector of positive integers. The elements of Folds must range from 1 through CVMdl.KFold.

Example: 'Folds',[1 4 10]
Data Types: single | double


Mode — Edge aggregation level
'average' (default) | 'individual'

Edge aggregation level, specified as the comma-separated pair consisting of 'Mode' and 'average' or 'individual'.

  Value          Description
  'average'      Returns classification edges averaged over all folds
  'individual'   Returns classification edges for each fold

Example: 'Mode','individual'

Output Arguments

e — Cross-validated classification edges
numeric scalar | numeric vector | numeric matrix

Cross-validated classification edges on page 32-2678, returned as a numeric scalar, vector, or matrix.

Let L be the number of regularization strengths in the cross-validated models (that is, L is numel(CVMdl.Trained{1}.Lambda)) and F be the number of folds (stored in CVMdl.KFold).

• If Mode is 'average', then e is a 1-by-L vector. e(j) is the average classification edge over all folds of the cross-validated model that uses regularization strength j.
• Otherwise, e is an F-by-L matrix. e(i,j) is the classification edge for fold i of the cross-validated model that uses regularization strength j.

To estimate e, kfoldEdge uses the data that created CVMdl (see X and Y).

Examples

Estimate k-Fold Cross-Validation Edge

Load the NLP data set.

load nlpdata

X is a sparse matrix of predictor data, and Y is a categorical vector of class labels. There are more than two classes in the data. The models should identify whether the word counts in a web page are from the Statistics and Machine Learning Toolbox™ documentation. So, identify the labels that correspond to the Statistics and Machine Learning Toolbox™ documentation web pages.

Ystats = Y == 'stats';

Cross-validate a binary, linear classification model that can identify whether the word counts in a documentation web page are from the Statistics and Machine Learning Toolbox™ documentation.

rng(1); % For reproducibility
CVMdl = fitclinear(X,Ystats,'CrossVal','on');


CVMdl is a ClassificationPartitionedLinear model. By default, the software implements 10-fold cross-validation. You can alter the number of folds using the 'KFold' name-value pair argument.

Estimate the average of the out-of-fold edges.

e = kfoldEdge(CVMdl)

e = 8.1243

Alternatively, you can obtain the per-fold edges by specifying the name-value pair 'Mode','individual' in kfoldEdge.

Feature Selection Using k-fold Edges

One way to perform feature selection is to compare k-fold edges from multiple models. Based solely on this criterion, the classifier with the highest edge is the best classifier.

Load the NLP data set. Preprocess the data as in “Estimate k-Fold Cross-Validation Edge” on page 32-2674.

load nlpdata
Ystats = Y == 'stats';
X = X';

Create these two data sets:

• fullX contains all predictors.
• partX contains half of the predictors, chosen at random.

rng(1); % For reproducibility
p = size(X,1); % Number of predictors
halfPredIdx = randsample(p,ceil(0.5*p));
fullX = X;
partX = X(halfPredIdx,:);

Cross-validate two binary, linear classification models: one that uses all of the predictors and one that uses half of the predictors. Optimize the objective function using SpaRSA, and indicate that observations correspond to columns.

CVMdl = fitclinear(fullX,Ystats,'CrossVal','on','Solver','sparsa',...
    'ObservationsIn','columns');
PCVMdl = fitclinear(partX,Ystats,'CrossVal','on','Solver','sparsa',...
    'ObservationsIn','columns');

CVMdl and PCVMdl are ClassificationPartitionedLinear models. Estimate the k-fold edge for each classifier.

fullEdge = kfoldEdge(CVMdl)
fullEdge = 16.5629

partEdge = kfoldEdge(PCVMdl)
partEdge = 13.9030


Based on the k-fold edges, the classifier that uses all of the predictors is the better model.

Find Good Lasso Penalty Using k-fold Edge

To determine a good lasso-penalty strength for a linear classification model that uses a logistic regression learner, compare k-fold edges.

Load the NLP data set. Preprocess the data as in “Estimate k-Fold Cross-Validation Edge” on page 32-2674.

load nlpdata
Ystats = Y == 'stats';
X = X';

Create a set of 11 logarithmically spaced regularization strengths from 10^-8 through 10^1.

Lambda = logspace(-8,1,11);

Cross-validate a binary, linear classification model using 5-fold cross-validation that uses each of the regularization strengths. Optimize the objective function using SpaRSA. Lower the tolerance on the gradient of the objective function to 1e-8.

rng(10); % For reproducibility
CVMdl = fitclinear(X,Ystats,'ObservationsIn','columns','KFold',5,...
    'Learner','logistic','Solver','sparsa','Regularization','lasso',...
    'Lambda',Lambda,'GradientTolerance',1e-8)

CVMdl = 
  classreg.learning.partition.ClassificationPartitionedLinear
    CrossValidatedModel: 'Linear'
           ResponseName: 'Y'
        NumObservations: 31572
                  KFold: 5
              Partition: [1x1 cvpartition]
             ClassNames: [0 1]
         ScoreTransform: 'none'

  Properties, Methods

CVMdl is a ClassificationPartitionedLinear model. Because fitclinear implements 5-fold cross-validation, CVMdl contains 5 ClassificationLinear models that the software trains on each fold.

Estimate the edges for each fold and regularization strength.

eFolds = kfoldEdge(CVMdl,'Mode','individual')

eFolds = 5×11

    0.9958    0.9958    0.9958    0.9958    0.9958    0.9925    0.9766    0.9131    0.8468    ...
    0.9991    0.9991    0.9991    0.9991    0.9991    0.9939    0.9780    0.9141    0.8267    ...
    0.9992    0.9992    0.9992    0.9992    0.9992    0.9942    0.9782    0.9149    0.8253    ...
    0.9974    0.9974    0.9974    0.9974    0.9974    0.9931    0.9773    0.9171    0.8442    ...
    0.9977    0.9977    0.9977    0.9977    0.9977    0.9942    0.9782    0.9110    0.8384    ...

(The last two columns are truncated in the original display.)

eFolds is a 5-by-11 matrix of edges. Rows correspond to folds and columns correspond to regularization strengths in Lambda. You can use eFolds to identify ill-performing folds, that is, unusually low edges.

Estimate the average edge over all folds for each regularization strength.

e = kfoldEdge(CVMdl)

e = 1×11

    0.9978    0.9978    0.9978    0.9978    0.9978    0.9936    0.9777    0.9140    ...

Determine how well the models generalize by plotting the averages of the 5-fold edge for each regularization strength. Identify the regularization strength that maximizes the 5-fold edge over the grid.

figure;
plot(log10(Lambda),log10(e),'-o')
[~, maxEIdx] = max(e);
maxLambda = Lambda(maxEIdx);
hold on
plot(log10(maxLambda),log10(e(maxEIdx)),'ro');
ylabel('log_{10} 5-fold edge')
xlabel('log_{10} Lambda')
legend('Edge','Max edge')
hold off


Several values of Lambda yield similarly high edges. Higher values of Lambda lead to predictor variable sparsity, which is a good quality of a classifier. Choose the regularization strength that occurs just before the edge starts decreasing.

LambdaFinal = Lambda(5);

Train a linear classification model using the entire data set and specify the regularization strength LambdaFinal.

MdlFinal = fitclinear(X,Ystats,'ObservationsIn','columns',...
    'Learner','logistic','Solver','sparsa','Regularization','lasso',...
    'Lambda',LambdaFinal);

To estimate labels for new observations, pass MdlFinal and the new data to predict.

More About

Classification Edge

The classification edge is the weighted mean of the classification margins. One way to choose among multiple classifiers, for example to perform feature selection, is to choose the classifier that yields the greatest edge.


Classification Margin

The classification margin for binary classification is, for each observation, the difference between the classification score for the true class and the classification score for the false class. The software defines the classification margin for binary classification as

    m = 2yf(x).

x is an observation. If the true label of x is the positive class, then y is 1, and –1 otherwise. f(x) is the positive-class classification score for the observation x. The classification margin is commonly defined as m = yf(x).

If the margins are on the same scale, then they serve as a classification confidence measure. Among multiple classifiers, those that yield greater margins are better.

Classification Score

For linear classification models, the raw classification score for classifying the observation x, a row vector, into the positive class is defined by

    f_j(x) = xβ_j + b_j.

For the model with regularization strength j, β_j is the estimated column vector of coefficients (the model property Beta(:,j)), and b_j is the estimated scalar bias (the model property Bias(j)). The raw classification score for classifying x into the negative class is –f(x). The software classifies observations into the class that yields a positive score.

If the linear classification model consists of logistic regression learners, then the software applies the 'logit' score transformation to the raw classification scores (see ScoreTransform).
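For a cross-validated model, the quantities above are available on each fold's trained learner; a sketch of recomputing the raw score, where x is a row vector of predictor values (the variable names are illustrative):

```matlab
% Recompute a raw linear score from a trained fold's parameters.
Mdl1 = CVMdl.Trained{1};              % linear model trained on the first fold
j = 1;                                % index of a regularization strength
f = x*Mdl1.Beta(:,j) + Mdl1.Bias(j);  % raw positive-class score f_j(x)
```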

See Also
ClassificationLinear | ClassificationPartitionedLinear | edge | kfoldMargin | kfoldPredict

Introduced in R2016a


kfoldEdge
Classification edge for observations not used for training

Syntax

e = kfoldEdge(CVMdl)
e = kfoldEdge(CVMdl,Name,Value)

Description

e = kfoldEdge(CVMdl) returns the cross-validated classification edges on page 32-2688 obtained by the cross-validated, error-correcting output codes (ECOC) model composed of linear classification models CVMdl. That is, for every fold, kfoldEdge estimates the classification edge for observations that it holds out when it trains using all other observations. e contains a classification edge for each regularization strength in the linear classification models that comprise CVMdl.

e = kfoldEdge(CVMdl,Name,Value) uses additional options specified by one or more Name,Value pair arguments. For example, specify a decoding scheme, which folds to use for the edge calculation, or verbosity level.

Input Arguments

CVMdl — Cross-validated, ECOC model composed of linear classification models
ClassificationPartitionedLinearECOC model object

Cross-validated, ECOC model composed of linear classification models, specified as a ClassificationPartitionedLinearECOC model object. You can create a ClassificationPartitionedLinearECOC model using fitcecoc and by:

1. Specifying any one of the cross-validation name-value pair arguments, for example, CrossVal
2. Setting the name-value pair argument Learners to 'linear' or a linear classification model template returned by templateLinear

To obtain estimates, kfoldEdge applies the same data used to cross-validate the ECOC model (X and Y).

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

BinaryLoss — Binary learner loss function
'hamming' | 'linear' | 'logit' | 'exponential' | 'binodeviance' | 'hinge' | 'quadratic' | function handle

Binary learner loss function, specified as the comma-separated pair consisting of 'BinaryLoss' and a built-in loss function name or function handle.


• This table contains names and descriptions of the built-in functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj) is the binary loss formula.

  Value            Description         Score Domain       g(yj,sj)
  'binodeviance'   Binomial deviance   (–∞,∞)             log[1 + exp(–2yjsj)]/[2log(2)]
  'exponential'    Exponential         (–∞,∞)             exp(–yjsj)/2
  'hamming'        Hamming             [0,1] or (–∞,∞)    [1 – sign(yjsj)]/2
  'hinge'          Hinge               (–∞,∞)             max(0,1 – yjsj)/2
  'linear'         Linear              (–∞,∞)             (1 – yjsj)/2
  'logit'          Logistic            (–∞,∞)             log[1 + exp(–yjsj)]/[2log(2)]
  'quadratic'      Quadratic           [0,1]              [1 – yj(2sj – 1)]²/2

The software normalizes the binary losses such that the loss is 0.5 when yj = 0. Also, the software calculates the mean binary loss for each class.

• For a custom binary loss function, for example, customFunction, specify its function handle 'BinaryLoss',@customFunction. customFunction has this form:

  bLoss = customFunction(M,s)

where:

• M is the K-by-L coding matrix stored in Mdl.CodingMatrix.
• s is the 1-by-L row vector of classification scores.
• bLoss is the classification loss. This scalar aggregates the binary losses for every learner in a particular class. For example, you can use the mean binary loss to aggregate the loss over the learners for each class.
• K is the number of classes.
• L is the number of binary learners.

For an example of passing a custom binary loss function, see “Predict Test-Sample Labels of ECOC Model Using Custom Binary Loss Function” on page 32-4169.

By default, if all binary learners are linear classification models using:

• SVM, then BinaryLoss is 'hinge'
• Logistic regression, then BinaryLoss is 'quadratic'

Example: 'BinaryLoss','binodeviance'
Data Types: char | string | function_handle

Decoding — Decoding scheme
'lossweighted' (default) | 'lossbased'


Decoding scheme that aggregates the binary losses, specified as the comma-separated pair consisting of 'Decoding' and 'lossweighted' or 'lossbased'. For more information, see “Binary Loss” on page 32-2824.

Example: 'Decoding','lossbased'

Folds — Fold indices to use for classification-score prediction
1:CVMdl.KFold (default) | numeric vector of positive integers

Fold indices to use for classification-score prediction, specified as the comma-separated pair consisting of 'Folds' and a numeric vector of positive integers. The elements of Folds must range from 1 through CVMdl.KFold.

Example: 'Folds',[1 4 10]
Data Types: single | double

Mode — Edge aggregation level
'average' (default) | 'individual'

Edge aggregation level, specified as the comma-separated pair consisting of 'Mode' and 'average' or 'individual'.

  Value          Description
  'average'      Returns classification edges averaged over all folds
  'individual'   Returns classification edges for each fold

Example: 'Mode','individual'

Options — Estimation options
[] (default) | structure array returned by statset

Estimation options, specified as the comma-separated pair consisting of 'Options' and a structure array returned by statset.

To invoke parallel computing:

• You need a Parallel Computing Toolbox license.
• Specify 'Options',statset('UseParallel',true).

Verbose — Verbosity level
0 (default) | 1

Verbosity level, specified as the comma-separated pair consisting of 'Verbose' and 0 or 1. Verbose controls the number of diagnostic messages that the software displays in the Command Window. If Verbose is 0, then the software does not display diagnostic messages. Otherwise, the software displays diagnostic messages.

Example: 'Verbose',1
Data Types: single | double
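For example, a sketch of invoking parallel computation of the per-fold edges, which requires a Parallel Computing Toolbox license:

```matlab
% Request parallel estimation of the cross-validated edges.
opts = statset('UseParallel',true);
e = kfoldEdge(CVMdl,'Options',opts,'Mode','individual');
```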


Output Arguments

e — Cross-validated classification edges
numeric scalar | numeric vector | numeric matrix

Cross-validated classification edges on page 32-2688, returned as a numeric scalar, vector, or matrix.

Let L be the number of regularization strengths in the cross-validated models (that is, L is numel(CVMdl.Trained{1}.BinaryLearners{1}.Lambda)) and F be the number of folds (stored in CVMdl.KFold).

• If Mode is 'average', then e is a 1-by-L vector. e(j) is the average classification edge over all folds of the cross-validated model that uses regularization strength j.
• Otherwise, e is an F-by-L matrix. e(i,j) is the classification edge for fold i of the cross-validated model that uses regularization strength j.

Examples

Estimate k-Fold Cross-Validation Edge

Load the NLP data set.

load nlpdata

X is a sparse matrix of predictor data, and Y is a categorical vector of class labels. For simplicity, use the label 'others' for all observations in Y that are not 'simulink', 'dsp', or 'comm'.

Y(~(ismember(Y,{'simulink','dsp','comm'}))) = 'others';

Cross-validate a multiclass, linear classification model.

rng(1); % For reproducibility
CVMdl = fitcecoc(X,Y,'Learner','linear','CrossVal','on');

CVMdl is a ClassificationPartitionedLinearECOC model. By default, the software implements 10-fold cross-validation. You can alter the number of folds using the 'KFold' name-value pair argument.

Estimate the average of the out-of-fold edges.

e = kfoldEdge(CVMdl)

e = 0.7232

Alternatively, you can obtain the per-fold edges by specifying the name-value pair 'Mode','individual' in kfoldEdge.


Feature Selection Using k-fold Edges

One way to perform feature selection is to compare k-fold edges from multiple models. Based solely on this criterion, the classifier with the highest edge is the best classifier.

Load the NLP data set. Preprocess the data as in “Estimate k-Fold Cross-Validation Edge” on page 32-2683, and orient the predictor data so that observations correspond to columns.

load nlpdata
Y(~(ismember(Y,{'simulink','dsp','comm'}))) = 'others';
X = X';

Create these two data sets:

• fullX contains all predictors.
• partX contains half of the predictors, chosen at random.

rng(1); % For reproducibility
p = size(X,1); % Number of predictors
halfPredIdx = randsample(p,ceil(0.5*p));
fullX = X;
partX = X(halfPredIdx,:);

Create a linear classification model template that specifies optimizing the objective function using SpaRSA.

t = templateLinear('Solver','sparsa');

Cross-validate two ECOC models composed of binary, linear classification models: one that uses all of the predictors and one that uses half of the predictors. Indicate that observations correspond to columns.

CVMdl = fitcecoc(fullX,Y,'Learners',t,'CrossVal','on',...
    'ObservationsIn','columns');
PCVMdl = fitcecoc(partX,Y,'Learners',t,'CrossVal','on',...
    'ObservationsIn','columns');

CVMdl and PCVMdl are ClassificationPartitionedLinearECOC models. Estimate the k-fold edge for each classifier.

fullEdge = kfoldEdge(CVMdl)
fullEdge = 0.3090

partEdge = kfoldEdge(PCVMdl)
partEdge = 0.2617

Based on the k-fold edges, the classifier that uses all of the predictors is the better model.

Find Good Lasso Penalty Using k-fold Edge

To determine a good lasso-penalty strength for a linear classification model that uses a logistic regression learner, compare k-fold edges.


Load the NLP data set. Preprocess the data as in “Feature Selection Using k-fold Edges” on page 32-2683.

load nlpdata
Y(~(ismember(Y,{'simulink','dsp','comm'}))) = 'others';
X = X';

Create a set of 8 logarithmically spaced regularization strengths from 10^-8 through 10^1.

Lambda = logspace(-8,1,8);

Create a linear classification model template that specifies using logistic regression with a lasso penalty, using each of the regularization strengths, optimizing the objective function using SpaRSA, and reducing the tolerance on the gradient of the objective function to 1e-8.

t = templateLinear('Learner','logistic','Solver','sparsa',...
    'Regularization','lasso','Lambda',Lambda,'GradientTolerance',1e-8);

Cross-validate an ECOC model composed of binary, linear classification models using 5-fold cross-validation.

rng(10); % For reproducibility
CVMdl = fitcecoc(X,Y,'Learners',t,'ObservationsIn','columns','KFold',5)

CVMdl = 
  classreg.learning.partition.ClassificationPartitionedLinearECOC
    CrossValidatedModel: 'Linear'
           ResponseName: 'Y'
        NumObservations: 31572
                  KFold: 5
              Partition: [1×1 cvpartition]
             ClassNames: [comm dsp simulink others]
         ScoreTransform: 'none'

  Properties, Methods

CVMdl is a ClassificationPartitionedLinearECOC model.

Estimate the edges for each fold and regularization strength.

eFolds = kfoldEdge(CVMdl,'Mode','individual')

eFolds = 5×8

    0.5511    0.5520    0.5525    0.5514    0.4869    0.2933    0.1025    0.0857
    0.5257    0.5270    0.5270    0.5277    0.4801    0.2944    0.1049    0.0866
    0.5256    0.5265    0.5272    0.5276    0.4724    0.2884    0.1034    0.0866
    0.5386    0.5394    0.5405    0.5373    0.4824    0.2903    0.1009    0.0854
    0.5497    0.5551    0.5582    0.5575    0.4944    0.2939    0.1030    0.0849

eFolds is a 5-by-8 matrix of edges. Rows correspond to folds and columns correspond to regularization strengths in Lambda. You can use eFolds to identify ill-performing folds, that is, unusually low edges.

Estimate the average edge over all folds for each regularization strength.


e = kfoldEdge(CVMdl)

e = 1×8

    0.5382    0.5400    0.5411    0.5403    0.4832    0.2921    0.1029    0.0858

Determine how well the models generalize by plotting the averages of the 5-fold edge for each regularization strength. Identify the regularization strength that maximizes the 5-fold edge over the grid.

figure;
plot(log10(Lambda),log10(e),'-o')
[~, maxEIdx] = max(e);
maxLambda = Lambda(maxEIdx);
hold on
plot(log10(maxLambda),log10(e(maxEIdx)),'ro');
ylabel('log_{10} 5-fold edge')
xlabel('log_{10} Lambda')
legend('Edge','Max edge')
hold off

Several values of Lambda yield similarly high edges. Greater regularization strength values lead to predictor variable sparsity, which is a good quality of a classifier. Choose the regularization strength that occurs just before the edge starts decreasing.

LambdaFinal = Lambda(4);


Train an ECOC model composed of linear classification models using the entire data set, and specify the regularization strength LambdaFinal.

t = templateLinear('Learner','logistic','Solver','sparsa',...
    'Regularization','lasso','Lambda',LambdaFinal,'GradientTolerance',1e-8);
MdlFinal = fitcecoc(X,Y,'Learners',t,'ObservationsIn','columns');

To estimate labels for new observations, pass MdlFinal and the new data to predict.

More About

Binary Loss

A binary loss is a function of the class and classification score that determines how well a binary learner classifies an observation into the class. Suppose the following:

• mkj is element (k,j) of the coding design matrix M (that is, the code corresponding to class k of binary learner j).
• sj is the score of binary learner j for an observation.
• g is the binary loss function.
• k̂ is the predicted class for the observation.

In loss-based decoding [Escalera et al.] on page 18-189, the class producing the minimum sum of the binary losses over binary learners determines the predicted class of an observation, that is,

    k̂ = argmin_k ∑_{j=1}^{L} |m_kj| g(m_kj, s_j).

In loss-weighted decoding [Escalera et al.] on page 18-189, the class producing the minimum average of the binary losses over binary learners determines the predicted class of an observation, that is,

    k̂ = argmin_k ( ∑_{j=1}^{L} |m_kj| g(m_kj, s_j) ) / ( ∑_{j=1}^{L} |m_kj| ).

Allwein et al. on page 18-189 suggest that loss-weighted decoding improves classification accuracy by keeping loss values for all classes in the same dynamic range.

This table summarizes the supported loss functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj) is the binary loss formula.

Value            Description        Score Domain       g(yj,sj)
'binodeviance'   Binomial deviance  (–∞,∞)             log[1 + exp(–2yjsj)]/[2log(2)]
'exponential'    Exponential        (–∞,∞)             exp(–yjsj)/2
'hamming'        Hamming            [0,1] or (–∞,∞)    [1 – sign(yjsj)]/2
'hinge'          Hinge              (–∞,∞)             max(0,1 – yjsj)/2
'linear'         Linear             (–∞,∞)             (1 – yjsj)/2
'logit'          Logistic           (–∞,∞)             log[1 + exp(–yjsj)]/[2log(2)]
'quadratic'      Quadratic          [0,1]              [1 – yj(2sj – 1)]²/2

The software normalizes binary losses such that the loss is 0.5 when yj = 0, and aggregates using the average of the binary learners [Allwein et al.] on page 18-189.

Do not confuse the binary loss with the overall classification loss (specified by the 'LossFun' name-value pair argument of the loss and predict object functions), which measures how well an ECOC classifier performs as a whole.

Classification Edge

The classification edge is the weighted mean of the classification margins. One way to choose among multiple classifiers, for example to perform feature selection, is to choose the classifier that yields the greatest edge.

Classification Margin

The classification margin is, for each observation, the difference between the negative loss for the true class and the maximal negative loss among the false classes. If the margins are on the same scale, then they serve as a classification confidence measure. Among multiple classifiers, those that yield greater margins are better.
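The two decoding rules above can be sketched outside MATLAB as well. This illustrative Python snippet (not the toolbox implementation; the coding matrix, scores, and choice of hinge loss are made-up examples) applies loss-based and loss-weighted decoding to a one-versus-one design for three classes:

```python
import numpy as np

def hinge_loss(y, s):
    # Binary hinge loss g(y, s) = max(0, 1 - y*s)/2 from the table above
    return np.maximum(0, 1 - y * s) / 2

def decode(M, s, g, scheme="lossweighted"):
    # M: K-by-L coding matrix with entries in {-1, 0, 1}
    # s: length-L vector of binary-learner scores for one observation
    losses = np.abs(M) * g(M, s)            # |m_kj| * g(m_kj, s_j)
    sums = losses.sum(axis=1)               # loss-based: sum over learners
    if scheme == "lossweighted":
        sums = sums / np.abs(M).sum(axis=1) # divide by sum_j |m_kj|
    return int(np.argmin(sums))             # predicted class index k-hat

# Example one-versus-one coding design for 3 classes
M = np.array([[ 1,  1,  0],
              [-1,  0,  1],
              [ 0, -1, -1]])
s = np.array([0.8, 0.7, -0.9])              # example learner scores
print(decode(M, s, hinge_loss, "lossweighted"))  # -> 0
```

Zero entries of M contribute nothing to either sum, so learners not involved in a class pair are ignored, exactly as in the formulas above.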

References

[1] Allwein, E., R. Schapire, and Y. Singer. “Reducing multiclass to binary: A unifying approach for margin classifiers.” Journal of Machine Learning Research. Vol. 1, 2000, pp. 113–141.

[2] Escalera, S., O. Pujol, and P. Radeva. “On the decoding process in ternary error-correcting output codes.” IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 32, Issue 7, 2010, pp. 120–134.

[3] Escalera, S., O. Pujol, and P. Radeva. “Separability of ternary codes for sparse designs of error-correcting output codes.” Pattern Recognition Letters. Vol. 30, Issue 3, 2009, pp. 285–297.

Extended Capabilities

Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, set the 'UseParallel' option to true. Set the 'UseParallel' field of the options structure to true using statset and specify the 'Options' name-value pair argument in the call to this function. For example: 'Options',statset('UseParallel',true)


For more information, see the 'Options' name-value pair argument. For more general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).

See Also
ClassificationECOC | ClassificationLinear | ClassificationPartitionedLinearECOC | edge | fitcecoc | kfoldMargin | kfoldPredict | statset

Topics
“Quick Start Parallel Computing for Statistics and Machine Learning Toolbox” on page 30-2
“Reproducibility in Parallel Statistical Computations” on page 30-13
“Concepts of Parallel Computing in Statistics and Machine Learning Toolbox” on page 30-8

Introduced in R2016a


kfoldEdge

Classification edge for observations not used for training

Syntax

E = kfoldEdge(obj)
E = kfoldEdge(obj,Name,Value)

Description

E = kfoldEdge(obj) returns the classification edge (average classification margin) obtained by the cross-validated classification model obj. For every fold, this method computes the classification edge for in-fold observations using an ensemble trained on out-of-fold observations.

E = kfoldEdge(obj,Name,Value) calculates the edge with additional options specified by one or more Name,Value pair arguments. You can specify several name-value pair arguments in any order as Name1,Value1,…,NameN,ValueN.

Input Arguments

obj
Object of class ClassificationPartitionedModel.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

folds
Indices of folds ranging from 1 to obj.KFold. Use only these folds for predictions.
Default: 1:obj.KFold

mode
Character vector or string scalar representing the meaning of the output edge:
• 'average' — edge is a scalar value, the average over all folds.
• 'individual' — edge is a vector of length obj.KFold with one element per fold.
Default: 'average'


Output Arguments

E
The average classification margin. E is a scalar or vector, depending on the setting of the mode name-value pair.

Examples

Estimate the k-fold Edge of a Classifier

Compute the k-fold edge for a model trained on Fisher's iris data. Load Fisher's iris data set.

load fisheriris

Train a classification tree classifier.

tree = fitctree(meas,species);

Cross-validate the classifier using 10-fold cross-validation.

cvtree = crossval(tree);

Compute the k-fold edge.

edge = kfoldEdge(cvtree)

edge = 0.8578

More About

Edge
The edge is the weighted mean value of the classification margin. The weights are class prior probabilities. If you supply additional weights, those weights are normalized to sum to the prior probabilities in the respective classes, and are then used to compute the weighted average.

Margin
The classification margin is the difference between the classification score for the true class and the maximal classification score for the false classes. The classification margin is a column vector with the same number of rows as the matrix X. A high value of margin indicates a more reliable prediction than a low value.

Score
For discriminant analysis, the score of a classification is the posterior probability of the classification. For the definition of posterior probability in discriminant analysis, see “Posterior Probability” on page 20-6.
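As a concrete illustration of the definition above, the edge is just a weighted average of per-observation margins. This Python sketch is illustrative only (the margins and weights are made-up numbers, not toolbox output):

```python
import numpy as np

# Margins: score(true class) - max score(false classes), one per observation
margins = np.array([0.9, 0.4, -0.2, 0.7])
# Observation weights (e.g., class priors spread over the observations)
weights = np.array([0.25, 0.25, 0.25, 0.25])

# Edge = weighted mean of the margins
edge = np.sum(weights * margins) / np.sum(weights)
print(edge)
```

A negative margin marks a misclassified observation; a larger edge therefore indicates a classifier whose correct decisions dominate, which is why the edge is a useful criterion for feature selection.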


For trees, the score of a classification of a leaf node is the posterior probability of the classification at that node. The posterior probability of the classification at a node is the number of training sequences that lead to that node with the classification, divided by the number of training sequences that lead to that node.

For example, consider classifying a predictor X as true when X < 0.15 or X > 0.95, and X is false otherwise. Generate 100 random points and classify them:

rng(0,'twister') % for reproducibility
X = rand(100,1);
Y = (abs(X - .55) > .4);
tree = fitctree(X,Y);
view(tree,'Mode','Graph')

Prune the tree:

tree1 = prune(tree,'Level',1);
view(tree1,'Mode','Graph')


The pruned tree correctly classifies observations that are less than 0.15 as true. It also correctly classifies observations from .15 to .94 as false. However, it incorrectly classifies observations that are greater than .94 as false. Therefore, the score for observations that are greater than .15 should be about .05/.85=.06 for true, and about .8/.85=.94 for false.

Compute the prediction scores for the first 10 rows of X:

[~,score] = predict(tree1,X(1:10));
[score X(1:10,:)]

ans = 10×3

    0.9059    0.0941    0.8147
    0.9059    0.0941    0.9058
         0    1.0000    0.1270
    0.9059    0.0941    0.9134
    0.9059    0.0941    0.6324
         0    1.0000    0.0975
    0.9059    0.0941    0.2785
    0.9059    0.0941    0.5469
    0.9059    0.0941    0.9575
    0.9059    0.0941    0.9649

Indeed, every value of X (the right-most column) that is less than 0.15 has associated scores (the left and center columns) of 0 and 1, while the other values of X have associated scores of 0.91 and 0.09. The difference (score 0.09 instead of the expected .06) is due to a statistical fluctuation: there are 8 observations in X in the range (.95,1) instead of the expected 5 observations.
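The quoted scores follow from simple counting. This Python sketch reproduces the arithmetic, assuming the pruned leaf for X > 0.15 received 85 of the 100 training points (a hypothetical but consistent reading of the reported scores):

```python
# Scores at a tree leaf are class posteriors: (training observations of the
# class that reach the leaf) / (all training observations that reach the leaf).
# Assumption for this sketch: 85 of the 100 points satisfy X > 0.15.
expected_true  = 5 / 85    # about 0.06, as quoted above (".05/.85 = .06")
expected_false = 80 / 85   # about 0.94

# With the 8 observations that actually landed in (.95, 1):
observed_true  = 8 / 85    # matches the reported score 0.0941
observed_false = 77 / 85   # matches the reported score 0.9059

print(round(observed_true, 4), round(observed_false, 4))  # -> 0.0941 0.9059
```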

See Also ClassificationPartitionedEnsemble | ClassificationPartitionedModel | crossval | kfoldLoss | kfoldMargin | kfoldPredict | kfoldfun


kfoldfun

Package: classreg.learning.partition

Cross-validate function using cross-validated ECOC model

Syntax

vals = kfoldfun(CVMdl,fun)

Description

vals = kfoldfun(CVMdl,fun) cross-validates the function fun by applying fun to the data stored in the cross-validated ECOC model CVMdl. You must pass fun as a function handle.

Examples

Estimate Classification Error Using Custom Loss Function

Train a multiclass ECOC classifier, and then cross-validate the model using a custom k-fold loss function.

Load Fisher’s iris data set. Specify the predictor data X, the response data Y, and the order of the classes in Y.

load fisheriris
X = meas;
Y = categorical(species);
classOrder = unique(Y); % Class order
rng(1); % For reproducibility

Train and cross-validate an ECOC model using support vector machine (SVM) binary classifiers. Standardize the predictors using an SVM template, and specify the class order.

t = templateSVM('Standardize',1);
CVMdl = fitcecoc(X,Y,'CrossVal','on','Learners',t,...
    'ClassNames',classOrder);

CVMdl is a ClassificationPartitionedECOC model. By default, the software implements 10-fold cross-validation.

Compute the classification error (proportion of misclassified observations) for the validation-fold observations.

L = kfoldLoss(CVMdl)

L = 0.0400

Examine the result when the cost of misclassifying a flower as versicolor is 10 and the cost of any other error is 1. Write a function named noversicolor that assigns a cost of 1 for general misclassification and a cost of 10 for misclassifying a flower as versicolor.


If you use the live script file for this example, the noversicolor function is already included at the end of the file. Otherwise, you need to create this function at the end of your .m file or add it as a file on the MATLAB path.

Compute the mean misclassification error with the noversicolor cost.

foldLoss = kfoldfun(CVMdl,@noversicolor);
mean(foldLoss)

ans = 0.0667

This code creates the function noversicolor.

function averageCost = noversicolor(CMP,Xtrain,Ytrain,Wtrain,Xtest,Ytest,Wtest)
% noversicolor: Example custom cross-validation function that assigns a cost of
% 10 for misclassifying versicolor irises and a cost of 1 for misclassifying
% the other irises. This example function requires the fisheriris data set.
Ypredict = predict(CMP,Xtest);                          % Predicted classes
misclassified = not(strcmp(Ypredict,Ytest));            % Different result
classifiedAsVersicolor = strcmp(Ypredict,'versicolor'); % Index of bad decisions
cost = sum(misclassified) + ...
    9*sum(misclassified & classifiedAsVersicolor);      % Total differences
averageCost = single(cost/numel(Ytest));                % Average error
end

Input Arguments

CVMdl — Cross-validated ECOC model
ClassificationPartitionedECOC model

Cross-validated ECOC model, specified as a ClassificationPartitionedECOC model.

fun — Cross-validated function
function handle

Cross-validated function, specified as a function handle. fun has this syntax:

testvals = fun(CMP,Xtrain,Ytrain,Wtrain,Xtest,Ytest,Wtest)

• CMP is a compact model stored in one element of the CVMdl.Trained property. • Xtrain is the training matrix of predictor values. • Ytrain is the training array of response values. • Wtrain is the set of training weights for observations. • Xtest and Ytest are the validation data, with associated weights Wtest. • The returned value testvals must have the same size across all folds. Data Types: function_handle
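The kfoldfun contract (fit a model per fold, apply fun to that fold's training and validation data, and stack the per-fold results vertically) can be sketched generically. This Python analogue is illustrative only; the function names and the mean-model example are invented for the sketch, and it is not the toolbox implementation:

```python
import numpy as np

def kfoldfun(X, y, k, train, fun, seed=0):
    # Generic sketch of the kfoldfun contract: for each fold, fit a model on
    # the training folds, apply `fun` to (model, train data, test data), and
    # stack the per-fold results vertically (KFold-by-n, as described above).
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    rows = []
    for test_idx in np.array_split(idx, k):
        train_idx = np.setdiff1d(idx, test_idx)
        model = train(X[train_idx], y[train_idx])
        rows.append(np.atleast_1d(fun(model, X[train_idx], y[train_idx],
                                      X[test_idx], y[test_idx])))
    return np.vstack(rows)   # one row per fold

# Example: the "model" is the training-fold mean of the response, and fun
# returns the mean squared error of that constant predictor on the test fold
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 2 * X.ravel() + 1
vals = kfoldfun(X, y, 5,
                train=lambda Xt, yt: yt.mean(),
                fun=lambda m, Xtr, ytr, Xte, yte: np.mean((yte - m) ** 2))
print(vals.shape)  # -> (5, 1)
```

Because each per-fold return value has the same size, the stacked result is a rectangular KFold-by-n matrix, matching the vals output described below.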

Output Arguments

vals — Cross-validation results
numeric matrix


Cross-validation results, returned as a numeric matrix. vals contains the testvals outputs, concatenated vertically over all the folds. For example, if testvals from every fold is a numeric vector of length n, kfoldfun returns a KFold-by-n numeric matrix with one row per fold.

See Also
ClassificationECOC | ClassificationPartitionedECOC | crossval | fitcecoc | kfoldEdge | kfoldLoss | kfoldMargin | kfoldPredict

Introduced in R2014b


kfoldfun

Cross-validate function

Syntax

vals = kfoldfun(CVMdl,fun)

Description

vals = kfoldfun(CVMdl,fun) cross-validates the function fun by applying fun to the data stored in the cross-validated model CVMdl. You must pass fun as a function handle.

Input Arguments

CVMdl — Cross-validated model
ClassificationPartitionedECOC model | ClassificationPartitionedEnsemble model | ClassificationPartitionedModel model

Cross-validated model, specified as a ClassificationPartitionedECOC model, ClassificationPartitionedEnsemble model, or a ClassificationPartitionedModel model.

fun — Cross-validated function
function handle

Cross-validated function, specified as a function handle. fun has the syntax

testvals = fun(CMP,Xtrain,Ytrain,Wtrain,Xtest,Ytest,Wtest)

• CMP is a compact model stored in one element of the CVMdl.Trained property.
• Xtrain is the training matrix of predictor values.
• Ytrain is the training array of response values.
• Wtrain contains the training weights for the observations.
• Xtest and Ytest are the test data, with associated weights Wtest.
• The returned value testvals must have the same size across all folds.

Data Types: function_handle

Output Arguments

vals — Cross-validation results
numeric matrix

Cross-validation results, returned as a numeric matrix. vals contains the testvals outputs, concatenated vertically over all folds. For example, if testvals from every fold is a numeric vector of length N, kfoldfun returns a KFold-by-N numeric matrix with one row per fold.

Data Types: double


Examples

Estimate Classification Error Using a Custom Loss Function

Train a classification tree classifier, and then cross-validate it using a custom k-fold loss function.

Load Fisher’s iris data set.

load fisheriris

Train a classification tree classifier.

Mdl = fitctree(meas,species);

Mdl is a ClassificationTree model.

Cross-validate Mdl using the default 10-fold cross-validation. Compute the classification error (proportion of misclassified observations) for the out-of-fold observations.

rng(1); % For reproducibility
CVMdl = crossval(Mdl);
L = kfoldLoss(CVMdl)

L = 0.0467

Examine the result when the cost of misclassifying a flower as 'versicolor' is 10, and any other error is 1. Write a function called noversicolor.m that attributes a cost of 1 for misclassification, but 10 for misclassifying a flower as versicolor, and save it on your MATLAB path.

function averageCost = noversicolor(CMP,Xtrain,Ytrain,Wtrain,Xtest,Ytest,Wtest)
%noversicolor Example custom cross-validation function
%   Attributes a cost of 10 for misclassifying versicolor irises, and 1 for
%   the other irises. This example function requires the |fisheriris| data
%   set.
Ypredict = predict(CMP,Xtest);                          % Predicted classes
misclassified = not(strcmp(Ypredict,Ytest));            % Different result
classifiedAsVersicolor = strcmp(Ypredict,'versicolor'); % Index of bad decisions
cost = sum(misclassified) + ...
    9*sum(misclassified & classifiedAsVersicolor);      % Total differences
averageCost = cost/numel(Ytest);                        % Average error
end

Compute the mean misclassification error with the noversicolor cost.

mean(kfoldfun(CVMdl,@noversicolor))


ans = 0.2267

See Also
ClassificationPartitionedECOC | ClassificationPartitionedModel | crossval | kfoldEdge | kfoldLoss | kfoldMargin | kfoldPredict


kfoldfun

Cross-validate function

Syntax

vals = kfoldfun(obj,fun)

Description

vals = kfoldfun(obj,fun) cross-validates the function fun by applying fun to the data stored in the cross-validated model obj. You must pass fun as a function handle.

Input Arguments

obj
Object of class RegressionPartitionedModel or RegressionPartitionedEnsemble. Create obj with fitrtree or fitrensemble along with one of the cross-validation options: 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'. Alternatively, create obj from a regression tree or regression ensemble with crossval.

fun
A function handle for a cross-validation function. fun has the syntax

testvals = fun(CMP,Xtrain,Ytrain,Wtrain,Xtest,Ytest,Wtest)

• CMP is a compact model stored in one element of the obj.Trained property.
• Xtrain is the training matrix of predictor values.
• Ytrain is the training array of response values.
• Wtrain contains the training weights for the observations.
• Xtest and Ytest are the test data, with associated weights Wtest.
• The returned value testvals must have the same size across all folds.

Output Arguments

vals
The testvals outputs, concatenated vertically over all folds. For example, if testvals from every fold is a numeric vector of length N, kfoldfun returns a KFold-by-N numeric matrix with one row per fold.

Examples

Cross-validate a regression tree, and obtain the mean squared error (see kfoldLoss):

load imports-85
t = fitrtree(X(:,[4 5]),X(:,16),...
    'predictornames',{'length' 'width'},...
    'responsename','price');
cv = crossval(t);
L = kfoldLoss(cv)

L = 1.5489e+007

Examine the result of simple averaging of responses instead of using predictions:

f = @(cmp,Xtrain,Ytrain,Wtrain,Xtest,Ytest,Wtest)...
    mean((Ytest-mean(Ytrain)).^2)
mean(kfoldfun(cv,f))

ans = 6.3497e+007

See Also RegressionPartitionedEnsemble | RegressionPartitionedModel | crossval | fitrtree | kfoldLoss | kfoldPredict


kfoldLoss

Package: classreg.learning.partition

Classification loss for cross-validated ECOC model

Syntax

loss = kfoldLoss(CVMdl)
loss = kfoldLoss(CVMdl,Name,Value)

Description

loss = kfoldLoss(CVMdl) returns the classification loss obtained by the cross-validated ECOC model (ClassificationPartitionedECOC) CVMdl. For every fold, kfoldLoss computes the classification loss for validation-fold observations using a model trained on training-fold observations. CVMdl.X contains both sets of observations.

loss = kfoldLoss(CVMdl,Name,Value) returns the classification loss with additional options specified by one or more name-value pair arguments. For example, specify the number of folds, decoding scheme, or verbosity level.

Examples

Determine k-Fold Cross-Validation Loss

Load Fisher's iris data set. Specify the predictor data X, the response data Y, and the order of the classes in Y.

load fisheriris
X = meas;
Y = categorical(species);
classOrder = unique(Y); % Class order
rng(1); % For reproducibility

Train and cross-validate an ECOC model using support vector machine (SVM) binary classifiers. Standardize the predictors using an SVM template, and specify the class order.

t = templateSVM('Standardize',1);
CVMdl = fitcecoc(X,Y,'CrossVal','on','Learners',t,'ClassNames',classOrder);

CVMdl is a ClassificationPartitionedECOC model. By default, the software implements 10-fold cross-validation. You can specify a different number of folds using the 'KFold' name-value pair argument.

Estimate the average classification error.

L = kfoldLoss(CVMdl)

L = 0.0400

The average classification error for the folds is 4%.
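The classification error reported here is the weighted proportion of misclassified observations (see “Classification Error” under More About). This illustrative Python sketch (made-up labels and weights, not toolbox code) shows the computation:

```python
import numpy as np

# Weighted classification error: sum(w_j * e_j) / sum(w_j), where e_j = 1
# when the predicted class differs from the true class
y_true = np.array([0, 1, 2, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0])     # one mistake on the third observation
w = np.ones_like(y_true, dtype=float)  # equal weights

err = np.sum(w * (y_pred != y_true)) / np.sum(w)
print(err)  # -> 0.2
```

With equal weights this reduces to the plain misclassification rate; unequal weights let some observations count more toward the loss.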


Alternatively, you can obtain the per-fold losses by specifying the name-value pair 'Mode','individual' in kfoldLoss.

Display Individual Losses for Each Cross-Validation Fold

The classification loss is a measure of classifier quality. To determine which folds perform poorly, display the losses for each fold.

Load Fisher's iris data set. Specify the predictor data X, the response data Y, and the order of the classes in Y.

load fisheriris
X = meas;
Y = categorical(species);
classOrder = unique(Y);
rng(1); % For reproducibility

Train an ECOC model using SVM binary classifiers. Use 8-fold cross-validation, standardize the predictors using an SVM template, and specify the class order.

t = templateSVM('Standardize',1);
CVMdl = fitcecoc(X,Y,'KFold',8,'Learners',t,'ClassNames',classOrder);

Estimate the average classification loss across all folds and the losses for each fold.

loss = kfoldLoss(CVMdl)

loss = 0.0333

losses = kfoldLoss(CVMdl,'Mode','individual')

losses = 8×1

    0.0556
    0.0526
    0.1579
         0
         0
         0
         0
         0

The third fold misclassifies a much higher percentage of observations than any other fold.

Return the average classification loss for the folds that perform well by specifying the 'Folds' name-value pair argument.

newloss = kfoldLoss(CVMdl,'Folds',[1:2 4:8])

newloss = 0.0153

The total classification loss decreases by approximately half its original size. Consider adjusting parameters of the binary classifiers or the coding design to see if performance for all folds improves.


Determine ECOC Model Quality Using Custom Cross-Validation Loss

In addition to knowing whether a model generally classifies observations correctly, you can determine how well the model classifies an observation into its predicted class. One way to determine this type of model quality is to pass a custom loss function to kfoldLoss.

Load Fisher's iris data set. Specify the predictor data X, the response data Y, and the order of the classes in Y.

load fisheriris
X = meas;
Y = categorical(species);
classOrder = unique(Y) % Class order

classOrder = 3x1 categorical
     setosa 
     versicolor 
     virginica 

rng(1) % For reproducibility

Train and cross-validate an ECOC model using SVM binary classifiers. Standardize the predictors using an SVM template, and specify the class order.

t = templateSVM('Standardize',1);
CVMdl = fitcecoc(X,Y,'CrossVal','on','Learners',t,'ClassNames',classOrder);

CVMdl is a ClassificationPartitionedECOC model. By default, the software implements 10-fold cross-validation. You can specify a different number of folds using the 'KFold' name-value pair argument.

Create a custom function that takes the minimal loss for each observation, then averages the minimal losses for all observations. S corresponds to the NegLoss output of kfoldPredict.

lossfun = @(~,S,~,~)mean(min(-S,[],2));

Compute the cross-validated custom loss.

kfoldLoss(CVMdl,'LossFun',lossfun)

ans = 0.0101

The average minimal binary loss for the validation-fold observations is 0.0101.
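A custom loss with the full lossvalue = lossfun(C,S,W,Cost) signature described under 'LossFun' works with the same quantities. This Python analogue is illustrative only (the inputs are invented, and it is not the toolbox API); it mirrors the minimal-loss computation above with observation weights included:

```python
import numpy as np

def lossfun(C, S, W, Cost):
    # C: n-by-K one-hot matrix of true classes; S: n-by-K negated binary
    # losses (like NegLoss); W: n-by-1 weights; Cost: K-by-K cost matrix.
    # Example custom loss: weighted mean of each observation's minimal loss,
    # mirroring lossfun = @(~,S,~,~)mean(min(-S,[],2)) from the example above.
    W = W / W.sum()                           # normalize weights to sum to 1
    return float(np.sum(W * np.min(-S, axis=1)))

n, K = 4, 3
C = np.eye(K)[[0, 1, 2, 1]]                   # one-hot true classes (invented)
S = -np.array([[0.1, 0.9, 0.8],
               [0.7, 0.2, 0.9],
               [0.8, 0.9, 0.1],
               [0.6, 0.3, 0.9]])              # negated losses: higher is better
W = np.ones(n) / n                            # equal observation weights
Cost = np.ones((K, K)) - np.eye(K)            # 0 on the diagonal, 1 elsewhere
print(lossfun(C, S, W, Cost))
```

This particular custom loss ignores C and Cost; a cost-sensitive variant would combine all four inputs.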

Input Arguments

CVMdl — Cross-validated ECOC model
ClassificationPartitionedECOC model

Cross-validated ECOC model, specified as a ClassificationPartitionedECOC model. You can create a ClassificationPartitionedECOC model in two ways:


• Pass a trained ECOC model (ClassificationECOC) to crossval.
• Train an ECOC model using fitcecoc and specify any one of these cross-validation name-value pair arguments: 'CrossVal', 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: kfoldLoss(CVMdl,'Folds',[1 3 5]) specifies to use only the first, third, and fifth folds to calculate the classification loss.

BinaryLoss — Binary learner loss function
'hamming' | 'linear' | 'logit' | 'exponential' | 'binodeviance' | 'hinge' | 'quadratic' | function handle

Binary learner loss function, specified as the comma-separated pair consisting of 'BinaryLoss' and a built-in loss function name or function handle.

• This table describes the built-in functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj) is the binary loss formula.

  Value            Description        Score Domain       g(yj,sj)
  'binodeviance'   Binomial deviance  (–∞,∞)             log[1 + exp(–2yjsj)]/[2log(2)]
  'exponential'    Exponential        (–∞,∞)             exp(–yjsj)/2
  'hamming'        Hamming            [0,1] or (–∞,∞)    [1 – sign(yjsj)]/2
  'hinge'          Hinge              (–∞,∞)             max(0,1 – yjsj)/2
  'linear'         Linear             (–∞,∞)             (1 – yjsj)/2
  'logit'          Logistic           (–∞,∞)             log[1 + exp(–yjsj)]/[2log(2)]
  'quadratic'      Quadratic          [0,1]              [1 – yj(2sj – 1)]²/2

The software normalizes binary losses so that the loss is 0.5 when yj = 0. Also, the software calculates the mean binary loss for each class.

• For a custom binary loss function, for example customFunction, specify its function handle 'BinaryLoss',@customFunction. customFunction has this form:

  bLoss = customFunction(M,s)

  where:

  • M is the K-by-L coding matrix stored in Mdl.CodingMatrix.
  • s is the 1-by-L row vector of classification scores.
  • bLoss is the classification loss. This scalar aggregates the binary losses for every learner in a particular class. For example, you can use the mean binary loss to aggregate the loss over the learners for each class.


  • K is the number of classes.
  • L is the number of binary learners.

For an example of passing a custom binary loss function, see “Predict Test-Sample Labels of ECOC Model Using Custom Binary Loss Function” on page 32-4169.

The default BinaryLoss value depends on the score ranges returned by the binary learners. This table describes some default BinaryLoss values based on the given assumptions.

  Assumption                                                            Default Value
  All binary learners are SVMs or linear or kernel classification       'hinge'
  models of SVM learners.
  All binary learners are ensembles trained by AdaboostM1 or            'exponential'
  GentleBoost.
  All binary learners are ensembles trained by LogitBoost.              'binodeviance'
  All binary learners are linear or kernel classification models of     'quadratic'
  logistic regression learners. Or, you specify to predict class
  posterior probabilities by setting 'FitPosterior',true in fitcecoc.

To check the default value, use dot notation to display the BinaryLoss property of the trained model at the command line.

Example: 'BinaryLoss','binodeviance'
Data Types: char | string | function_handle

Decoding — Decoding scheme
'lossweighted' (default) | 'lossbased'

Decoding scheme that aggregates the binary losses, specified as the comma-separated pair consisting of 'Decoding' and 'lossweighted' or 'lossbased'. For more information, see “Binary Loss” on page 32-2824.

Example: 'Decoding','lossbased'

Folds — Fold indices for prediction
1:CVMdl.KFold (default) | numeric vector of positive integers

Fold indices for prediction, specified as the comma-separated pair consisting of 'Folds' and a numeric vector of positive integers. The elements of Folds must be within the range from 1 to CVMdl.KFold. The software uses only the folds specified in Folds for prediction.

Example: 'Folds',[1 4 10]
Data Types: single | double

LossFun — Loss function
'classiferror' (default) | function handle

Loss function, specified as the comma-separated pair consisting of 'LossFun' and 'classiferror' or a function handle.


• Specify the built-in function 'classiferror'. In this case, the loss function is the classification error on page 32-2709.
• Or, specify your own function using function handle notation. Assume that n is the number of observations in the training data (CVMdl.NumObservations) and K is the number of classes (numel(CVMdl.ClassNames)). Your function needs the signature lossvalue = lossfun(C,S,W,Cost), where:

  • The output argument lossvalue is a scalar.
  • You specify the function name (lossfun).
  • C is an n-by-K logical matrix with rows indicating the class to which the corresponding observation belongs. The column order corresponds to the class order in CVMdl.ClassNames. Construct C by setting C(p,q) = 1 if observation p is in class q, for each row. Set all other elements of row p to 0.
  • S is an n-by-K numeric matrix of negated loss values for the classes. Each row corresponds to an observation. The column order corresponds to the class order in CVMdl.ClassNames. The input S resembles the output argument NegLoss of kfoldPredict.
  • W is an n-by-1 numeric vector of observation weights. If you pass W, the software normalizes its elements to sum to 1.
  • Cost is a K-by-K numeric matrix of misclassification costs. For example, Cost = ones(K) – eye(K) specifies a cost of 0 for correct classification and 1 for misclassification.

  Specify your function using 'LossFun',@lossfun.

Data Types: char | string | function_handle

Mode — Aggregation level for output
'average' (default) | 'individual'

Aggregation level for the output, specified as the comma-separated pair consisting of 'Mode' and 'average' or 'individual'. This table describes the values.

  Value          Description
  'average'      The output is a scalar average over all folds.
  'individual'   The output is a vector of length k containing one value per fold, where k is the number of folds.

Example: 'Mode','individual'

Options — Estimation options
[] (default) | structure array returned by statset

Estimation options, specified as the comma-separated pair consisting of 'Options' and a structure array returned by statset.

To invoke parallel computing:

• You need a Parallel Computing Toolbox license.


• Specify 'Options',statset('UseParallel',true).

Verbose — Verbosity level
0 (default) | 1

Verbosity level, specified as the comma-separated pair consisting of 'Verbose' and 0 or 1. Verbose controls the number of diagnostic messages that the software displays in the Command Window. If Verbose is 0, then the software does not display diagnostic messages. Otherwise, the software displays diagnostic messages.

Example: 'Verbose',1
Data Types: single | double

Output Arguments

loss — Classification loss
numeric scalar | numeric column vector

Classification loss, returned as a numeric scalar or numeric column vector.

If Mode is 'average', then loss is the average classification loss over all folds. Otherwise, loss is a k-by-1 numeric column vector containing the classification loss for each fold, where k is the number of folds.

More About

Classification Error

The classification error is a binary classification error measure that has the form

    L = ( ∑_{j=1}^{n} w_j e_j ) / ( ∑_{j=1}^{n} w_j ),

where:

• wj is the weight for observation j. The software renormalizes the weights to sum to 1.
• ej = 1 if the predicted class of observation j differs from its true class, and 0 otherwise.

In other words, the classification error is the proportion of observations misclassified by the classifier.

Binary Loss

A binary loss is a function of the class and classification score that determines how well a binary learner classifies an observation into the class. Suppose the following:

• mkj is element (k,j) of the coding design matrix M (that is, the code corresponding to class k of binary learner j).

• sj is the score of binary learner j for an observation.
• g is the binary loss function.
• k̂ is the predicted class for the observation.

In loss-based decoding [Escalera et al.] on page 18-189, the class producing the minimum sum of the binary losses over binary learners determines the predicted class of an observation, that is,

    k̂ = argmin_k ∑_{j=1}^{L} |m_kj| g(m_kj, s_j).

In loss-weighted decoding [Escalera et al.] on page 18-189, the class producing the minimum average of the binary losses over binary learners determines the predicted class of an observation, that is,

    k̂ = argmin_k ( ∑_{j=1}^{L} |m_kj| g(m_kj, s_j) ) / ( ∑_{j=1}^{L} |m_kj| ).

Allwein et al. on page 18-189 suggest that loss-weighted decoding improves classification accuracy by keeping loss values for all classes in the same dynamic range. This table summarizes the supported loss functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj). Value

Description

Score Domain

g(yj,sj)

'binodeviance'

Binomial deviance

(–∞,∞)

log[1 + exp(–2yjsj)]/ [2log(2)]

'exponential'

Exponential

(–∞,∞)

exp(–yjsj)/2

'hamming'

Hamming

[0,1] or (–∞,∞)

[1 – sign(yjsj)]/2

'hinge'

Hinge

(–∞,∞)

max(0,1 – yjsj)/2

'linear'

Linear

(–∞,∞)

(1 – yjsj)/2

'logit'

Logistic

(–∞,∞)

log[1 + exp(–yjsj)]/ [2log(2)]

'quadratic'

Quadratic

[0,1]

[1 – yj(2sj – 1)]2/2

The software normalizes binary losses such that the loss is 0.5 when yj = 0, and aggregates using the average of the binary learners [Allwein et al.] on page 18-189. Do not confuse the binary loss with the overall classification loss (specified by the 'LossFun' namevalue pair argument of the loss and predict object functions), which measures how well an ECOC classifier performs as a whole.
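As a language-agnostic illustration of the binary losses and the loss-weighted decoding rule above, the following Python sketch transcribes the tabulated formulas directly. This is not Toolbox code; the helper names (`BINARY_LOSSES`, `predict_loss_weighted`) are invented for the example.

```python
import math

def sign(x):
    # MATLAB-style sign: sign(0) == 0
    return (x > 0) - (x < 0)

# Direct transcriptions of the tabulated binary losses g(yj, sj).
BINARY_LOSSES = {
    'binodeviance': lambda y, s: math.log(1 + math.exp(-2 * y * s)) / (2 * math.log(2)),
    'exponential':  lambda y, s: math.exp(-y * s) / 2,
    'hamming':      lambda y, s: (1 - sign(y * s)) / 2,
    'hinge':        lambda y, s: max(0, 1 - y * s) / 2,
    'linear':       lambda y, s: (1 - y * s) / 2,
    'logit':        lambda y, s: math.log(1 + math.exp(-y * s)) / (2 * math.log(2)),
    'quadratic':    lambda y, s: (1 - y * (2 * s - 1)) ** 2 / 2,
}

def predict_loss_weighted(M, scores, g):
    """Loss-weighted decoding: pick the class k minimizing
    sum_j |m_kj| * g(m_kj, s_j) / sum_j |m_kj|."""
    def avg_loss(row):
        num = sum(abs(m) * g(m, s) for m, s in zip(row, scores))
        den = sum(abs(m) for m in row)
        return num / den
    return min(range(len(M)), key=lambda k: avg_loss(M[k]))
```

Every loss in the table evaluates to 0.5 at yj = 0, which is exactly the normalization the text describes.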

References [1] Allwein, E., R. Schapire, and Y. Singer. “Reducing multiclass to binary: A unifying approach for margin classifiers.” Journal of Machine Learning Research. Vol. 1, 2000, pp. 113–141.


[2] Escalera, S., O. Pujol, and P. Radeva. “On the decoding process in ternary error-correcting output codes.” IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 32, Issue 7, 2010, pp. 120–134.
[3] Escalera, S., O. Pujol, and P. Radeva. “Separability of ternary codes for sparse designs of error-correcting output codes.” Pattern Recogn. Vol. 30, Issue 3, 2009, pp. 285–297.

Extended Capabilities Automatic Parallel Support Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™. To run in parallel, set the 'UseParallel' option to true. Set the 'UseParallel' field of the options structure to true using statset and specify the 'Options' name-value pair argument in the call to this function. For example: 'Options',statset('UseParallel',true) For more information, see the 'Options' name-value pair argument. For more general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).

See Also ClassificationECOC | ClassificationPartitionedECOC | fitcecoc | kfoldPredict | loss | statset Topics “Quick Start Parallel Computing for Statistics and Machine Learning Toolbox” on page 30-2 “Reproducibility in Parallel Statistical Computations” on page 30-13 “Concepts of Parallel Computing in Statistics and Machine Learning Toolbox” on page 30-8 Introduced in R2014b


kfoldLoss Classification loss for observations not used for training

Syntax L = kfoldLoss(ens) L = kfoldLoss(ens,Name,Value)

Description L = kfoldLoss(ens) returns the classification loss obtained by the cross-validated classification model ens. For every fold, this method computes the classification loss for in-fold observations using a model trained on out-of-fold observations. L = kfoldLoss(ens,Name,Value) calculates loss with additional options specified by one or more Name,Value pair arguments. You can specify several name-value pair arguments in any order as Name1,Value1,…,NameN,ValueN.

Input Arguments

ens
Object of class ClassificationPartitionedEnsemble. Create ens with fitcensemble along with one of the cross-validation options: 'crossval', 'kfold', 'holdout', 'leaveout', or 'cvpartition'. Alternatively, create ens from a classification ensemble with crossval.

Name-Value Pair Arguments
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

folds
Indices of folds ranging from 1 to ens.KFold. Use only these folds for predictions.
Default: 1:ens.KFold

lossfun
Loss function, specified as the comma-separated pair consisting of 'LossFun' and a built-in loss function name or function handle.
• The following table lists the available loss functions. Specify one using its corresponding character vector or string scalar.

Value            Description
'binodeviance'   Binomial deviance
'classiferror'   Classification error
'exponential'    Exponential
'hinge'          Hinge
'logit'          Logistic
'mincost'        Minimal expected misclassification cost (for classification scores that are posterior probabilities)
'quadratic'      Quadratic

'mincost' is appropriate for classification scores that are posterior probabilities.
• Bagged and subspace ensembles return posterior probabilities by default (ens.Method is 'Bag' or 'Subspace').
• If the ensemble method is 'AdaBoostM1', 'AdaBoostM2', 'GentleBoost', or 'LogitBoost', then, to use posterior probabilities as classification scores, you must specify the double-logit score transform by entering
ens.ScoreTransform = 'doublelogit';

• For all other ensemble methods, the software does not support posterior probabilities as classification scores.
• Specify your own function using function handle notation. Suppose that n is the number of observations in X and K is the number of distinct classes (numel(ens.ClassNames), where ens is the input model). Your function must have this signature:
lossvalue = lossfun(C,S,W,Cost)

where:
• The output argument lossvalue is a scalar.
• You choose the function name (lossfun).
• C is an n-by-K logical matrix with rows indicating the class to which the corresponding observation belongs. The column order corresponds to the class order in ens.ClassNames. Construct C by setting C(p,q) = 1 if observation p is in class q, for each row. Set all other elements of row p to 0.
• S is an n-by-K numeric matrix of classification scores. The column order corresponds to the class order in ens.ClassNames. S is a matrix of classification scores, similar to the output of predict.
• W is an n-by-1 numeric vector of observation weights. If you pass W, the software normalizes them to sum to 1.
• Cost is a K-by-K numeric matrix of misclassification costs. For example, Cost = ones(K) – eye(K) specifies a cost of 0 for correct classification, and 1 for misclassification.
Specify your function using 'LossFun',@lossfun.
For more details on loss functions, see “Classification Loss” on page 32-2714.
Default: 'classiferror'
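To make the lossfun contract concrete, here is a sketch in Python (not MATLAB) of a custom loss with the same (C,S,W,Cost) shape, implementing the weighted classification error; the function name is invented for the example, and W is assumed to be already normalized to sum to 1.

```python
def custom_classiferror(C, S, W, Cost):
    # C: n-by-K one-hot true-class matrix; S: n-by-K classification scores;
    # W: n observation weights summing to 1; Cost is unused by this loss.
    loss = 0.0
    for c_row, s_row, w in zip(C, S, W):
        predicted = max(range(len(s_row)), key=lambda q: s_row[q])  # argmax score
        actual = c_row.index(1)                                     # true class
        loss += w * (predicted != actual)
    return loss
```

In MATLAB you would express the same logic as a function and pass it via 'LossFun',@lossfun.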


mode A character vector or string scalar for determining the output of kfoldLoss: • 'average' — L is a scalar, the loss averaged over all folds. • 'individual' — L is a vector of length ens.KFold, where each entry is the loss for a fold. • 'cumulative' — L is a vector in which element J is obtained by using learners 1:J from the input list of learners. Default: 'average'

Output Arguments

L
Loss, by default the fraction of misclassified data. L can be a vector, and its meaning depends on the name-value pair settings.

Examples Estimate Cross-Validated Classification Error Load the ionosphere data set. load ionosphere

Train a classification ensemble of 100 decision trees using AdaBoostM1. Specify tree stumps as the weak learners. t = templateTree('MaxNumSplits',1); ens = fitcensemble(X,Y,'Method','AdaBoostM1','Learners',t);

Cross-validate the ensemble using 10-fold cross-validation. cvens = crossval(ens);

Estimate the cross-validated classification error. L = kfoldLoss(cvens) L = 0.0655

More About

Classification Loss
Classification loss functions measure the predictive inaccuracy of classification models. When you compare the same type of loss among many models, a lower loss indicates a better predictive model.
Consider the following scenario.
• L is the weighted average classification loss.


• n is the sample size.
• For binary classification:
  • yj is the observed class label. The software codes it as –1 or 1, indicating the negative or positive class, respectively.
  • f(Xj) is the raw classification score for observation (row) j of the predictor data X.
  • mj = yjf(Xj) is the classification score for classifying observation j into the class corresponding to yj. Positive values of mj indicate correct classification and do not contribute much to the average loss. Negative values of mj indicate incorrect classification and contribute significantly to the average loss.
• For algorithms that support multiclass classification (that is, K ≥ 3):
  • yj* is a vector of K – 1 zeros, with 1 in the position corresponding to the true, observed class yj. For example, if the true class of the second observation is the third class and K = 4, then y2* = [0 0 1 0]′. The order of the classes corresponds to the order in the ClassNames property of the input model.
  • f(Xj) is the length K vector of class scores for observation j of the predictor data X. The order of the scores corresponds to the order of the classes in the ClassNames property of the input model.
  • mj = yj*′f(Xj). Therefore, mj is the scalar classification score that the model predicts for the true, observed class.
• The weight for observation j is wj. The software normalizes the observation weights so that they sum to the corresponding prior class probability. The software also normalizes the prior probabilities so they sum to 1. Therefore,

∑_{j=1}^n wj = 1.

Given this scenario, the following table describes the supported loss functions that you can specify by using the 'LossFun' name-value pair argument.

• Binomial deviance ('binodeviance'): L = ∑_{j=1}^n wj·log[1 + exp(–2mj)].
• Exponential loss ('exponential'): L = ∑_{j=1}^n wj·exp(–mj).
• Classification error ('classiferror'): L = ∑_{j=1}^n wj·I{ŷj ≠ yj}. It is the weighted fraction of misclassified observations, where ŷj is the class label corresponding to the class with the maximal posterior probability. I{x} is the indicator function.
• Hinge loss ('hinge'): L = ∑_{j=1}^n wj·max(0, 1 – mj).
• Logit loss ('logit'): L = ∑_{j=1}^n wj·log[1 + exp(–mj)].
• Minimal cost ('mincost'): The software computes the weighted minimal cost using this procedure for observations j = 1,...,n.
  1. Estimate the 1-by-K vector of expected classification costs for observation j: γj = f(Xj)′C. f(Xj) is the column vector of class posterior probabilities for binary and multiclass classification. C is the cost matrix that the input model stores in the Cost property.
  2. For observation j, predict the class label corresponding to the minimum expected classification cost: ŷj = argmin_{k = 1,...,K} γjk.
  3. Using C, identify the cost incurred (cj) for making the prediction.
  The weighted, average, minimum cost loss is L = ∑_{j=1}^n wj·cj.
• Quadratic loss ('quadratic'): L = ∑_{j=1}^n wj·(1 – mj)².

This figure compares the loss functions (except 'mincost') for one observation over m. Some functions are normalized to pass through [0,1].
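The three-step 'mincost' procedure above can be sketched numerically as follows. This is an illustrative Python transcription, not Toolbox code; the posterior and cost values in the test are invented.

```python
def mincost_loss(P, W, Cost):
    # P: n-by-K class posterior probabilities; Cost: K-by-K cost matrix;
    # W: n observation weights, assumed to sum to 1.
    K = len(Cost)
    total = 0.0
    for p_row, w in zip(P, W):
        # 1) Expected classification costs: gamma = f(Xj)' * C.
        gamma = [sum(p_row[i] * Cost[i][k] for i in range(K)) for k in range(K)]
        # 2) The predicted class is the one with minimum expected cost,
        # 3) and the incurred cost c_j is that minimum expected cost.
        total += w * min(gamma)
    return total
```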


See Also crossval | kfoldEdge | kfoldMargin | kfoldPredict | kfoldfun


kfoldLoss Package: classreg.learning.partition Classification loss for cross-validated kernel classification model

Syntax loss = kfoldLoss(CVMdl) loss = kfoldLoss(CVMdl,Name,Value)

Description loss = kfoldLoss(CVMdl) returns the classification loss on page 32-2722 obtained by the cross-validated, binary kernel model (ClassificationPartitionedKernel) CVMdl. For every fold, kfoldLoss computes the classification loss for validation-fold observations using a model trained on training-fold observations. By default, kfoldLoss returns the classification error. loss = kfoldLoss(CVMdl,Name,Value) returns the classification loss with additional options specified by one or more name-value pair arguments. For example, specify the classification loss function, number of folds, or aggregation level.

Examples Estimate k-Fold Cross-Validation Classification Error Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, which are labeled either bad ('b') or good ('g'). load ionosphere

Cross-validate a binary kernel classification model using the data. CVMdl = fitckernel(X,Y,'Crossval','on') CVMdl = classreg.learning.partition.ClassificationPartitionedKernel CrossValidatedModel: 'Kernel' ResponseName: 'Y' NumObservations: 351 KFold: 10 Partition: [1x1 cvpartition] ClassNames: {'b' 'g'} ScoreTransform: 'none' Properties, Methods


CVMdl is a ClassificationPartitionedKernel model. By default, the software implements 10-fold cross-validation. To specify a different number of folds, use the 'KFold' name-value pair argument instead of 'Crossval'. Estimate the cross-validated classification loss. By default, the software computes the classification error. loss = kfoldLoss(CVMdl) loss = 0.0940

Alternatively, you can obtain the per-fold classification errors by specifying the name-value pair 'Mode','individual' in kfoldLoss.

Specify Custom Classification Loss Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, which are labeled either bad ('b') or good ('g'). load ionosphere

Cross-validate a binary kernel classification model using the data. CVMdl = fitckernel(X,Y,'Crossval','on') CVMdl = classreg.learning.partition.ClassificationPartitionedKernel CrossValidatedModel: 'Kernel' ResponseName: 'Y' NumObservations: 351 KFold: 10 Partition: [1x1 cvpartition] ClassNames: {'b' 'g'} ScoreTransform: 'none' Properties, Methods

CVMdl is a ClassificationPartitionedKernel model. By default, the software implements 10-fold cross-validation. To specify a different number of folds, use the 'KFold' name-value pair argument instead of 'Crossval'. Create an anonymous function that measures linear loss, that is,

L = (∑_j –wj·yj·fj) / (∑_j wj).

wj is the weight for observation j, yj is the response j (–1 for the negative class and 1 otherwise), and fj is the raw classification score of observation j.
linearloss = @(C,S,W,Cost)sum(-W.*sum(S.*C,2))/sum(W);

Custom loss functions must be written in a particular form. For rules on writing a custom loss function, see the 'LossFun' name-value pair argument.


Estimate the cross-validated classification loss using the linear loss function. loss = kfoldLoss(CVMdl,'LossFun',linearloss) loss = -0.7792
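For readers outside MATLAB, the anonymous function in this example corresponds to the following Python sketch of the same linear-loss formula. The function name is invented; the one-hot matrix C picks out each observation's true-class score, exactly as S.*C does in the MATLAB version.

```python
def linear_loss(C, S, W):
    # Transcription of sum(-W .* sum(S .* C, 2)) / sum(W):
    # for each row, the inner sum extracts the score of the true class,
    # which is then weighted, negated, and averaged over the weights.
    numerator = sum(-w * sum(s * c for s, c in zip(s_row, c_row))
                    for c_row, s_row, w in zip(C, S, W))
    return numerator / sum(W)
```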

Input Arguments CVMdl — Cross-validated, binary kernel classification model ClassificationPartitionedKernel model object Cross-validated, binary kernel classification model, specified as a ClassificationPartitionedKernel model object. You can create a ClassificationPartitionedKernel model by using fitckernel and specifying any one of the cross-validation name-value pair arguments. To obtain estimates, kfoldLoss applies the same data used to cross-validate the kernel classification model (X and Y). Name-Value Pair Arguments Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN. Example: kfoldLoss(CVMdl,'Folds',[1 3 5]) specifies to use only the first, third, and fifth folds to calculate the classification loss. Folds — Fold indices for prediction 1:CVMdl.KFold (default) | numeric vector of positive integers Fold indices for prediction, specified as the comma-separated pair consisting of 'Folds' and a numeric vector of positive integers. The elements of Folds must be within the range from 1 to CVMdl.KFold. The software uses only the folds specified in Folds for prediction. Example: 'Folds',[1 4 10] Data Types: single | double LossFun — Loss function 'classiferror' (default) | 'binodeviance' | 'exponential' | 'hinge' | 'logit' | 'mincost' | 'quadratic' | function handle Loss function, specified as the comma-separated pair consisting of 'LossFun' and a built-in loss function name or a function handle. • This table lists the available loss functions. Specify one using its corresponding value.


Value            Description
'binodeviance'   Binomial deviance
'classiferror'   Classification error
'exponential'    Exponential
'hinge'          Hinge
'logit'          Logistic
'mincost'        Minimal expected misclassification cost (for classification scores that are posterior probabilities)
'quadratic'      Quadratic

'mincost' is appropriate for classification scores that are posterior probabilities. For kernel classification models, logistic regression learners return posterior probabilities as classification scores by default, but SVM learners do not (see kfoldPredict). • Specify your own function by using function handle notation. Assume that n is the number of observations in X, and K is the number of distinct classes (numel(CVMdl.ClassNames), where CVMdl is the input model). Your function must have this signature: lossvalue = lossfun(C,S,W,Cost)

• The output argument lossvalue is a scalar. • You specify the function name (lossfun). • C is an n-by-K logical matrix with rows indicating the class to which the corresponding observation belongs. The column order corresponds to the class order in CVMdl.ClassNames. Construct C by setting C(p,q) = 1, if observation p is in class q, for each row. Set all other elements of row p to 0. • S is an n-by-K numeric matrix of classification scores. The column order corresponds to the class order in CVMdl.ClassNames. S is a matrix of classification scores, similar to the output of kfoldPredict. • W is an n-by-1 numeric vector of observation weights. If you pass W, the software normalizes the weights to sum to 1. • Cost is a K-by-K numeric matrix of misclassification costs. For example, Cost = ones(K) – eye(K) specifies a cost of 0 for correct classification, and 1 for misclassification. Example: 'LossFun',@lossfun Data Types: char | string | function_handle Mode — Aggregation level for output 'average' (default) | 'individual' Aggregation level for the output, specified as the comma-separated pair consisting of 'Mode' and 'average' or 'individual'. This table describes the values. Value

Description

'average'

The output is a scalar average over all folds.

32-2721

32

Functions

Value

Description

'individual'

The output is a vector of length k containing one value per fold, where k is the number of folds.

Example: 'Mode','individual'

Output Arguments loss — Classification loss numeric scalar | numeric column vector Classification loss on page 32-2722, returned as a numeric scalar or numeric column vector. If Mode is 'average', then loss is the average classification loss over all folds. Otherwise, loss is a k-by-1 numeric column vector containing the classification loss for each fold, where k is the number of folds.

More About

Classification Loss
Classification loss functions measure the predictive inaccuracy of classification models. When you compare the same type of loss among many models, a lower loss indicates a better predictive model.
Suppose the following:
• L is the weighted average classification loss.
• n is the sample size.
• yj is the observed class label. The software codes it as –1 or 1, indicating the negative or positive class, respectively.
• f(Xj) is the raw classification score for the transformed observation (row) j of the predictor data X using feature expansion.
• mj = yjf(Xj) is the classification score for classifying observation j into the class corresponding to yj. Positive values of mj indicate correct classification and do not contribute much to the average loss. Negative values of mj indicate incorrect classification and contribute to the average loss.
• The weight for observation j is wj. The software normalizes the observation weights so that they sum to the corresponding prior class probability. The software also normalizes the prior probabilities so that they sum to 1. Therefore,

∑_{j=1}^n wj = 1.

This table describes the supported loss functions that you can specify by using the 'LossFun' namevalue pair argument. Loss Function

Value of LossFun

Binomial deviance

'binodeviance'

Equation L=

n



j=1

32-2722

w jlog 1 + exp −2m j

.

kfoldLoss

Loss Function

Value of LossFun

Exponential loss

'exponential'

Equation L=

n



j=1

Classification error

'classiferror'

L=

n



j=1

w jexp −m j . w jI y j ≠ y j .

The classification error is the weighted fraction of misclassified observations where y j is the class label corresponding to the class with the maximal posterior probability. I{x} is the indicator function. Hinge loss

'hinge'

n

L=



w jmax 0, 1 − m j .

j=1

Logit loss

'logit'

L=

n



j=1

Minimal cost

'mincost'

w jlog 1 + exp −m j .

The software computes the weighted minimal cost using this procedure for observations j = 1,...,n. 1

Estimate the 1-by-K vector of expected classification costs for observation j: γ j = f X j ′C . f(Xj) is the column vector of class posterior probabilities. C is the cost matrix that the input model stores in the Cost property.

2

For observation j, predict the class label corresponding to the minimum expected classification cost: yj=

3

min

j = 1, ..., K

γj .

Using C, identify the cost incurred (cj) for making the prediction.

The weighted, average, minimum cost loss is L=

n



j=1

Quadratic loss

'quadratic'

L=

n



j=1

w jc j . 2

wj 1 − mj .

This figure compares the loss functions (except minimal cost) for one observation over m. Some functions are normalized to pass through [0,1].


See Also ClassificationPartitionedKernel | fitckernel Introduced in R2018b


kfoldLoss Package: classreg.learning.partition Classification loss for cross-validated kernel ECOC model

Syntax loss = kfoldLoss(CVMdl) loss = kfoldLoss(CVMdl,Name,Value)

Description loss = kfoldLoss(CVMdl) returns the classification loss obtained by the cross-validated kernel ECOC model (ClassificationPartitionedKernelECOC) CVMdl. For every fold, kfoldLoss computes the classification loss for validation-fold observations using a model trained on training-fold observations. kfoldLoss applies the same data used to create CVMdl (see fitcecoc). By default, kfoldLoss returns the classification error on page 32-2730. loss = kfoldLoss(CVMdl,Name,Value) returns the classification loss with additional options specified by one or more name-value pair arguments. For example, specify the classification loss function, number of folds, decoding scheme, or verbosity level.

Examples Estimate k-Fold Cross-Validation Classification Error Load Fisher's iris data set. X contains flower measurements, and Y contains the names of flower species. load fisheriris X = meas; Y = species;

Cross-validate an ECOC model composed of kernel binary learners. CVMdl = fitcecoc(X,Y,'Learners','kernel','CrossVal','on') CVMdl = classreg.learning.partition.ClassificationPartitionedKernelECOC CrossValidatedModel: 'KernelECOC' ResponseName: 'Y' NumObservations: 150 KFold: 10 Partition: [1x1 cvpartition] ClassNames: {'setosa' 'versicolor' 'virginica'} ScoreTransform: 'none' Properties, Methods


CVMdl is a ClassificationPartitionedKernelECOC model. By default, the software implements 10-fold cross-validation. To specify a different number of folds, use the 'KFold' name-value pair argument instead of 'Crossval'. Estimate the cross-validated classification loss. By default, the software computes the classification error. loss = kfoldLoss(CVMdl) loss = 0.0333

Alternatively, you can obtain the per-fold classification errors by specifying the name-value pair 'Mode','individual' in kfoldLoss.

Determine Model Quality Using Custom Cross-Validation Loss In addition to knowing whether a model generally classifies observations correctly, you can determine how well the model classifies an observation into its predicted class. One way to determine this type of model quality is to pass a custom loss function to kfoldLoss. Load Fisher's iris data set. X contains flower measurements, and Y contains the names of flower species. load fisheriris X = meas; Y = species;

Cross-validate an ECOC model composed of kernel binary learners. rng(1) % For reproducibility CVMdl = fitcecoc(X,Y,'Learners','kernel','CrossVal','on') CVMdl = classreg.learning.partition.ClassificationPartitionedKernelECOC CrossValidatedModel: 'KernelECOC' ResponseName: 'Y' NumObservations: 150 KFold: 10 Partition: [1x1 cvpartition] ClassNames: {'setosa' 'versicolor' 'virginica'} ScoreTransform: 'none' Properties, Methods

CVMdl is a ClassificationPartitionedKernelECOC model. By default, the software implements 10-fold cross-validation. To specify a different number of folds, use the 'KFold' name-value pair argument instead of 'Crossval'. Create a custom function that takes the minimal loss for each observation, then averages the minimal losses for all observations. S corresponds to the NegLoss output of kfoldPredict. lossfun = @(~,S,~,~)mean(min(-S,[],2));


Compute the cross-validated custom loss. kfoldLoss(CVMdl,'LossFun',lossfun) ans = 0.0199

The average minimal binary loss for the validation-fold observations is about 0.02.
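The anonymous function in this example, lossfun = @(~,S,~,~)mean(min(-S,[],2)), corresponds to the following Python sketch (illustration only; the data in the test are invented, and the function name is our own).

```python
def mean_min_binary_loss(S):
    # S holds negated average binary losses (the NegLoss output of
    # kfoldPredict); -S recovers the losses, min over the columns takes
    # each row's smallest loss, and the mean averages over observations.
    return sum(min(-v for v in row) for row in S) / len(S)
```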

Input Arguments CVMdl — Cross-validated kernel ECOC model ClassificationPartitionedKernelECOC model Cross-validated kernel ECOC model, specified as a ClassificationPartitionedKernelECOC model. You can create a ClassificationPartitionedKernelECOC model by training an ECOC model using fitcecoc and specifying these name-value pair arguments: • 'Learners'– Set the value to 'kernel', a template object returned by templateKernel, or a cell array of such template objects. • One of the arguments 'CrossVal', 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'. Name-Value Pair Arguments Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN. Example: kfoldLoss(CVMdl,'Folds',[1 3 5]) specifies to use only the first, third, and fifth folds to calculate the classification loss. BinaryLoss — Binary learner loss function 'hamming' | 'linear' | 'logit' | 'exponential' | 'binodeviance' | 'hinge' | 'quadratic' | function handle Binary learner loss function, specified as the comma-separated pair consisting of 'BinaryLoss' and a built-in loss function name or function handle. • This table contains names and descriptions of the built-in functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj) is the binary loss formula. Value

Description

Score Domain

g(yj,sj)

'binodeviance'

Binomial deviance

(–∞,∞)

log[1 + exp(–2yjsj)]/ [2log(2)]

'exponential'

Exponential

(–∞,∞)

exp(–yjsj)/2

'hamming'

Hamming

[0,1] or (–∞,∞)

[1 – sign(yjsj)]/2

'hinge'

Hinge

(–∞,∞)

max(0,1 – yjsj)/2

'linear'

Linear

(–∞,∞)

(1 – yjsj)/2

'logit'

Logistic

(–∞,∞)

log[1 + exp(–yjsj)]/ [2log(2)]

32-2727

32

Functions

Value

Description

Score Domain

g(yj,sj)

'quadratic'

Quadratic

[0,1]

[1 – yj(2sj – 1)]2/2

The software normalizes binary losses such that the loss is 0.5 when yj = 0. Also, the software calculates the mean binary loss for each class. • For a custom binary loss function, for example, customFunction, specify its function handle 'BinaryLoss',@customFunction. customFunction has this form: bLoss = customFunction(M,s)

where:
• M is the K-by-L coding matrix stored in Mdl.CodingMatrix.
• s is the 1-by-L row vector of classification scores.
• bLoss is the classification loss. This scalar aggregates the binary losses for every learner in a particular class. For example, you can use the mean binary loss to aggregate the loss over the learners for each class.
• K is the number of classes.
• L is the number of binary learners.
By default, if all binary learners are kernel classification models using SVM, then BinaryLoss is 'hinge'. If all binary learners are kernel classification models using logistic regression, then BinaryLoss is 'quadratic'.
Example: 'BinaryLoss','binodeviance'
Data Types: char | string | function_handle

Decoding — Decoding scheme
'lossweighted' (default) | 'lossbased'
Decoding scheme that aggregates the binary losses, specified as the comma-separated pair consisting of 'Decoding' and 'lossweighted' or 'lossbased'. For more information, see “Binary Loss” on page 32-2730.
Example: 'Decoding','lossbased'

Folds — Fold indices for prediction
1:CVMdl.KFold (default) | numeric vector of positive integers
Fold indices for prediction, specified as the comma-separated pair consisting of 'Folds' and a numeric vector of positive integers. The elements of Folds must be within the range from 1 to CVMdl.KFold. The software uses only the folds specified in Folds for prediction.
Example: 'Folds',[1 4 10]
Data Types: single | double

LossFun — Loss function
'classiferror' (default) | function handle


Loss function, specified as the comma-separated pair consisting of 'LossFun' and 'classiferror' or a function handle.
• Specify the built-in function 'classiferror'. In this case, the loss function is the classification error on page 32-2730.
• Or, specify your own function using function handle notation. Assume that n is the number of observations in the training data (CVMdl.NumObservations) and K is the number of classes (numel(CVMdl.ClassNames)). Your function needs the signature lossvalue = lossfun(C,S,W,Cost), where:
  • The output argument lossvalue is a scalar.
  • You specify the function name (lossfun).
  • C is an n-by-K logical matrix with rows indicating the class to which the corresponding observation belongs. The column order corresponds to the class order in CVMdl.ClassNames. Construct C by setting C(p,q) = 1 if observation p is in class q, for each row. Set all other elements of row p to 0.
  • S is an n-by-K numeric matrix of negated loss values for the classes. Each row corresponds to an observation. The column order corresponds to the class order in CVMdl.ClassNames. The input S resembles the output argument NegLoss of kfoldPredict.
  • W is an n-by-1 numeric vector of observation weights. If you pass W, the software normalizes its elements to sum to 1.
  • Cost is a K-by-K numeric matrix of misclassification costs. For example, Cost = ones(K) – eye(K) specifies a cost of 0 for correct classification and 1 for misclassification.
Specify your function using 'LossFun',@lossfun.
Data Types: char | string | function_handle

Mode — Aggregation level for output
'average' (default) | 'individual'
Aggregation level for the output, specified as the comma-separated pair consisting of 'Mode' and 'average' or 'individual'. This table describes the values.

Value           Description
'average'       The output is a scalar average over all folds.
'individual'    The output is a vector of length k containing one value per fold, where k is the number of folds.

Example: 'Mode','individual'

Options — Estimation options
[] (default) | structure array returned by statset
Estimation options, specified as the comma-separated pair consisting of 'Options' and a structure array returned by statset.
To invoke parallel computing:


• You need a Parallel Computing Toolbox license. • Specify 'Options',statset('UseParallel',true). Verbose — Verbosity level 0 (default) | 1 Verbosity level, specified as the comma-separated pair consisting of 'Verbose' and 0 or 1. Verbose controls the number of diagnostic messages that the software displays in the Command Window. If Verbose is 0, then the software does not display diagnostic messages. Otherwise, the software displays diagnostic messages. Example: 'Verbose',1 Data Types: single | double

Output Arguments loss — Classification loss numeric scalar | numeric column vector Classification loss, returned as a numeric scalar or numeric column vector. If Mode is 'average', then loss is the average classification loss over all folds. Otherwise, loss is a k-by-1 numeric column vector containing the classification loss for each fold, where k is the number of folds.

More About

Classification Error

The classification error is a binary classification error measure that has the form

    L = (∑_{j=1}^{n} w_j e_j) / (∑_{j=1}^{n} w_j),

where:

• wj is the weight for observation j. The software renormalizes the weights to sum to 1.
• ej = 1 if the predicted class of observation j differs from its true class, and 0 otherwise.

In other words, the classification error is the proportion of observations misclassified by the classifier.

Binary Loss

A binary loss is a function of the class and classification score that determines how well a binary learner classifies an observation into the class. Suppose the following:

• mkj is element (k,j) of the coding design matrix M (that is, the code corresponding to class k of binary learner j).

• sj is the score of binary learner j for an observation.
• g is the binary loss function.
• k̂ is the predicted class for the observation.

In loss-based decoding [Escalera et al.] on page 18-189, the class producing the minimum sum of the binary losses over binary learners determines the predicted class of an observation, that is,

    k̂ = argmin_k ∑_{j=1}^{L} |m_kj| g(m_kj, s_j).

In loss-weighted decoding [Escalera et al.] on page 18-189, the class producing the minimum average of the binary losses over binary learners determines the predicted class of an observation, that is,

    k̂ = argmin_k [∑_{j=1}^{L} |m_kj| g(m_kj, s_j)] / [∑_{j=1}^{L} |m_kj|].

Allwein et al. on page 18-189 suggest that loss-weighted decoding improves classification accuracy by keeping loss values for all classes in the same dynamic range.

This table summarizes the supported loss functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj) is the binary loss formula.

Value            Description         Score Domain       g(yj,sj)
'binodeviance'   Binomial deviance   (–∞,∞)             log[1 + exp(–2yjsj)]/[2log(2)]
'exponential'    Exponential         (–∞,∞)             exp(–yjsj)/2
'hamming'        Hamming             [0,1] or (–∞,∞)    [1 – sign(yjsj)]/2
'hinge'          Hinge               (–∞,∞)             max(0,1 – yjsj)/2
'linear'         Linear              (–∞,∞)             (1 – yjsj)/2
'logit'          Logistic            (–∞,∞)             log[1 + exp(–yjsj)]/[2log(2)]
'quadratic'      Quadratic           [0,1]              [1 – yj(2sj – 1)]²/2

The software normalizes binary losses such that the loss is 0.5 when yj = 0, and aggregates using the average of the binary learners [Allwein et al.] on page 18-189.

Do not confuse the binary loss with the overall classification loss (specified by the 'LossFun' name-value pair argument of the loss and predict object functions), which measures how well an ECOC classifier performs as a whole.
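As a numeric sketch of loss-weighted decoding with the hinge binary loss, consider one observation. The coding matrix M and scores s below are invented for illustration:

```matlab
% Toy loss-weighted decoding for one observation (values are illustrative).
M = [1 -1; -1 1; 0 1];            % coding design: 3 classes, 2 binary learners
s = [0.4 -0.2];                   % binary-learner scores for the observation
g = @(y,sc) max(0, 1 - y.*sc)/2;  % hinge binary loss; g(0,sc) = 0.5
avgLoss = sum(abs(M).*g(M,s),2) ./ sum(abs(M),2);  % average binary loss per class
[~,khat] = min(avgLoss);          % predicted class minimizes the average loss
```

The elementwise products rely on implicit expansion (MATLAB R2016b or later); in older releases, use bsxfun instead.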

References [1] Allwein, E., R. Schapire, and Y. Singer. “Reducing multiclass to binary: A unifying approach for margin classifiers.” Journal of Machine Learning Research. Vol. 1, 2000, pp. 113–141.


[2] Escalera, S., O. Pujol, and P. Radeva. “On the decoding process in ternary error-correcting output codes.” IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 32, Issue 7, 2010, pp. 120–134.

[3] Escalera, S., O. Pujol, and P. Radeva. “Separability of ternary codes for sparse designs of error-correcting output codes.” Pattern Recognition Letters. Vol. 30, Issue 3, 2009, pp. 285–297.

See Also ClassificationPartitionedKernelECOC | fitcecoc Introduced in R2018b


kfoldLoss

Classification loss for observations not used in training

Syntax L = kfoldLoss(CVMdl) L = kfoldLoss(CVMdl,Name,Value)

Description L = kfoldLoss(CVMdl) returns the cross-validated classification losses on page 32-2739 obtained by the cross-validated, binary, linear classification model CVMdl. That is, for every fold, kfoldLoss estimates the classification loss for observations that it holds out when it trains using all other observations. L contains a classification loss for each regularization strength in the linear classification models that compose CVMdl. L = kfoldLoss(CVMdl,Name,Value) uses additional options specified by one or more Name,Value pair arguments. For example, indicate which folds to use for the loss calculation or specify the classification-loss function.

Input Arguments CVMdl — Cross-validated, binary, linear classification model ClassificationPartitionedLinear model object Cross-validated, binary, linear classification model, specified as a ClassificationPartitionedLinear model object. You can create a ClassificationPartitionedLinear model using fitclinear and specifying any one of the cross-validation, name-value pair arguments, for example, CrossVal. To obtain estimates, kfoldLoss applies the same data used to cross-validate the linear classification model (X and Y). Name-Value Pair Arguments Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN. Folds — Fold indices to use for classification-score prediction 1:CVMdl.KFold (default) | numeric vector of positive integers Fold indices to use for classification-score prediction, specified as the comma-separated pair consisting of 'Folds' and a numeric vector of positive integers. The elements of Folds must range from 1 through CVMdl.KFold. Example: 'Folds',[1 4 10] Data Types: single | double 32-2733

32

Functions

LossFun — Loss function
'classiferror' (default) | 'binodeviance' | 'exponential' | 'hinge' | 'logit' | 'mincost' | 'quadratic' | function handle

Loss function, specified as the comma-separated pair consisting of 'LossFun' and a built-in loss-function name or function handle.

• The following table lists the available loss functions. Specify one using its corresponding character vector or string scalar.

  Value             Description
  'binodeviance'    Binomial deviance
  'classiferror'    Classification error
  'exponential'     Exponential
  'hinge'           Hinge
  'logit'           Logistic
  'mincost'         Minimal expected misclassification cost (for classification scores that are posterior probabilities)
  'quadratic'       Quadratic

  'mincost' is appropriate for classification scores that are posterior probabilities. For linear classification models, logistic regression learners return posterior probabilities as classification scores by default, but SVM learners do not (see predict).

• Specify your own function using function handle notation. Let n be the number of observations in X and K be the number of distinct classes (numel(Mdl.ClassNames), where Mdl is the input model). Your function must have this signature:

  lossvalue = lossfun(C,S,W,Cost)

  where:

  • The output argument lossvalue is a scalar.
  • You choose the function name (lossfun).
  • C is an n-by-K logical matrix with rows indicating the class to which the corresponding observation belongs. The column order corresponds to the class order in Mdl.ClassNames. Construct C by setting C(p,q) = 1 if observation p is in class q, for each row. Set all other elements of row p to 0.
  • S is an n-by-K numeric matrix of classification scores, similar to the output of predict. The column order corresponds to the class order in Mdl.ClassNames.
  • W is an n-by-1 numeric vector of observation weights. If you pass W, the software normalizes the weights to sum to 1.
  • Cost is a K-by-K numeric matrix of misclassification costs. For example, Cost = ones(K) – eye(K) specifies a cost of 0 for correct classification and 1 for misclassification.

  Specify your function using 'LossFun',@lossfun.
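For instance, a handle that reproduces the classification error from C, S, and W might look like this. This is a sketch only; it assumes the maximal score is attained uniquely and uses implicit expansion (MATLAB R2016b or later):

```matlab
% Sketch: weighted classification error computed from scores (illustrative).
% An observation counts as correct when its true class attains the maximal score.
classifErr = @(C,S,W,~) sum(W .* ~any(C & (S == max(S,[],2)), 2)) / sum(W);
L = kfoldLoss(CVMdl,'LossFun',classifErr);  % CVMdl: a cross-validated model
```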

Data Types: char | string | function_handle

Mode — Loss aggregation level
'average' (default) | 'individual'

Loss aggregation level, specified as the comma-separated pair consisting of 'Mode' and 'average' or 'individual'.

Value           Description
'average'       Returns losses averaged over all folds
'individual'    Returns losses for each fold

Example: 'Mode','individual'

Output Arguments

L — Cross-validated classification losses
numeric scalar | numeric vector | numeric matrix

Cross-validated classification losses on page 32-2739, returned as a numeric scalar, vector, or matrix. The interpretation of L depends on LossFun. Let R be the number of regularization strengths in the cross-validated models (stored in numel(CVMdl.Trained{1}.Lambda)) and F be the number of folds (stored in CVMdl.KFold).

• If Mode is 'average', then L is a 1-by-R vector. L(j) is the average classification loss over all folds of the cross-validated model that uses regularization strength j.
• Otherwise, L is an F-by-R matrix. L(i,j) is the classification loss for fold i of the cross-validated model that uses regularization strength j.

To estimate L, kfoldLoss uses the data that created CVMdl (see X and Y).
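For example, when CVMdl holds models trained over a grid of regularization strengths, you can pick the strength with the smallest average loss. This is a sketch; CVMdl is assumed to come from fitclinear with a Lambda vector:

```matlab
% Sketch: select the regularization strength with the lowest average loss.
L = kfoldLoss(CVMdl);                       % 1-by-R vector of average losses
[~,idx] = min(L);                           % index of the best strength
bestLambda = CVMdl.Trained{1}.Lambda(idx);  % corresponding Lambda value
```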

Examples Estimate k-Fold Cross-Validation Classification Error Load the NLP data set. load nlpdata

X is a sparse matrix of predictor data, and Y is a categorical vector of class labels. There are more than two classes in the data. The models should identify whether the word counts in a web page are from the Statistics and Machine Learning Toolbox™ documentation. So, identify the labels that correspond to the Statistics and Machine Learning Toolbox™ documentation web pages. Ystats = Y == 'stats';

Cross-validate a binary, linear classification model that can identify whether the word counts in a documentation web page are from the Statistics and Machine Learning Toolbox™ documentation.


rng(1); % For reproducibility CVMdl = fitclinear(X,Ystats,'CrossVal','on');

CVMdl is a ClassificationPartitionedLinear model. By default, the software implements 10-fold cross-validation. You can alter the number of folds using the 'KFold' name-value pair argument. Estimate the average of the out-of-fold classification error rates.

ce = kfoldLoss(CVMdl)

ce = 7.6017e-04

Alternatively, you can obtain the per-fold classification error rates by specifying the name-value pair 'Mode','individual' in kfoldLoss.

Specify Custom Classification Loss Load the NLP data set. Preprocess the data as in “Estimate k-Fold Cross-Validation Classification Error” on page 32-2735, and transpose the predictor data. load nlpdata Ystats = Y == 'stats'; X = X';

Cross-validate a binary, linear classification model using 5-fold cross-validation. Optimize the objective function using SpaRSA. Specify that the predictor observations correspond to columns. rng(1); % For reproducibility CVMdl = fitclinear(X,Ystats,'Solver','sparsa','KFold',5,... 'ObservationsIn','columns'); CMdl = CVMdl.Trained{1};

CVMdl is a ClassificationPartitionedLinear model. It contains the property Trained, which is a 5-by-1 cell array holding the ClassificationLinear models that the software trained using the training set of each fold.

Create an anonymous function that measures linear loss, that is,

    L = (∑_j −w_j y_j f_j) / (∑_j w_j).

w_j is the weight for observation j, y_j is response j (−1 for the negative class, and 1 otherwise), and f_j is the raw classification score of observation j. Custom loss functions must be written in a particular form. For rules on writing a custom loss function, see the LossFun name-value pair argument. Because the function does not use classification cost, use ~ to have kfoldLoss ignore its position.

linearloss = @(C,S,W,~)sum(-W.*sum(S.*C,2))/sum(W);

Estimate the average cross-validated classification loss using the linear loss function. Also, obtain the loss for each fold. ce = kfoldLoss(CVMdl,'LossFun',linearloss) ce = -8.0982


ceFold = kfoldLoss(CVMdl,'LossFun',linearloss,'Mode','individual') ceFold = 5×1 -8.3165 -8.7633 -7.4342 -8.0423 -7.9347

Find Good Lasso Penalty Using k-fold Classification Loss

To determine a good lasso-penalty strength for a linear classification model that uses a logistic regression learner, compare test-sample classification error rates.

Load the NLP data set. Preprocess the data as in “Specify Custom Classification Loss” on page 32-2736.

load nlpdata
Ystats = Y == 'stats';
X = X';

Create a set of 11 logarithmically spaced regularization strengths from 10^−6 through 10^−0.5.

Lambda = logspace(-6,-0.5,11);

Cross-validate binary, linear classification models that use each of the regularization strengths, using 5-fold cross-validation. Optimize the objective function using SpaRSA. Lower the tolerance on the gradient of the objective function to 1e-8.

rng(10); % For reproducibility
CVMdl = fitclinear(X,Ystats,'ObservationsIn','columns',...
    'KFold',5,'Learner','logistic','Solver','sparsa',...
    'Regularization','lasso','Lambda',Lambda,'GradientTolerance',1e-8)

CVMdl =
  classreg.learning.partition.ClassificationPartitionedLinear
    CrossValidatedModel: 'Linear'
           ResponseName: 'Y'
        NumObservations: 31572
                  KFold: 5
              Partition: [1×1 cvpartition]
             ClassNames: [0 1]
         ScoreTransform: 'none'

  Properties, Methods

Extract a trained linear classification model.

Mdl1 = CVMdl.Trained{1}

Mdl1 =
  ClassificationLinear
      ResponseName: 'Y'
        ClassNames: [0 1]
    ScoreTransform: 'logit'
              Beta: [34023×11 double]
              Bias: [-13.2904 -13.2904 -13.2904 -13.2904 -9.9357 -7.0782 -5.4335 -4.5473 -3.4223
            Lambda: [1.0000e-06 3.5481e-06 1.2589e-05 4.4668e-05 1.5849e-04 5.6234e-04 0.0020 0.0
           Learner: 'logistic'

Properties, Methods

Mdl1 is a ClassificationLinear model object. Because Lambda is a sequence of regularization strengths, you can think of Mdl1 as 11 models, one for each regularization strength in Lambda.

Estimate the cross-validated classification error.

ce = kfoldLoss(CVMdl);

Because there are 11 regularization strengths, ce is a 1-by-11 vector of classification error rates. Higher values of Lambda lead to predictor variable sparsity, which is a good quality of a classifier. For each regularization strength, train a linear classification model using the entire data set and the same options as when you cross-validated the models. Determine the number of nonzero coefficients per model. Mdl = fitclinear(X,Ystats,'ObservationsIn','columns',... 'Learner','logistic','Solver','sparsa','Regularization','lasso',... 'Lambda',Lambda,'GradientTolerance',1e-8); numNZCoeff = sum(Mdl.Beta~=0);

In the same figure, plot the cross-validated, classification error rates and frequency of nonzero coefficients for each regularization strength. Plot all variables on the log scale. figure; [h,hL1,hL2] = plotyy(log10(Lambda),log10(ce),... log10(Lambda),log10(numNZCoeff)); hL1.Marker = 'o'; hL2.Marker = 'o'; ylabel(h(1),'log_{10} classification error') ylabel(h(2),'log_{10} nonzero-coefficient frequency') xlabel('log_{10} Lambda') title('Test-Sample Statistics') hold off


Choose the index of the regularization strength that balances predictor variable sparsity and low classification error. In this case, a value between 10^−4 and 10^−1 should suffice.

idxFinal = 7;

Select the model from Mdl with the chosen regularization strength. MdlFinal = selectModels(Mdl,idxFinal);

MdlFinal is a ClassificationLinear model containing one regularization strength. To estimate labels for new observations, pass MdlFinal and the new data to predict.

More About

Classification Loss

Classification loss functions measure the predictive inaccuracy of classification models. When you compare the same type of loss among many models, a lower loss indicates a better predictive model.

Consider the following scenario.

• L is the weighted average classification loss.
• n is the sample size.
• For binary classification:
  • yj is the observed class label. The software codes it as –1 or 1, indicating the negative or positive class, respectively.
  • f(Xj) is the raw classification score for observation (row) j of the predictor data X.
  • mj = yjf(Xj) is the classification score for classifying observation j into the class corresponding to yj. Positive values of mj indicate correct classification and do not contribute much to the average loss. Negative values of mj indicate incorrect classification and contribute significantly to the average loss.
• For algorithms that support multiclass classification (that is, K ≥ 3):
  • yj* is a vector of K – 1 zeros, with 1 in the position corresponding to the true, observed class yj. For example, if the true class of the second observation is the third class and K = 4, then y2* = [0 0 1 0]′. The order of the classes corresponds to the order in the ClassNames property of the input model.
  • f(Xj) is the length K vector of class scores for observation j of the predictor data X. The order of the scores corresponds to the order of the classes in the ClassNames property of the input model.
  • mj = yj*′f(Xj). Therefore, mj is the scalar classification score that the model predicts for the true, observed class.
• The weight for observation j is wj. The software normalizes the observation weights so that they sum to the corresponding prior class probability. The software also normalizes the prior probabilities so they sum to 1. Therefore,

    ∑_{j=1}^{n} w_j = 1.

Given this scenario, the following list describes the supported loss functions that you can specify by using the 'LossFun' name-value pair argument.

Binomial deviance ('binodeviance'):

    L = ∑_{j=1}^{n} w_j log[1 + exp(−2m_j)].

Exponential loss ('exponential'):

    L = ∑_{j=1}^{n} w_j exp(−m_j).

Classification error ('classiferror'):

    L = ∑_{j=1}^{n} w_j I{ŷ_j ≠ y_j}.

It is the weighted fraction of misclassified observations, where ŷ_j is the class label corresponding to the class with the maximal posterior probability. I{x} is the indicator function.

Hinge loss ('hinge'):

    L = ∑_{j=1}^{n} w_j max(0, 1 − m_j).

Logit loss ('logit'):

    L = ∑_{j=1}^{n} w_j log[1 + exp(−m_j)].

Minimal cost ('mincost'): The software computes the weighted minimal cost using this procedure for observations j = 1,...,n.

1  Estimate the 1-by-K vector of expected classification costs for observation j: γ_j = f(X_j)′C. f(X_j) is the column vector of class posterior probabilities for binary and multiclass classification. C is the cost matrix that the input model stores in the Cost property.
2  For observation j, predict the class label corresponding to the minimum expected classification cost: ŷ_j = min_{j=1,...,K}(γ_j).
3  Using C, identify the cost incurred (c_j) for making the prediction.

The weighted average minimum cost loss is

    L = ∑_{j=1}^{n} w_j c_j.

Quadratic loss ('quadratic'):

    L = ∑_{j=1}^{n} w_j (1 − m_j)².

This figure compares the loss functions (except 'mincost') for one observation over m. Some functions are normalized to pass through the point (0,1).
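The 'mincost' procedure above can be sketched numerically for one observation. The posterior probabilities and cost matrix below are invented for illustration:

```matlab
% Toy 'mincost' computation for a single observation (illustrative values).
posterior = [0.7 0.2 0.1];   % f(Xj)': class posterior probabilities
C = ones(3) - eye(3);        % default misclassification cost matrix
gamma = posterior * C;       % expected classification cost per class
[cj,yhat] = min(gamma);      % predicted class has the minimum expected cost
```

Here gamma is [0.3 0.8 0.9], so the procedure predicts the first class and incurs cost cj = 0.3.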


See Also ClassificationLinear | ClassificationPartitionedLinear | kfoldPredict | loss Introduced in R2016a


kfoldLoss

Classification loss for observations not used in training

Syntax L = kfoldLoss(CVMdl) L = kfoldLoss(CVMdl,Name,Value)

Description

L = kfoldLoss(CVMdl) returns the cross-validated classification error rates on page 32-2752 estimated by the cross-validated, error-correcting output codes (ECOC) model composed of linear classification models CVMdl. That is, for every fold, kfoldLoss estimates the classification error rate for observations that it holds out when it trains using all other observations. kfoldLoss applies the same data used to create CVMdl (see fitcecoc). L contains a classification loss for each regularization strength in the linear classification models that compose CVMdl.

L = kfoldLoss(CVMdl,Name,Value) uses additional options specified by one or more Name,Value pair arguments. For example, specify a decoding scheme, which folds to use for the loss calculation, or verbosity level.

Input Arguments

CVMdl — Cross-validated, ECOC model composed of linear classification models
ClassificationPartitionedLinearECOC model object

Cross-validated, ECOC model composed of linear classification models, specified as a ClassificationPartitionedLinearECOC model object. You can create a ClassificationPartitionedLinearECOC model using fitcecoc and by:

1  Specifying any one of the cross-validation name-value pair arguments, for example, CrossVal
2  Setting the name-value pair argument Learners to 'linear' or a linear classification model template returned by templateLinear

To obtain estimates, kfoldLoss applies the same data used to cross-validate the ECOC model (X and Y).

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

BinaryLoss — Binary learner loss function
'hamming' | 'linear' | 'logit' | 'exponential' | 'binodeviance' | 'hinge' | 'quadratic' | function handle


Binary learner loss function, specified as the comma-separated pair consisting of 'BinaryLoss' and a built-in loss-function name or function handle.

• This table contains names and descriptions of the built-in functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj) is the binary loss formula.

  Value            Description         Score Domain       g(yj,sj)
  'binodeviance'   Binomial deviance   (–∞,∞)             log[1 + exp(–2yjsj)]/[2log(2)]
  'exponential'    Exponential         (–∞,∞)             exp(–yjsj)/2
  'hamming'        Hamming             [0,1] or (–∞,∞)    [1 – sign(yjsj)]/2
  'hinge'          Hinge               (–∞,∞)             max(0,1 – yjsj)/2
  'linear'         Linear              (–∞,∞)             (1 – yjsj)/2
  'logit'          Logistic            (–∞,∞)             log[1 + exp(–yjsj)]/[2log(2)]
  'quadratic'      Quadratic           [0,1]              [1 – yj(2sj – 1)]²/2

The software normalizes the binary losses such that the loss is 0.5 when yj = 0. Also, the software calculates the mean binary loss for each class.

• For a custom binary loss function, e.g., customFunction, specify its function handle 'BinaryLoss',@customFunction. customFunction should have this form:

  bLoss = customFunction(M,s)

  where:

  • M is the K-by-L coding matrix stored in Mdl.CodingMatrix.
  • s is the 1-by-L row vector of classification scores.
  • bLoss is the classification loss. This scalar aggregates the binary losses for every learner in a particular class. For example, you can use the mean binary loss to aggregate the loss over the learners for each class.
  • K is the number of classes.
  • L is the number of binary learners.

  For an example of passing a custom binary loss function, see “Predict Test-Sample Labels of ECOC Model Using Custom Binary Loss Function” on page 32-4169.

By default, if all binary learners are linear classification models using:

• SVM, then BinaryLoss is 'hinge'
• Logistic regression, then BinaryLoss is 'quadratic'

Example: 'BinaryLoss','binodeviance'
Data Types: char | string | function_handle

Decoding — Decoding scheme
'lossweighted' (default) | 'lossbased'


Decoding scheme that aggregates the binary losses, specified as the comma-separated pair consisting of 'Decoding' and 'lossweighted' or 'lossbased'. For more information, see “Binary Loss” on page 32-2824.

Example: 'Decoding','lossbased'

Folds — Fold indices to use for classification-score prediction
1:CVMdl.KFold (default) | numeric vector of positive integers

Fold indices to use for classification-score prediction, specified as the comma-separated pair consisting of 'Folds' and a numeric vector of positive integers. The elements of Folds must range from 1 through CVMdl.KFold.

Example: 'Folds',[1 4 10]
Data Types: single | double

LossFun — Loss function
'classiferror' (default) | function handle

Loss function, specified as the comma-separated pair consisting of 'LossFun' and a function handle or 'classiferror'. You can:

• Specify the built-in function 'classiferror'; then the loss function is the classification error on page 32-2752.
• Specify your own function using function handle notation. For what follows, n is the number of observations in the training data (CVMdl.NumObservations) and K is the number of classes (numel(CVMdl.ClassNames)). Your function needs the signature lossvalue = lossfun(C,S,W,Cost), where:

  • The output argument lossvalue is a scalar.
  • You choose the function name (lossfun).
  • C is an n-by-K logical matrix with rows indicating the class to which the corresponding observation belongs. The column order corresponds to the class order in CVMdl.ClassNames. Construct C by setting C(p,q) = 1 if observation p is in class q, for each row. Set all other elements of row p to 0.
  • S is an n-by-K numeric matrix of negated loss values for the classes. Each row corresponds to an observation. The column order corresponds to the class order in CVMdl.ClassNames. S resembles the output argument NegLoss of kfoldPredict.
  • W is an n-by-1 numeric vector of observation weights. If you pass W, the software normalizes its elements to sum to 1.
  • Cost is a K-by-K numeric matrix of misclassification costs. For example, Cost = ones(K) – eye(K) specifies a cost of 0 for correct classification and 1 for misclassification.

  Specify your function using 'LossFun',@lossfun.

Data Types: function_handle | char | string

Mode — Loss aggregation level
'average' (default) | 'individual'


Loss aggregation level, specified as the comma-separated pair consisting of 'Mode' and 'average' or 'individual'.

Value           Description
'average'       Returns losses averaged over all folds
'individual'    Returns losses for each fold

Example: 'Mode','individual'

Options — Estimation options
[] (default) | structure array returned by statset

Estimation options, specified as the comma-separated pair consisting of 'Options' and a structure array returned by statset. To invoke parallel computing:

• You need a Parallel Computing Toolbox license.
• Specify 'Options',statset('UseParallel',true).

Verbose — Verbosity level
0 (default) | 1

Verbosity level, specified as the comma-separated pair consisting of 'Verbose' and 0 or 1. Verbose controls the number of diagnostic messages that the software displays in the Command Window. If Verbose is 0, then the software does not display diagnostic messages. Otherwise, the software displays diagnostic messages.

Example: 'Verbose',1
Data Types: single | double
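For example, to request parallel estimation (a sketch; it requires a Parallel Computing Toolbox license, and CVMdl is a cross-validated model as above):

```matlab
% Sketch: estimate the k-fold loss in parallel.
opts = statset('UseParallel',true);
L = kfoldLoss(CVMdl,'Options',opts);
```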

Output Arguments

L — Cross-validated classification losses
numeric scalar | numeric vector | numeric matrix

Cross-validated classification losses on page 32-2752, returned as a numeric scalar, vector, or matrix. The interpretation of L depends on LossFun. Let R be the number of regularization strengths in the cross-validated models (CVMdl.Trained{1}.BinaryLearners{1}.Lambda) and F be the number of folds (stored in CVMdl.KFold).

• If Mode is 'average', then L is a 1-by-R vector. L(j) is the average classification loss over all folds of the cross-validated model that uses regularization strength j.
• Otherwise, L is an F-by-R matrix. L(i,j) is the classification loss for fold i of the cross-validated model that uses regularization strength j.

Examples


Estimate k-Fold Cross-Validation Classification Error Load the NLP data set. load nlpdata

X is a sparse matrix of predictor data, and Y is a categorical vector of class labels. Cross-validate an ECOC model of linear classification models. rng(1); % For reproducibility CVMdl = fitcecoc(X,Y,'Learner','linear','CrossVal','on');

CVMdl is a ClassificationPartitionedLinearECOC model. By default, the software implements 10-fold cross validation. Estimate the average of the out-of-fold classification error rates. ce = kfoldLoss(CVMdl) ce = 0.0958

Alternatively, you can obtain the per-fold classification error rates by specifying the name-value pair 'Mode','individual' in kfoldLoss.

Specify Custom Classification Loss Load the NLP data set. Transpose the predictor data. load nlpdata X = X';

For simplicity, use the label 'others' for all observations in Y that are not 'simulink', 'dsp', or 'comm'. Y(~(ismember(Y,{'simulink','dsp','comm'}))) = 'others';

Create a linear classification model template that specifies optimizing the objective function using SpaRSA. t = templateLinear('Solver','sparsa');

Cross-validate an ECOC model of linear classification models using 5-fold cross-validation. Optimize the objective function using SpaRSA. Specify that the predictor observations correspond to columns. rng(1); % For reproducibility CVMdl = fitcecoc(X,Y,'Learners',t,'KFold',5,'ObservationsIn','columns'); CMdl1 = CVMdl.Trained{1} CMdl1 = classreg.learning.classif.CompactClassificationECOC ResponseName: 'Y' ClassNames: [comm dsp simulink others] ScoreTransform: 'none' BinaryLearners: {6x1 cell} CodingMatrix: [4x6 double]

32-2747

32

Functions

Properties, Methods

CVMdl is a ClassificationPartitionedLinearECOC model. It contains the property Trained, which is a 5-by-1 cell array holding the CompactClassificationECOC models that the software trained using the training set of each fold.

Create a function that takes the minimal loss for each observation, and then averages the minimal losses across all observations. Because the function does not use the class-identifier matrix (C), observation weights (W), and classification cost (Cost), use ~ to have kfoldLoss ignore their positions.

lossfun = @(~,S,~,~)mean(min(-S,[],2));

Estimate the average cross-validated classification loss using the minimal loss per observation function. Also, obtain the loss for each fold. ce = kfoldLoss(CVMdl,'LossFun',lossfun) ce = 0.0243 ceFold = kfoldLoss(CVMdl,'LossFun',lossfun,'Mode','individual') ceFold = 5×1 0.0244 0.0255 0.0248 0.0240 0.0226

Find Good Lasso Penalty Using Cross-Validation To determine a good lasso-penalty strength for an ECOC model composed of linear classification models that use logistic regression learners, implement 5-fold cross-validation. Load the NLP data set. load nlpdata

X is a sparse matrix of predictor data, and Y is a categorical vector of class labels. For simplicity, use the label 'others' for all observations in Y that are not 'simulink', 'dsp', or 'comm'. Y(~(ismember(Y,{'simulink','dsp','comm'}))) = 'others'; −7

Create a set of 11 logarithmically-spaced regularization strengths from 10 Lambda = logspace(-7,-2,11);

32-2748

−2

through 10

.

kfoldLoss

Create a linear classification model template that specifies to use logistic regression learners, use lasso penalties with strengths in Lambda, train using SpaRSA, and lower the tolerance on the gradient of the objective function to 1e-8. t = templateLinear('Learner','logistic','Solver','sparsa',... 'Regularization','lasso','Lambda',Lambda,'GradientTolerance',1e-8);

Cross-validate the models. To increase execution speed, transpose the predictor data and specify that the observations are in columns. X = X'; rng(10); % For reproducibility CVMdl = fitcecoc(X,Y,'Learners',t,'ObservationsIn','columns','KFold',5);

CVMdl is a ClassificationPartitionedLinearECOC model. Dissect CVMdl, and each model within it. numECOCModels = numel(CVMdl.Trained) numECOCModels = 5 ECOCMdl1 = CVMdl.Trained{1} ECOCMdl1 = classreg.learning.classif.CompactClassificationECOC ResponseName: 'Y' ClassNames: [comm dsp simulink others] ScoreTransform: 'none' BinaryLearners: {6×1 cell} CodingMatrix: [4×6 double] Properties, Methods numCLModels = numel(ECOCMdl1.BinaryLearners) numCLModels = 6 CLMdl1 = ECOCMdl1.BinaryLearners{1}

CLMdl1 = ClassificationLinear ResponseName: 'Y' ClassNames: [-1 1] ScoreTransform: 'logit' Beta: [34023×11 double] Bias: [-0.3125 -0.3596 -0.3596 -0.3596 -0.3596 -0.3596 -0.1593 -0.0737 -0.1759 -0.3 Lambda: [1.0000e-07 3.1623e-07 1.0000e-06 3.1623e-06 1.0000e-05 3.1623e-05 1.0000e-04 Learner: 'logistic' Properties, Methods

Because fitcecoc implements 5-fold cross-validation, CVMdl contains a 5-by-1 cell array of CompactClassificationECOC models that the software trains on each fold. The BinaryLearners property of each CompactClassificationECOC model contains the ClassificationLinear models. The number of ClassificationLinear models within each compact ECOC model depends on the number of distinct labels and coding design. Because Lambda is a sequence of regularization strengths, you can think of CLMdl1 as 11 models, one for each regularization strength in Lambda.

Determine how well the models generalize by plotting the averages of the 5-fold classification error for each regularization strength. Identify the regularization strength that minimizes the generalization error over the grid.

ce = kfoldLoss(CVMdl);
figure;
plot(log10(Lambda),log10(ce))
[~,minCEIdx] = min(ce);
minLambda = Lambda(minCEIdx);
hold on
plot(log10(minLambda),log10(ce(minCEIdx)),'ro');
ylabel('log_{10} 5-fold classification error')
xlabel('log_{10} Lambda')
legend('MSE','Min classification error')
hold off

Train an ECOC model composed of linear classification models using the entire data set, and specify the minimal regularization strength.

t = templateLinear('Learner','logistic','Solver','sparsa',...
    'Regularization','lasso','Lambda',minLambda,'GradientTolerance',1e-8);
MdlFinal = fitcecoc(X,Y,'Learners',t,'ObservationsIn','columns');

To estimate labels for new observations, pass MdlFinal and the new data to predict.

More About

Binary Loss

A binary loss is a function of the class and classification score that determines how well a binary learner classifies an observation into the class. Suppose the following:

• mkj is element (k,j) of the coding design matrix M (that is, the code corresponding to class k of binary learner j).
• sj is the score of binary learner j for an observation.
• g is the binary loss function.
• k̂ is the predicted class for the observation.

In loss-based decoding [Escalera et al.] on page 18-189, the class producing the minimum sum of the binary losses over binary learners determines the predicted class of an observation, that is,

k̂ = argmin_k ∑_{j = 1}^{L} |m_kj| g(m_kj, s_j).

In loss-weighted decoding [Escalera et al.] on page 18-189, the class producing the minimum average of the binary losses over binary learners determines the predicted class of an observation, that is,

k̂ = argmin_k [ ∑_{j = 1}^{L} |m_kj| g(m_kj, s_j) ] / [ ∑_{j = 1}^{L} |m_kj| ].

Allwein et al. on page 18-189 suggest that loss-weighted decoding improves classification accuracy by keeping loss values for all classes in the same dynamic range.

This table summarizes the supported loss functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj) is the binary loss formula.

Value             Description         Score Domain       g(yj,sj)
'binodeviance'    Binomial deviance   (–∞,∞)             log[1 + exp(–2yjsj)]/[2log(2)]
'exponential'     Exponential         (–∞,∞)             exp(–yjsj)/2
'hamming'         Hamming             [0,1] or (–∞,∞)    [1 – sign(yjsj)]/2
'hinge'           Hinge               (–∞,∞)             max(0,1 – yjsj)/2
'linear'          Linear              (–∞,∞)             (1 – yjsj)/2
'logit'           Logistic            (–∞,∞)             log[1 + exp(–yjsj)]/[2log(2)]
'quadratic'       Quadratic           [0,1]              [1 – yj(2sj – 1)]²/2

The software normalizes binary losses such that the loss is 0.5 when yj = 0, and aggregates using the average of the binary learners [Allwein et al.] on page 18-189.
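As an illustration, the following sketch applies loss-weighted decoding with the hinge binary loss to a hypothetical one-versus-one coding matrix; the matrix, scores, and class count are invented for the example (written in Python for generality, not MATLAB):

```python
def hinge(y, s):
    # Hinge binary loss from the table above: g(y, s) = max(0, 1 - y*s)/2.
    return max(0.0, 1.0 - y * s) / 2.0

# Hypothetical one-versus-one coding matrix for 3 classes (rows) and
# 3 binary learners (columns); a 0 code means the learner ignores the class.
M = [[ 1,  1,  0],
     [-1,  0,  1],
     [ 0, -1, -1]]

s = [0.4, -0.2, 1.1]  # made-up binary-learner scores for one observation

def loss_weighted_decode(M, s):
    # For each class k, average |m_kj| * g(m_kj, s_j) over the learners with
    # nonzero codes, then predict the class with the smallest average loss.
    avg = []
    for row in M:
        num = sum(abs(m) * hinge(m, sj) for m, sj in zip(row, s))
        den = sum(abs(m) for m in row)
        avg.append(num / den)
    return avg.index(min(avg)), avg

k_hat, avg = loss_weighted_decode(M, s)
```

For these scores, the second class (index 1) attains the smallest average binary loss and is predicted.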


Do not confuse the binary loss with the overall classification loss (specified by the 'LossFun' name-value pair argument of the loss and predict object functions), which measures how well an ECOC classifier performs as a whole.

Classification Error

The classification error is a binary classification error measure that has the form

L = [ ∑_{j = 1}^{n} w_j e_j ] / [ ∑_{j = 1}^{n} w_j ],

where:

• wj is the weight for observation j. The software renormalizes the weights to sum to 1.
• ej = 1 if the predicted class of observation j differs from its true class, and 0 otherwise.

In other words, the classification error is the proportion of observations misclassified by the classifier.
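As a quick numerical sketch of this definition (Python, with made-up labels and weights):

```python
# Weighted classification error as defined above: the sum of weights of
# misclassified observations divided by the total weight.
# The labels and weights here are made-up examples.
y_true = ['dsp', 'comm', 'simulink', 'others', 'dsp']
y_pred = ['dsp', 'comm', 'others',   'others', 'comm']
w      = [0.1,   0.3,    0.2,        0.2,      0.2]

L = sum(wj for wj, t, p in zip(w, y_true, y_pred) if t != p) / sum(w)
# Observations 3 and 5 are misclassified (total weight 0.4), so L = 0.4.
```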

References

[1] Allwein, E., R. Schapire, and Y. Singer. “Reducing multiclass to binary: A unifying approach for margin classifiers.” Journal of Machine Learning Research. Vol. 1, 2000, pp. 113–141.

[2] Escalera, S., O. Pujol, and P. Radeva. “On the decoding process in ternary error-correcting output codes.” IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 32, Issue 7, 2010, pp. 120–134.

[3] Escalera, S., O. Pujol, and P. Radeva. “Separability of ternary codes for sparse designs of error-correcting output codes.” Pattern Recogn. Vol. 30, Issue 3, 2009, pp. 285–297.

Extended Capabilities

Automatic Parallel Support

Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, set the 'UseParallel' option to true. Set the 'UseParallel' field of the options structure to true using statset and specify the 'Options' name-value pair argument in the call to this function. For example: 'Options',statset('UseParallel',true). For more information, see the 'Options' name-value pair argument.

For more general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).

See Also
ClassificationECOC | ClassificationLinear | ClassificationPartitionedLinearECOC | fitcecoc | kfoldPredict | loss | statset


Topics
“Quick Start Parallel Computing for Statistics and Machine Learning Toolbox” on page 30-2
“Reproducibility in Parallel Statistical Computations” on page 30-13
“Concepts of Parallel Computing in Statistics and Machine Learning Toolbox” on page 30-8

Introduced in R2016a


kfoldLoss
Classification loss for observations not used for training

Syntax

L = kfoldLoss(obj)
L = kfoldLoss(obj,Name,Value)

Description

L = kfoldLoss(obj) returns the loss obtained by the cross-validated classification model obj. For every fold, this method computes the classification loss for in-fold observations using a model trained on out-of-fold observations.

L = kfoldLoss(obj,Name,Value) calculates loss with additional options specified by one or more Name,Value pair arguments. You can specify several name-value pair arguments in any order as Name1,Value1,…,NameN,ValueN.

Input Arguments

obj

Object of class ClassificationPartitionedModel.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

folds

Indices of folds ranging from 1 to obj.KFold. Use only these folds for predictions.

Default: 1:obj.KFold

lossfun

Loss function, specified as the comma-separated pair consisting of 'LossFun' and a built-in loss-function name or function handle.

• The following table lists the available loss functions. Specify one using its corresponding character vector or string scalar.

Value           Description
'binodeviance'  Binomial deviance
'classiferror'  Classification error
'exponential'   Exponential
'hinge'         Hinge
'logit'         Logistic
'mincost'       Minimal expected misclassification cost (for classification scores that are posterior probabilities)
'quadratic'     Quadratic

'mincost' is appropriate for classification scores that are posterior probabilities. All models use posterior probabilities as classification scores by default except SVM models. You can specify to use posterior probabilities as classification scores for SVM models by setting 'FitPosterior',true when you cross-validate the model using fitcsvm.

• Specify your own function using function handle notation. Let n be the number of observations in X and K be the number of distinct classes (numel(obj.ClassNames), where obj is the input model). Your function must have this signature:

lossvalue = lossfun(C,S,W,Cost)

where:

• The output argument lossvalue is a scalar.
• You choose the function name (lossfun).
• C is an n-by-K logical matrix with rows indicating the class to which the corresponding observation belongs. The column order corresponds to the class order in obj.ClassNames. Construct C by setting C(p,q) = 1 if observation p is in class q, for each row. Set all other elements of row p to 0.
• S is an n-by-K numeric matrix of classification scores. The column order corresponds to the class order in obj.ClassNames. S is a matrix of classification scores, similar to the output of predict.
• W is an n-by-1 numeric vector of observation weights. If you pass W, the software normalizes the weights to sum to 1.
• Cost is a K-by-K numeric matrix of misclassification costs. For example, Cost = ones(K) - eye(K) specifies a cost of 0 for correct classification, and 1 for misclassification.

Specify your function using 'LossFun',@lossfun.

For more details on loss functions, see “Classification Loss” on page 32-2756.

Default: 'classiferror'

mode

A character vector or string scalar for determining the output of kfoldLoss:

• 'average' — L is a scalar, the loss averaged over all folds.
• 'individual' — L is a vector of length obj.KFold, where each entry is the loss for a fold.

Default: 'average'


Output Arguments

L

Loss, by default the fraction of misclassified data. L can be a vector, and can mean different things, depending on the name-value pair settings.

Examples

Estimate Cross-Validated Classification Error

Load the ionosphere data set.

load ionosphere

Grow a classification tree.

tree = fitctree(X,Y);

Cross-validate the classification tree using 10-fold cross-validation.

cvtree = crossval(tree);

Estimate the cross-validated classification error.

L = kfoldLoss(cvtree)

L = 0.1111

More About

Classification Loss

Classification loss functions measure the predictive inaccuracy of classification models. When you compare the same type of loss among many models, a lower loss indicates a better predictive model.

Consider the following scenario.

• L is the weighted average classification loss.
• n is the sample size.
• For binary classification:
  • yj is the observed class label. The software codes it as –1 or 1, indicating the negative or positive class, respectively.
  • f(Xj) is the raw classification score for observation (row) j of the predictor data X.
  • mj = yjf(Xj) is the classification score for classifying observation j into the class corresponding to yj. Positive values of mj indicate correct classification and do not contribute much to the average loss. Negative values of mj indicate incorrect classification and contribute significantly to the average loss.
• For algorithms that support multiclass classification (that is, K ≥ 3):


• yj* is a vector of K – 1 zeros, with 1 in the position corresponding to the true, observed class yj. For example, if the true class of the second observation is the third class and K = 4, then y*2 = [0 0 1 0]′. The order of the classes corresponds to the order in the ClassNames property of the input model.
• f(Xj) is the length-K vector of class scores for observation j of the predictor data X. The order of the scores corresponds to the order of the classes in the ClassNames property of the input model.
• mj = yj*′f(Xj). Therefore, mj is the scalar classification score that the model predicts for the true, observed class.
• The weight for observation j is wj. The software normalizes the observation weights so that they sum to the corresponding prior class probability. The software also normalizes the prior probabilities so they sum to 1. Therefore,

∑_{j = 1}^{n} wj = 1.

Given this scenario, the following list describes the supported loss functions that you can specify by using the 'LossFun' name-value pair argument.

• Binomial deviance ('binodeviance'):

L = ∑_{j = 1}^{n} wj log[1 + exp(−2mj)].

• Exponential loss ('exponential'):

L = ∑_{j = 1}^{n} wj exp(−mj).

• Classification error ('classiferror'):

L = ∑_{j = 1}^{n} wj I{ŷj ≠ yj}.

It is the weighted fraction of misclassified observations, where ŷj is the class label corresponding to the class with the maximal posterior probability and I{x} is the indicator function.

• Hinge loss ('hinge'):

L = ∑_{j = 1}^{n} wj max(0, 1 − mj).

• Logit loss ('logit'):

L = ∑_{j = 1}^{n} wj log[1 + exp(−mj)].


• Minimal cost ('mincost'): The software computes the weighted minimal cost using this procedure for observations j = 1,...,n.

1. Estimate the 1-by-K vector of expected classification costs for observation j:

γj = f(Xj)′C.

f(Xj) is the column vector of class posterior probabilities for binary and multiclass classification. C is the cost matrix that the input model stores in the Cost property.

2. For observation j, predict the class label corresponding to the minimum expected classification cost:

ŷj = argmin_{k = 1,...,K} γjk.

3. Using C, identify the cost incurred (cj) for making the prediction.

The weighted, average, minimum cost loss is

L = ∑_{j = 1}^{n} wj cj.

• Quadratic loss ('quadratic'):

L = ∑_{j = 1}^{n} wj (1 − mj)².

This figure compares the loss functions (except 'mincost') for one observation over m. Some functions are normalized to pass through the point (0,1).
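The loss equations above can be compared numerically. This sketch (Python, for illustration only) evaluates the per-observation term of each loss as a function of the margin m, with the weights omitted:

```python
import math

# Per-observation terms of the classification losses defined above,
# as functions of the margin m (observation weights omitted).
losses = {
    'binodeviance': lambda m: math.log(1 + math.exp(-2 * m)),
    'exponential':  lambda m: math.exp(-m),
    'hinge':        lambda m: max(0.0, 1.0 - m),
    'logit':        lambda m: math.log(1 + math.exp(-m)),
    'quadratic':    lambda m: (1.0 - m) ** 2,
}

# Every loss penalizes a negative margin (misclassification) more heavily
# than a positive margin of the same magnitude (correct classification).
for name, f in losses.items():
    assert f(-2.0) > f(2.0)
```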


See Also
ClassificationPartitionedModel | crossval | kfoldEdge | kfoldMargin | kfoldPredict | kfoldfun

Topics
“Examine Quality of KNN Classifier” on page 18-27


kfoldLoss
Cross-validation loss of partitioned regression ensemble

Syntax

L = kfoldLoss(cvens)
L = kfoldLoss(cvens,Name,Value)

Description

L = kfoldLoss(cvens) returns the cross-validation loss of cvens.

L = kfoldLoss(cvens,Name,Value) returns cross-validation loss with additional options specified by one or more Name,Value pair arguments. You can specify several name-value pair arguments in any order as Name1,Value1,…,NameN,ValueN.

Input Arguments

cvens

Object of class RegressionPartitionedEnsemble. Create cvens with fitrensemble along with one of the cross-validation options: 'crossval', 'kfold', 'holdout', 'leaveout', or 'cvpartition'. Alternatively, create cvens from a regression ensemble with crossval.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

folds

Indices of folds ranging from 1 to cvens.KFold. Use only these folds for predictions.

Default: 1:cvens.KFold

lossfun

Function handle for loss function, or 'mse', meaning mean squared error. If you pass a function handle fun, loss calls it as

fun(Y,Yfit,W)

where Y, Yfit, and W are numeric vectors of the same length.

• Y is the observed response.
• Yfit is the predicted response.
• W is the observation weights.

The returned value fun(Y,Yfit,W) should be a scalar.


Default: 'mse'

mode

Method for computing cross-validation loss.

• 'average' — L is a scalar value, the average loss over all folds.
• 'individual' — L is a vector with one element per fold.
• 'cumulative' — L is a vector whose length equals the minimum number of observations used for training in each fold. Each element in the vector L is obtained by taking the average of the loss across all folds.

Default: 'average'

Output Arguments

L

The loss (mean squared error) between the observations in a fold when compared against predictions made with an ensemble trained on the out-of-fold data. L can be a vector, and can mean different things, depending on the name-value pair settings.

Examples

Find Cross-Validation Loss for Regression Ensemble

Find the cross-validation loss for a regression ensemble of the carsmall data. Load the carsmall data set and select displacement, horsepower, and vehicle weight as predictors.

load carsmall
X = [Displacement Horsepower Weight];

Train an ensemble of regression trees.

rens = fitrensemble(X,MPG);

Create a cross-validated ensemble from rens and find the k-fold cross-validation loss.

rng(10,'twister') % For reproducibility
cvrens = crossval(rens);
L = kfoldLoss(cvrens)

L = 28.7114

See Also
RegressionPartitionedEnsemble | kfoldPredict | loss


kfoldLoss
Regression loss for observations not used in training

Syntax

L = kfoldLoss(CVMdl)
L = kfoldLoss(CVMdl,Name,Value)

Description

L = kfoldLoss(CVMdl) returns the cross-validated mean squared error (MSE) obtained by the cross-validated, linear regression model CVMdl. That is, for every fold, kfoldLoss estimates the regression loss for observations that it holds out when it trains using all other observations. L contains a regression loss for each regularization strength in the linear regression models that compose CVMdl.

L = kfoldLoss(CVMdl,Name,Value) uses additional options specified by one or more Name,Value pair arguments. For example, indicate which folds to use for the loss calculation or specify the regression-loss function.

Input Arguments

CVMdl — Cross-validated, linear regression model
RegressionPartitionedLinear model object

Cross-validated, linear regression model, specified as a RegressionPartitionedLinear model object. You can create a RegressionPartitionedLinear model using fitrlinear and specifying any one of the cross-validation name-value pair arguments, for example, CrossVal.

To obtain estimates, kfoldLoss applies the same data used to cross-validate the linear regression model (X and Y).

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Folds — Fold indices to use for response prediction
1:CVMdl.KFold (default) | numeric vector of positive integers

Fold indices to use for response prediction, specified as the comma-separated pair consisting of 'Folds' and a numeric vector of positive integers. The elements of Folds must range from 1 through CVMdl.KFold.

Example: 'Folds',[1 4 10]

Data Types: single | double


LossFun — Loss function
'mse' (default) | 'epsiloninsensitive' | function handle

Loss function, specified as the comma-separated pair consisting of 'LossFun' and a built-in loss-function name or function handle.

• The following table lists the available loss functions. Specify one using its corresponding character vector or string scalar. Also, in the table, f(x) = xβ + b.
  • β is a vector of p coefficients.
  • x is an observation from p predictor variables.
  • b is the scalar bias.

Value                  Description
'epsiloninsensitive'   Epsilon-insensitive loss: ℓ(y,f(x)) = max[0, |y − f(x)| − ε]
'mse'                  MSE: ℓ(y,f(x)) = [y − f(x)]²

'epsiloninsensitive' is appropriate for SVM learners only.

• Specify your own function using function handle notation. Let n be the number of observations in X. Your function must have this signature:

lossvalue = lossfun(Y,Yhat,W)

where:

• The output argument lossvalue is a scalar.
• You choose the function name (lossfun).
• Y is an n-dimensional vector of observed responses. kfoldLoss passes the input argument Y in for Y.
• Yhat is an n-dimensional vector of predicted responses, which is similar to the output of predict.
• W is an n-by-1 numeric vector of observation weights.

Specify your function using 'LossFun',@lossfun.

Data Types: char | string | function_handle

Mode — Loss aggregation level
'average' (default) | 'individual'

Loss aggregation level, specified as the comma-separated pair consisting of 'Mode' and 'average' or 'individual'.

Value           Description
'average'       Returns losses averaged over all folds
'individual'    Returns losses for each fold

Example: 'Mode','individual'
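To make the two built-in regression losses concrete, here is a small numerical sketch (Python, with a made-up response, prediction, and ε):

```python
# Per-observation regression losses defined above, with made-up values
# for the observed response y, the prediction f(x), and epsilon.
y, fx, eps_margin = 3.0, 2.2, 0.5

mse_term = (y - fx) ** 2                        # [y - f(x)]^2
eps_term = max(0.0, abs(y - fx) - eps_margin)   # epsilon-insensitive loss
# Only the part of the error beyond eps_margin contributes to eps_term.
```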


Output Arguments

L — Cross-validated regression losses
numeric scalar | numeric vector | numeric matrix

Cross-validated regression losses, returned as a numeric scalar, vector, or matrix. The interpretation of L depends on LossFun.

Let R be the number of regularization strengths in the cross-validated models (stored in numel(CVMdl.Trained{1}.Lambda)) and F be the number of folds (stored in CVMdl.KFold).

• If Mode is 'average', then L is a 1-by-R vector. L(j) is the average regression loss over all folds of the cross-validated model that uses regularization strength j.
• Otherwise, L is an F-by-R matrix. L(i,j) is the regression loss for fold i of the cross-validated model that uses regularization strength j.

To estimate L, kfoldLoss uses the data that created CVMdl (see X and Y).

Examples

Estimate k-Fold Mean Squared Error

Simulate 10000 observations from this model:

y = x_100 + 2x_200 + e.

• X = {x_1,...,x_1000} is a 10000-by-1000 sparse matrix with 10% nonzero standard normal elements.
• e is random normal error with mean 0 and standard deviation 0.3.

rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);

Cross-validate a linear regression model using SVM learners. rng(1); % For reproducibility CVMdl = fitrlinear(X,Y,'CrossVal','on');

CVMdl is a RegressionPartitionedLinear model. By default, the software implements 10-fold cross-validation. You can alter the number of folds using the 'KFold' name-value pair argument.

Estimate the average of the test-sample MSEs.

mse = kfoldLoss(CVMdl)

mse = 0.1735

Alternatively, you can obtain the per-fold MSEs by specifying the name-value pair 'Mode','individual' in kfoldLoss.


Specify Custom Regression Loss

Simulate data as in “Estimate k-Fold Mean Squared Error” on page 32-2764.

rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
X = X'; % Put observations in columns for faster training

Cross-validate a linear regression model using 10-fold cross-validation. Optimize the objective function using SpaRSA.

CVMdl = fitrlinear(X,Y,'CrossVal','on','ObservationsIn','columns',...
    'Solver','sparsa');

CVMdl is a RegressionPartitionedLinear model. It contains the property Trained, which is a 10-by-1 cell array holding RegressionLinear models that the software trained using the training set.

Create an anonymous function that measures Huber loss (δ = 1), that is,

L = (1/∑_j wj) ∑_{j = 1}^{n} wj ℓj,

where

ℓj = 0.5 ej²,       if |ej| ≤ 1
ℓj = |ej| − 0.5,    if |ej| > 1.

ej is the residual for observation j. Custom loss functions must be written in a particular form. For rules on writing a custom loss function, see the 'LossFun' name-value pair argument.

huberloss = @(Y,Yhat,W)sum(W.*((0.5*(abs(Y-Yhat)<=1).*(Y-Yhat).^2)+...
    ((abs(Y-Yhat)>1).*abs(Y-Yhat)-0.5)))/sum(W);

Estimate the average Huber loss over the folds. Also, obtain the Huber loss for each fold.

mseAve = kfoldLoss(CVMdl,'LossFun',huberloss)

mseAve = -0.4447

mseFold = kfoldLoss(CVMdl,'LossFun',huberloss,'Mode','individual')

mseFold = 10×1

   -0.4454
   -0.4473
   -0.4452
   -0.4469
   -0.4434


-0.4427 -0.4471 -0.4430 -0.4438 -0.4426
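The piecewise Huber definition above can also be checked with a short sketch that implements the textbook piecewise form directly (Python, with made-up residuals and uniform weights):

```python
# Huber loss with delta = 1, as in the piecewise definition above:
# quadratic for small residuals, linear beyond |e| = 1.
def huber(e):
    return 0.5 * e * e if abs(e) <= 1 else abs(e) - 0.5

residuals = [0.2, -0.8, 1.5, -3.0]  # made-up residuals e_j
weights   = [1.0, 1.0, 1.0, 1.0]   # uniform observation weights

# Weighted average of the per-observation losses.
L = sum(w * huber(e) for w, e in zip(weights, residuals)) / sum(weights)
```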

Find Good Lasso Penalty Using Cross-Validation

To determine a good lasso-penalty strength for a linear regression model that uses least squares, implement 5-fold cross-validation. Simulate 10000 observations from this model:

y = x_100 + 2x_200 + e.

• X = {x_1,...,x_1000} is a 10000-by-1000 sparse matrix with 10% nonzero standard normal elements.
• e is random normal error with mean 0 and standard deviation 0.3.

rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);

Create a set of 15 logarithmically-spaced regularization strengths from 10^−5 through 10^−1.

Lambda = logspace(-5,-1,15);

Cross-validate the models. To increase execution speed, transpose the predictor data and specify that the observations are in columns. Optimize the objective function using SpaRSA.

X = X';
CVMdl = fitrlinear(X,Y,'ObservationsIn','columns','KFold',5,'Lambda',Lambda,...
    'Learner','leastsquares','Solver','sparsa','Regularization','lasso');
numCLModels = numel(CVMdl.Trained)

numCLModels = 5

CVMdl is a RegressionPartitionedLinear model. Because fitrlinear implements 5-fold cross-validation, CVMdl contains 5 RegressionLinear models that the software trains on each fold.

Display the first trained linear regression model.

Mdl1 = CVMdl.Trained{1}

Mdl1 =
  RegressionLinear
         ResponseName: 'Y'
    ResponseTransform: 'none'
                 Beta: [1000x15 double]


                 Bias: [1x15 double]
               Lambda: [1x15 double]
              Learner: 'leastsquares'

  Properties, Methods

Mdl1 is a RegressionLinear model object. fitrlinear constructed Mdl1 by training on the first four folds. Because Lambda is a sequence of regularization strengths, you can think of Mdl1 as 15 models, one for each regularization strength in Lambda.

Estimate the cross-validated MSE.

mse = kfoldLoss(CVMdl);

Higher values of Lambda lead to predictor variable sparsity, which is a good quality of a regression model. For each regularization strength, train a linear regression model using the entire data set and the same options as when you cross-validated the models. Determine the number of nonzero coefficients per model.

Mdl = fitrlinear(X,Y,'ObservationsIn','columns','Lambda',Lambda,...
    'Learner','leastsquares','Solver','sparsa','Regularization','lasso');
numNZCoeff = sum(Mdl.Beta~=0);

In the same figure, plot the cross-validated MSE and frequency of nonzero coefficients for each regularization strength. Plot all variables on the log scale.

figure
[h,hL1,hL2] = plotyy(log10(Lambda),log10(mse),...
    log10(Lambda),log10(numNZCoeff));
hL1.Marker = 'o';
hL2.Marker = 'o';
ylabel(h(1),'log_{10} MSE')
ylabel(h(2),'log_{10} nonzero-coefficient frequency')
xlabel('log_{10} Lambda')
hold off


Choose the index of the regularization strength that balances predictor variable sparsity and low MSE (for example, Lambda(10)).

idxFinal = 10;

Extract the model corresponding to the minimal MSE.

MdlFinal = selectModels(Mdl,idxFinal)

MdlFinal =
  RegressionLinear
         ResponseName: 'Y'
    ResponseTransform: 'none'
                 Beta: [1000x1 double]
                 Bias: -0.0050
               Lambda: 0.0037
              Learner: 'leastsquares'

  Properties, Methods

idxNZCoeff = find(MdlFinal.Beta~=0)

idxNZCoeff = 2×1

   100
   200


EstCoeff = Mdl.Beta(idxNZCoeff)

EstCoeff = 2×1

    1.0051
    1.9965

MdlFinal is a RegressionLinear model with one regularization strength. The nonzero coefficients EstCoeff are close to the coefficients that simulated the data.

See Also
RegressionLinear | RegressionPartitionedLinear | kfoldPredict | loss

Introduced in R2016a


kfoldLoss
Cross-validation loss of partitioned regression model

Syntax

L = kfoldLoss(cvmodel)
L = kfoldLoss(cvmodel,Name,Value)

Description

L = kfoldLoss(cvmodel) returns the cross-validation loss of cvmodel.

L = kfoldLoss(cvmodel,Name,Value) returns cross-validation loss with additional options specified by one or more Name,Value pair arguments. You can specify several name-value pair arguments in any order as Name1,Value1,…,NameN,ValueN.

Input Arguments

cvmodel

Object of class RegressionPartitionedModel. Create cvmodel with fitrtree along with one of the cross-validation options: 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'. Alternatively, create cvmodel from a regression tree with crossval.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

folds

Indices of folds ranging from 1 to cvmodel.KFold. Use only these folds for predictions.

Default: 1:cvmodel.KFold

lossfun

Function handle for loss function or 'mse', meaning mean squared error. If you pass a function handle fun, kfoldLoss calls it as

fun(Y,Yfit,W)

where Y, Yfit, and W are numeric vectors of the same length.

• Y is the observed response.
• Yfit is the predicted response.
• W is the observation weights.

The returned value fun(Y,Yfit,W) should be a scalar.


Default: 'mse'

mode

One of the following:

• 'average' — L is the average loss over all folds.
• 'individual' — L is a vector of the individual losses of in-fold observations trained on out-of-fold data.

Default: 'average'

Output Arguments

L

The loss (mean squared error) between the observations in a fold when compared against predictions made with a tree trained on the out-of-fold data. If mode is 'individual', L is a vector of the losses. If mode is 'average', L is the average loss.

Examples

Construct a partitioned regression model, and examine the cross-validation losses for the folds:

load carsmall
XX = [Cylinders Displacement Horsepower Weight];
YY = MPG;
cvmodel = fitrtree(XX,YY,'crossval','on');
L = kfoldLoss(cvmodel,'mode','individual')

L =
   44.9635
   11.8525
   18.2046
    9.2965
   29.4329
   54.8659
   24.6446
    8.2085
   19.7593
   16.7394

Alternatives

You can avoid constructing a cross-validated tree model by calling cvloss instead of kfoldLoss. Constructing the cross-validated tree can save time if you plan to examine it more than once.

See Also
fitrtree | kfoldPredict | loss


kfoldMargin
Package: classreg.learning.partition
Classification margins for cross-validated ECOC model

Syntax

margin = kfoldMargin(CVMdl)
margin = kfoldMargin(CVMdl,Name,Value)

Description

margin = kfoldMargin(CVMdl) returns classification margins on page 32-2777 obtained by the cross-validated ECOC model (ClassificationPartitionedECOC) CVMdl. For every fold, kfoldMargin computes classification margins for validation-fold observations using an ECOC model trained on training-fold observations. CVMdl.X contains both sets of observations.

margin = kfoldMargin(CVMdl,Name,Value) returns classification margins with additional options specified by one or more name-value pair arguments. For example, specify the binary learner loss function, decoding scheme, or verbosity level.

Examples

Estimate k-Fold Cross-Validation Margins

Load Fisher's iris data set. Specify the predictor data X, the response data Y, and the order of the classes in Y.

load fisheriris
X = meas;
Y = categorical(species);
classOrder = unique(Y);
rng(1); % For reproducibility

Train and cross-validate an ECOC model using support vector machine (SVM) binary classifiers. Standardize the predictor data using an SVM template, and specify the class order.

t = templateSVM('Standardize',1);
CVMdl = fitcecoc(X,Y,'CrossVal','on','Learners',t,'ClassNames',classOrder);

CVMdl is a ClassificationPartitionedECOC model. By default, the software implements 10-fold cross-validation. You can specify a different number of folds using the 'KFold' name-value pair argument.

Estimate the margins for validation-fold observations. Display the distribution of the margins using a boxplot.

margin = kfoldMargin(CVMdl);
boxplot(margin)
title('Distribution of Margins')


Select ECOC Model Features by Comparing Cross-Validation Margins

One way to perform feature selection is to compare cross-validation margins from multiple models. Based solely on this criterion, the classifier with the greatest margins is the best classifier.

Load Fisher's iris data set. Specify the predictor data X, the response data Y, and the order of the classes in Y.

load fisheriris
X = meas;
Y = categorical(species);
classOrder = unique(Y); % Class order
rng(1); % For reproducibility

Define the following two data sets.

• fullX contains all the predictors.
• partX contains the petal dimensions.

fullX = X;
partX = X(:,3:4);

For each predictor set, train and cross-validate an ECOC model using SVM binary classifiers. Standardize the predictors using an SVM template, and specify the class order.


t = templateSVM('Standardize',1);
CVMdl = fitcecoc(fullX,Y,'CrossVal','on','Learners',t,...
    'ClassNames',classOrder);
PCVMdl = fitcecoc(partX,Y,'CrossVal','on','Learners',t,...
    'ClassNames',classOrder);

CVMdl and PCVMdl are ClassificationPartitionedECOC models. By default, the software implements 10-fold cross-validation.

Estimate the margins for each classifier. Use loss-based decoding for aggregating the binary learner results. For each model, display the distribution of the margins using a boxplot.

fullMargins = kfoldMargin(CVMdl,'Decoding','lossbased');
partMargins = kfoldMargin(PCVMdl,'Decoding','lossbased');
boxplot([fullMargins partMargins],'Labels',{'All Predictors','Two Predictors'})
title('Distributions of Margins')

The margin distributions are approximately the same.

Input Arguments

CVMdl — Cross-validated ECOC model
ClassificationPartitionedECOC model


Cross-validated ECOC model, specified as a ClassificationPartitionedECOC model. You can create a ClassificationPartitionedECOC model in two ways:

• Pass a trained ECOC model (ClassificationECOC) to crossval.
• Train an ECOC model using fitcecoc and specify any one of these cross-validation name-value pair arguments: 'CrossVal', 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: kfoldMargin(CVMdl,'Verbose',1) specifies to display diagnostic messages in the Command Window.

BinaryLoss — Binary learner loss function
'hamming' | 'linear' | 'logit' | 'exponential' | 'binodeviance' | 'hinge' | 'quadratic' | function handle

Binary learner loss function, specified as the comma-separated pair consisting of 'BinaryLoss' and a built-in loss function name or function handle.

• This table describes the built-in functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj) is the binary loss formula.

Value            Description         Score Domain       g(yj,sj)
'binodeviance'   Binomial deviance   (–∞,∞)             log[1 + exp(–2yjsj)]/[2log(2)]
'exponential'    Exponential         (–∞,∞)             exp(–yjsj)/2
'hamming'        Hamming             [0,1] or (–∞,∞)    [1 – sign(yjsj)]/2
'hinge'          Hinge               (–∞,∞)             max(0,1 – yjsj)/2
'linear'         Linear              (–∞,∞)             (1 – yjsj)/2
'logit'          Logistic            (–∞,∞)             log[1 + exp(–yjsj)]/[2log(2)]
'quadratic'      Quadratic           [0,1]              [1 – yj(2sj – 1)]²/2

The software normalizes binary losses so that the loss is 0.5 when yj = 0. Also, the software calculates the mean binary loss for each class.

• For a custom binary loss function, for example customFunction, specify its function handle 'BinaryLoss',@customFunction. customFunction has this form:

bLoss = customFunction(M,s)

where:

• M is the K-by-L coding matrix stored in Mdl.CodingMatrix.
• s is the 1-by-L row vector of classification scores.
• bLoss is the classification loss. This scalar aggregates the binary losses for every learner in a particular class. For example, you can use the mean binary loss to aggregate the loss over the learners for each class.
• K is the number of classes.
• L is the number of binary learners.

For an example of passing a custom binary loss function, see “Predict Test-Sample Labels of ECOC Model Using Custom Binary Loss Function” on page 32-4169.

The default BinaryLoss value depends on the score ranges returned by the binary learners. This table describes some default BinaryLoss values based on the given assumptions.

Assumption                                                        Default Value
All binary learners are SVMs or linear or kernel classification   'hinge'
models of SVM learners.
All binary learners are ensembles trained by AdaboostM1 or        'exponential'
GentleBoost.
All binary learners are ensembles trained by LogitBoost.          'binodeviance'
All binary learners are linear or kernel classification models    'quadratic'
of logistic regression learners, or you specify to predict
class posterior probabilities by setting 'FitPosterior',true
in fitcecoc.
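For instance, a linear binary loss averaged over the learners can be supplied as an anonymous function. The following is a hedged sketch, not a toolbox-shipped function; nanmean is used so that learners with NaN scores are skipped, and CVMdl is the cross-validated model from the examples above.

```matlab
% Hedged sketch of a custom 'BinaryLoss': linear loss (1 - m.*s)/2,
% averaged over the binary learners for each class. Returns 0.5 where m = 0,
% matching the normalization described above.
customBL = @(M,s) nanmean((1 - bsxfun(@times,M,s))/2, 2);

% Pass the handle when estimating the cross-validation margins.
margin = kfoldMargin(CVMdl,'BinaryLoss',customBL);
```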

To check the default value, use dot notation to display the BinaryLoss property of the trained model at the command line.

Example: 'BinaryLoss','binodeviance'
Data Types: char | string | function_handle

Decoding — Decoding scheme
'lossweighted' (default) | 'lossbased'

Decoding scheme that aggregates the binary losses, specified as the comma-separated pair consisting of 'Decoding' and 'lossweighted' or 'lossbased'. For more information, see “Binary Loss” on page 32-2824.

Example: 'Decoding','lossbased'

Options — Estimation options
[] (default) | structure array returned by statset

Estimation options, specified as the comma-separated pair consisting of 'Options' and a structure array returned by statset. To invoke parallel computing:

• You need a Parallel Computing Toolbox license.
• Specify 'Options',statset('UseParallel',true).

Verbose — Verbosity level
0 (default) | 1


Verbosity level, specified as the comma-separated pair consisting of 'Verbose' and 0 or 1. Verbose controls the number of diagnostic messages that the software displays in the Command Window. If Verbose is 0, then the software does not display diagnostic messages. Otherwise, the software displays diagnostic messages.

Example: 'Verbose',1
Data Types: single | double
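Putting the name-value pairs together, a typical call might look like this (a hedged sketch; 'UseParallel' requires a Parallel Computing Toolbox license, and CVMdl is the cross-validated model from the examples above):

```matlab
% Hedged sketch: loss-based decoding, parallel estimation, verbose output.
margin = kfoldMargin(CVMdl,'Decoding','lossbased', ...
    'Options',statset('UseParallel',true),'Verbose',1);
```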

Output Arguments

margin — Classification margins
numeric vector

Classification margins on page 32-2777, returned as a numeric vector. margin is an n-by-1 vector, where each row is the margin of the corresponding observation and n is the number of observations (size(CVMdl.X,1)).

More About

Classification Margin

The classification margin is, for each observation, the difference between the negative loss for the true class and the maximal negative loss among the false classes. If the margins are on the same scale, then they serve as a classification confidence measure. Among multiple classifiers, those that yield greater margins are better.

Binary Loss

A binary loss is a function of the class and classification score that determines how well a binary learner classifies an observation into the class.

Suppose the following:

• mkj is element (k,j) of the coding design matrix M (that is, the code corresponding to class k of binary learner j).
• sj is the score of binary learner j for an observation.
• g is the binary loss function.
• k̂ is the predicted class for the observation.

In loss-based decoding [Escalera et al.] on page 18-189, the class producing the minimum sum of the binary losses over binary learners determines the predicted class of an observation, that is,

    k̂ = argmin_k Σ_{j=1}^{L} |mkj| g(mkj,sj).

In loss-weighted decoding [Escalera et al.] on page 18-189, the class producing the minimum average of the binary losses over binary learners determines the predicted class of an observation, that is,

    k̂ = argmin_k [ Σ_{j=1}^{L} |mkj| g(mkj,sj) ] / [ Σ_{j=1}^{L} |mkj| ].
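The two decoding rules can be sketched directly from these formulas. The following is a hedged illustration; M, s, and the hinge loss g stand in for the model's coding matrix, one observation's binary scores, and whichever binary loss is in use.

```matlab
% Hedged sketch of loss-based vs. loss-weighted decoding for one observation.
% M: K-by-L coding matrix (entries in {-1,0,1}); s: 1-by-L binary-learner scores.
g = @(m,sc) max(0, 1 - m.*sc)/2;                  % hinge binary loss; g(0,.) = 0.5
B = abs(M) .* g(M, repmat(s,size(M,1),1));        % |mkj| * g(mkj,sj), K-by-L
[~,kLossBased]    = min(sum(B,2));                % class minimizing the sum
[~,kLossWeighted] = min(sum(B,2)./sum(abs(M),2)); % class minimizing the average
```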

Allwein et al. on page 18-189 suggest that loss-weighted decoding improves classification accuracy by keeping loss values for all classes in the same dynamic range.

This table summarizes the supported loss functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj) is the binary loss function.

Value            Description         Score Domain       g(yj,sj)
'binodeviance'   Binomial deviance   (–∞,∞)             log[1 + exp(–2yjsj)]/[2log(2)]
'exponential'    Exponential         (–∞,∞)             exp(–yjsj)/2
'hamming'        Hamming             [0,1] or (–∞,∞)    [1 – sign(yjsj)]/2
'hinge'          Hinge               (–∞,∞)             max(0,1 – yjsj)/2
'linear'         Linear              (–∞,∞)             (1 – yjsj)/2
'logit'          Logistic            (–∞,∞)             log[1 + exp(–yjsj)]/[2log(2)]
'quadratic'      Quadratic           [0,1]              [1 – yj(2sj – 1)]²/2

The software normalizes binary losses such that the loss is 0.5 when yj = 0, and aggregates using the average of the binary learners [Allwein et al.] on page 18-189.

Do not confuse the binary loss with the overall classification loss (specified by the 'LossFun' name-value pair argument of the loss and predict object functions), which measures how well an ECOC classifier performs as a whole.
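The margin definition can be reproduced from the negated losses that kfoldPredict returns. The following is a hedged sketch; it assumes Y is the response from the examples above and that the columns of negLoss follow the order of CVMdl.ClassNames.

```matlab
% Hedged sketch: recompute k-fold ECOC margins from negated average binary losses.
[~,negLoss] = kfoldPredict(CVMdl);                  % n-by-K negated losses
[~,trueIdx] = ismember(Y,CVMdl.ClassNames);         % column of each true class
ind = sub2ind(size(negLoss),(1:numel(trueIdx))',trueIdx);
trueNegLoss = negLoss(ind);
negLoss(ind) = -Inf;                                % exclude the true class
m = trueNegLoss - max(negLoss,[],2);                % margin per observation
```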

References

[1] Allwein, E., R. Schapire, and Y. Singer. “Reducing multiclass to binary: A unifying approach for margin classifiers.” Journal of Machine Learning Research. Vol. 1, 2000, pp. 113–141.

[2] Escalera, S., O. Pujol, and P. Radeva. “On the decoding process in ternary error-correcting output codes.” IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 32, Issue 7, 2010, pp. 120–134.

[3] Escalera, S., O. Pujol, and P. Radeva. “Separability of ternary codes for sparse designs of error-correcting output codes.” Pattern Recognition Letters. Vol. 30, Issue 3, 2009, pp. 285–297.

Extended Capabilities

Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, set the 'UseParallel' option to true.

Set the 'UseParallel' field of the options structure to true using statset and specify the 'Options' name-value pair argument in the call to this function. For example: 'Options',statset('UseParallel',true) For more information, see the 'Options' name-value pair argument. For more general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).

See Also
ClassificationECOC | ClassificationPartitionedECOC | fitcecoc | kfoldEdge | kfoldPredict | margin | statset

Topics
“Quick Start Parallel Computing for Statistics and Machine Learning Toolbox” on page 30-2
“Reproducibility in Parallel Statistical Computations” on page 30-13
“Concepts of Parallel Computing in Statistics and Machine Learning Toolbox” on page 30-8

Introduced in R2014b


kfoldMargin
Package: classreg.learning.partition

Classification margins for cross-validated kernel classification model

Syntax

margin = kfoldMargin(CVMdl)

Description

margin = kfoldMargin(CVMdl) returns the classification margins on page 32-2783 obtained by the cross-validated, binary kernel model (ClassificationPartitionedKernel) CVMdl. For every fold, kfoldMargin computes the classification margins for validation-fold observations using a model trained on training-fold observations.

Examples

Estimate k-Fold Cross-Validation Margins

Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, which are labeled as either bad ('b') or good ('g').

load ionosphere

Cross-validate a binary kernel classification model using the data.

CVMdl = fitckernel(X,Y,'Crossval','on')

CVMdl = 
  classreg.learning.partition.ClassificationPartitionedKernel
    CrossValidatedModel: 'Kernel'
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'

  Properties, Methods

CVMdl is a ClassificationPartitionedKernel model. By default, the software implements 10-fold cross-validation. To specify a different number of folds, use the 'KFold' name-value pair argument instead of 'Crossval'.

Estimate the classification margins for validation-fold observations.

m = kfoldMargin(CVMdl);
size(m)


ans = 1×2

   351     1

m is a 351-by-1 vector. m(j) is the classification margin for observation j.

Plot the k-fold margins using a boxplot.

boxplot(m,'Labels','All Observations')
title('Distribution of Margins')

Feature Selection Using k-Fold Margins

Perform feature selection by comparing k-fold margins from multiple models. Based solely on this criterion, the classifier with the greatest margins is the best classifier.

Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, which are labeled either bad ('b') or good ('g').

load ionosphere

Randomly choose 10% of the predictor variables.


rng(1); % For reproducibility
p = size(X,2); % Number of predictors
idxPart = randsample(p,ceil(0.1*p));

Cross-validate two binary kernel classification models: one that uses all of the predictors, and one that uses 10% of the predictors.

CVMdl = fitckernel(X,Y,'CrossVal','on');
PCVMdl = fitckernel(X(:,idxPart),Y,'CrossVal','on');

CVMdl and PCVMdl are ClassificationPartitionedKernel models. By default, the software implements 10-fold cross-validation. To specify a different number of folds, use the 'KFold' name-value pair argument instead of 'Crossval'.

Estimate the k-fold margins for each classifier.

fullMargins = kfoldMargin(CVMdl);
partMargins = kfoldMargin(PCVMdl);

Plot the distribution of the margin sets using box plots.

boxplot([fullMargins partMargins], ...
    'Labels',{'All Predictors','10% of the Predictors'});
title('Distribution of Margins')

The quartiles of the CVMdl margin distribution are situated higher than the quartiles of the PCVMdl margin distribution, indicating that the CVMdl model is the better classifier.


Input Arguments

CVMdl — Cross-validated, binary kernel classification model
ClassificationPartitionedKernel model object

Cross-validated, binary kernel classification model, specified as a ClassificationPartitionedKernel model object. You can create a ClassificationPartitionedKernel model by using fitckernel and specifying any one of the cross-validation name-value pair arguments.

To obtain estimates, kfoldMargin applies the same data used to cross-validate the kernel classification model (X and Y).

Output Arguments

margin — Classification margins
numeric vector

Classification margins on page 32-2783, returned as a numeric vector. margin is an n-by-1 vector, where each row is the margin of the corresponding observation and n is the number of observations (size(CVMdl.Y,1)).

More About

Classification Margin

The classification margin for binary classification is, for each observation, the difference between the classification score for the true class and the classification score for the false class. The software defines the classification margin for binary classification as

    m = 2yf(x).

x is an observation. If the true label of x is the positive class, then y is 1, and –1 otherwise. f(x) is the positive-class classification score for the observation x. The classification margin is commonly defined as m = yf(x).

If the margins are on the same scale, then they serve as a classification confidence measure. Among multiple classifiers, those that yield greater margins are better.

Classification Score

For kernel classification models, the raw classification score for classifying the observation x, a row vector, into the positive class is defined by

    f(x) = T(x)β + b.

• T(·) is a transformation of an observation for feature expansion.
• β is the estimated column vector of coefficients.
• b is the estimated scalar bias.

The raw classification score for classifying x into the negative class is −f(x). The software classifies observations into the class that yields a positive score.


If the kernel classification model consists of logistic regression learners, then the software applies the 'logit' score transformation to the raw classification scores (see ScoreTransform).
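The margin definition m = 2yf(x) can be reproduced from the scores that kfoldPredict returns. The following is a hedged sketch; it assumes the positive class is the second element of CVMdl.ClassNames, as with the ionosphere labels above.

```matlab
% Hedged sketch: recompute kernel k-fold margins from class scores.
[~,score] = kfoldPredict(CVMdl);          % n-by-2 matrix of class scores
y = 2*strcmp(Y,CVMdl.ClassNames{2}) - 1;  % +1 for the positive class, -1 otherwise
m = 2*y.*score(:,2);                      % margins, per m = 2yf(x)
```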

See Also
ClassificationPartitionedKernel | fitckernel

Introduced in R2018b


kfoldMargin
Package: classreg.learning.partition

Classification margins for cross-validated kernel ECOC model

Syntax

margin = kfoldMargin(CVMdl)
margin = kfoldMargin(CVMdl,Name,Value)

Description

margin = kfoldMargin(CVMdl) returns the classification margins on page 32-2790 obtained by the cross-validated kernel ECOC model (ClassificationPartitionedKernelECOC) CVMdl. For every fold, kfoldMargin computes the classification margins for validation-fold observations using a model trained on training-fold observations.

margin = kfoldMargin(CVMdl,Name,Value) returns classification margins with additional options specified by one or more name-value pair arguments. For example, specify the binary learner loss function, decoding scheme, or verbosity level.

Examples

Estimate k-Fold Cross-Validation Margins

Load Fisher's iris data set. X contains flower measurements, and Y contains the names of flower species.

load fisheriris
X = meas;
Y = species;

Cross-validate an ECOC model composed of kernel binary learners.

CVMdl = fitcecoc(X,Y,'Learners','kernel','CrossVal','on')

CVMdl = 
  classreg.learning.partition.ClassificationPartitionedKernelECOC
    CrossValidatedModel: 'KernelECOC'
           ResponseName: 'Y'
        NumObservations: 150
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'setosa'  'versicolor'  'virginica'}
         ScoreTransform: 'none'

  Properties, Methods


CVMdl is a ClassificationPartitionedKernelECOC model. By default, the software implements 10-fold cross-validation. To specify a different number of folds, use the 'KFold' name-value pair argument instead of 'Crossval'.

Estimate the classification margins for validation-fold observations.

m = kfoldMargin(CVMdl);
size(m)

ans = 1×2

   150     1

m is a 150-by-1 vector. m(j) is the classification margin for observation j.

Plot the k-fold margins using a boxplot.

boxplot(m,'Labels','All Observations')
title('Distribution of Margins')

Feature Selection Using k-Fold Margins

Perform feature selection by comparing k-fold margins from multiple models. Based solely on this criterion, the classifier with the greatest margins is the best classifier.


Load Fisher's iris data set. X contains flower measurements, and Y contains the names of flower species.

load fisheriris
X = meas;
Y = species;

Randomly choose half of the predictor variables.

rng(1); % For reproducibility
p = size(X,2); % Number of predictors
idxPart = randsample(p,ceil(0.5*p));

Cross-validate two ECOC models composed of kernel classification models: one that uses all of the predictors, and one that uses half of the predictors.

CVMdl = fitcecoc(X,Y,'Learners','kernel','CrossVal','on');
PCVMdl = fitcecoc(X(:,idxPart),Y,'Learners','kernel','CrossVal','on');

CVMdl and PCVMdl are ClassificationPartitionedKernelECOC models. By default, the software implements 10-fold cross-validation. To specify a different number of folds, use the 'KFold' name-value pair argument instead of 'Crossval'.

Estimate the k-fold margins for each classifier.

fullMargins = kfoldMargin(CVMdl);
partMargins = kfoldMargin(PCVMdl);

Plot the distribution of the margin sets using box plots.

boxplot([fullMargins partMargins], ...
    'Labels',{'All Predictors','Half of the Predictors'});
title('Distribution of Margins')


The PCVMdl margin distribution is similar to the CVMdl margin distribution.

Input Arguments

CVMdl — Cross-validated kernel ECOC model
ClassificationPartitionedKernelECOC model

Cross-validated kernel ECOC model, specified as a ClassificationPartitionedKernelECOC model. You can create a ClassificationPartitionedKernelECOC model by training an ECOC model using fitcecoc and specifying these name-value pair arguments:

• 'Learners' — Set the value to 'kernel', a template object returned by templateKernel, or a cell array of such template objects.
• One of the arguments 'CrossVal', 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: kfoldMargin(CVMdl,'Verbose',1) specifies to display diagnostic messages in the Command Window.


BinaryLoss — Binary learner loss function
'hamming' | 'linear' | 'logit' | 'exponential' | 'binodeviance' | 'hinge' | 'quadratic' | function handle

Binary learner loss function, specified as the comma-separated pair consisting of 'BinaryLoss' and a built-in loss function name or function handle.

• This table contains names and descriptions of the built-in functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj) is the binary loss formula.

Value            Description         Score Domain       g(yj,sj)
'binodeviance'   Binomial deviance   (–∞,∞)             log[1 + exp(–2yjsj)]/[2log(2)]
'exponential'    Exponential         (–∞,∞)             exp(–yjsj)/2
'hamming'        Hamming             [0,1] or (–∞,∞)    [1 – sign(yjsj)]/2
'hinge'          Hinge               (–∞,∞)             max(0,1 – yjsj)/2
'linear'         Linear              (–∞,∞)             (1 – yjsj)/2
'logit'          Logistic            (–∞,∞)             log[1 + exp(–yjsj)]/[2log(2)]
'quadratic'      Quadratic           [0,1]              [1 – yj(2sj – 1)]²/2

The software normalizes binary losses such that the loss is 0.5 when yj = 0. Also, the software calculates the mean binary loss for each class.

• For a custom binary loss function, for example, customFunction, specify its function handle 'BinaryLoss',@customFunction. customFunction has this form:

bLoss = customFunction(M,s)

where:

• M is the K-by-L coding matrix stored in Mdl.CodingMatrix.
• s is the 1-by-L row vector of classification scores.
• bLoss is the classification loss. This scalar aggregates the binary losses for every learner in a particular class. For example, you can use the mean binary loss to aggregate the loss over the learners for each class.
• K is the number of classes.
• L is the number of binary learners.

By default, if all binary learners are kernel classification models using SVM, then BinaryLoss is 'hinge'. If all binary learners are kernel classification models using logistic regression, then BinaryLoss is 'quadratic'.

Example: 'BinaryLoss','binodeviance'
Data Types: char | string | function_handle

Decoding — Decoding scheme
'lossweighted' (default) | 'lossbased'


Decoding scheme that aggregates the binary losses, specified as the comma-separated pair consisting of 'Decoding' and 'lossweighted' or 'lossbased'. For more information, see “Binary Loss” on page 32-2790.

Example: 'Decoding','lossbased'

Options — Estimation options
[] (default) | structure array returned by statset

Estimation options, specified as the comma-separated pair consisting of 'Options' and a structure array returned by statset. To invoke parallel computing:

• You need a Parallel Computing Toolbox license.
• Specify 'Options',statset('UseParallel',true).

Verbose — Verbosity level
0 (default) | 1

Verbosity level, specified as the comma-separated pair consisting of 'Verbose' and 0 or 1. Verbose controls the number of diagnostic messages that the software displays in the Command Window. If Verbose is 0, then the software does not display diagnostic messages. Otherwise, the software displays diagnostic messages.

Example: 'Verbose',1
Data Types: single | double

Output Arguments

margin — Classification margins
numeric vector

Classification margins on page 32-2790, returned as a numeric vector. margin is an n-by-1 vector, where each row is the margin of the corresponding observation and n is the number of observations (size(CVMdl.Y,1)).

More About

Classification Margin

The classification margin is, for each observation, the difference between the negative loss for the true class and the maximal negative loss among the false classes. If the margins are on the same scale, then they serve as a classification confidence measure. Among multiple classifiers, those that yield greater margins are better.

Binary Loss

A binary loss is a function of the class and classification score that determines how well a binary learner classifies an observation into the class.

Suppose the following:


• mkj is element (k,j) of the coding design matrix M (that is, the code corresponding to class k of binary learner j).
• sj is the score of binary learner j for an observation.
• g is the binary loss function.
• k̂ is the predicted class for the observation.

In loss-based decoding [Escalera et al.] on page 18-189, the class producing the minimum sum of the binary losses over binary learners determines the predicted class of an observation, that is,

    k̂ = argmin_k Σ_{j=1}^{L} |mkj| g(mkj,sj).

In loss-weighted decoding [Escalera et al.] on page 18-189, the class producing the minimum average of the binary losses over binary learners determines the predicted class of an observation, that is,

    k̂ = argmin_k [ Σ_{j=1}^{L} |mkj| g(mkj,sj) ] / [ Σ_{j=1}^{L} |mkj| ].

Allwein et al. on page 18-189 suggest that loss-weighted decoding improves classification accuracy by keeping loss values for all classes in the same dynamic range.

This table summarizes the supported loss functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj) is the binary loss function.

Value            Description         Score Domain       g(yj,sj)
'binodeviance'   Binomial deviance   (–∞,∞)             log[1 + exp(–2yjsj)]/[2log(2)]
'exponential'    Exponential         (–∞,∞)             exp(–yjsj)/2
'hamming'        Hamming             [0,1] or (–∞,∞)    [1 – sign(yjsj)]/2
'hinge'          Hinge               (–∞,∞)             max(0,1 – yjsj)/2
'linear'         Linear              (–∞,∞)             (1 – yjsj)/2
'logit'          Logistic            (–∞,∞)             log[1 + exp(–yjsj)]/[2log(2)]
'quadratic'      Quadratic           [0,1]              [1 – yj(2sj – 1)]²/2

The software normalizes binary losses such that the loss is 0.5 when yj = 0, and aggregates using the average of the binary learners [Allwein et al.] on page 18-189.

Do not confuse the binary loss with the overall classification loss (specified by the 'LossFun' name-value pair argument of the loss and predict object functions), which measures how well an ECOC classifier performs as a whole.

References

[1] Allwein, E., R. Schapire, and Y. Singer. “Reducing multiclass to binary: A unifying approach for margin classifiers.” Journal of Machine Learning Research. Vol. 1, 2000, pp. 113–141.


[2] Escalera, S., O. Pujol, and P. Radeva. “On the decoding process in ternary error-correcting output codes.” IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 32, Issue 7, 2010, pp. 120–134.

[3] Escalera, S., O. Pujol, and P. Radeva. “Separability of ternary codes for sparse designs of error-correcting output codes.” Pattern Recognition Letters. Vol. 30, Issue 3, 2009, pp. 285–297.

See Also
ClassificationPartitionedKernelECOC | fitcecoc

Introduced in R2018b


kfoldMargin

Classification margins for observations not used in training

Syntax

m = kfoldMargin(CVMdl)

Description

m = kfoldMargin(CVMdl) returns the cross-validated classification margins on page 32-2799 obtained by the cross-validated, binary, linear classification model CVMdl. That is, for every fold, kfoldMargin estimates the classification margins for observations that it holds out when it trains using all other observations.

m contains classification margins for each regularization strength in the linear classification models that comprise CVMdl.

Input Arguments

CVMdl — Cross-validated, binary, linear classification model
ClassificationPartitionedLinear model object

Cross-validated, binary, linear classification model, specified as a ClassificationPartitionedLinear model object. You can create a ClassificationPartitionedLinear model using fitclinear and specifying any one of the cross-validation name-value pair arguments, for example, CrossVal.

To obtain estimates, kfoldMargin applies the same data used to cross-validate the linear classification model (X and Y).

Output Arguments

m — Cross-validated classification margins
numeric vector | numeric matrix

Cross-validated classification margins on page 32-2799, returned as a numeric vector or matrix.

m is n-by-L, where n is the number of observations in the data that created CVMdl (see X and Y) and L is the number of regularization strengths in CVMdl (that is, numel(CVMdl.Trained{1}.Lambda)). m(i,j) is the cross-validated classification margin of observation i using the linear classification model that has regularization strength CVMdl.Trained{1}.Lambda(j).

Data Types: single | double

Examples


Estimate k-Fold Cross-Validation Margins

Load the NLP data set.

load nlpdata

X is a sparse matrix of predictor data, and Y is a categorical vector of class labels. There are more than two classes in the data.

The models should identify whether the word counts in a web page are from the Statistics and Machine Learning Toolbox™ documentation. So, identify the labels that correspond to the Statistics and Machine Learning Toolbox™ documentation web pages.

Ystats = Y == 'stats';

Cross-validate a binary, linear classification model that can identify whether the word counts in a documentation web page are from the Statistics and Machine Learning Toolbox™ documentation.

rng(1); % For reproducibility
CVMdl = fitclinear(X,Ystats,'CrossVal','on');

CVMdl is a ClassificationPartitionedLinear model. By default, the software implements 10-fold cross-validation. You can alter the number of folds using the 'KFold' name-value pair argument.

Estimate the cross-validated margins.

m = kfoldMargin(CVMdl);
size(m)

ans = 1×2

       31572           1

m is a 31572-by-1 vector. m(j) is the average of the out-of-fold margins for observation j.

Plot the k-fold margins using box plots.

figure;
boxplot(m);
h = gca;
h.YLim = [-5 30];
title('Distribution of Cross-Validated Margins')


Feature Selection Using k-fold Margins

One way to perform feature selection is to compare k-fold margins from multiple models. Based solely on this criterion, the classifier with the larger margins is the better classifier.

Load the NLP data set. Preprocess the data as in “Estimate k-Fold Cross-Validation Margins” on page 32-2793.

load nlpdata
Ystats = Y == 'stats';
X = X';

Create these two data sets:

• fullX contains all predictors.
• partX contains 1/2 of the predictors chosen at random.

rng(1); % For reproducibility
p = size(X,1); % Number of predictors
halfPredIdx = randsample(p,ceil(0.5*p));
fullX = X;
partX = X(halfPredIdx,:);


Cross-validate two binary, linear classification models: one that uses all of the predictors and one that uses half of the predictors. Optimize the objective function using SpaRSA, and indicate that observations correspond to columns.

CVMdl = fitclinear(fullX,Ystats,'CrossVal','on','Solver','sparsa',...
    'ObservationsIn','columns');
PCVMdl = fitclinear(partX,Ystats,'CrossVal','on','Solver','sparsa',...
    'ObservationsIn','columns');

CVMdl and PCVMdl are ClassificationPartitionedLinear models.

Estimate the k-fold margins for each classifier. Plot the distribution of the k-fold margin sets using box plots.

fullMargins = kfoldMargin(CVMdl);
partMargins = kfoldMargin(PCVMdl);
figure;
boxplot([fullMargins partMargins],'Labels',...
    {'All Predictors','Half of the Predictors'});
h = gca;
h.YLim = [-30 60];
title('Distribution of Cross-Validated Margins')

The distributions of the margins of the two classifiers are similar.


Find Good Lasso Penalty Using k-fold Margins

To determine a good lasso-penalty strength for a linear classification model that uses a logistic regression learner, compare distributions of k-fold margins.

Load the NLP data set. Preprocess the data as in “Estimate k-Fold Cross-Validation Margins” on page 32-2793.

load nlpdata
Ystats = Y == 'stats';
X = X';

Create a set of 11 logarithmically spaced regularization strengths from 10^-8 through 10^1.

Lambda = logspace(-8,1,11);

Cross-validate a binary, linear classification model using 5-fold cross-validation. Specify each of the regularization strengths, optimize the objective function using SpaRSA, and lower the tolerance on the gradient of the objective function to 1e-8.

rng(10); % For reproducibility
CVMdl = fitclinear(X,Ystats,'ObservationsIn','columns','KFold',5, ...
    'Learner','logistic','Solver','sparsa','Regularization','lasso', ...
    'Lambda',Lambda,'GradientTolerance',1e-8)

CVMdl = 
  classreg.learning.partition.ClassificationPartitionedLinear
    CrossValidatedModel: 'Linear'
           ResponseName: 'Y'
        NumObservations: 31572
                  KFold: 5
              Partition: [1×1 cvpartition]
             ClassNames: [0 1]
         ScoreTransform: 'none'

  Properties, Methods

CVMdl is a ClassificationPartitionedLinear model. Because fitclinear implements 5-fold cross-validation, CVMdl contains 5 ClassificationLinear models that the software trains on each fold.

Estimate the k-fold margins for each regularization strength.

m = kfoldMargin(CVMdl);
size(m)

ans = 1×2

       31572          11

m is a 31572-by-11 matrix of cross-validated margins for each observation. The columns correspond to the regularization strengths.


Plot the k-fold margins for each regularization strength. Because logistic regression scores are in [0,1], margins are in [-1,1]. Rescale the margins to help identify the regularization strength that maximizes the margins over the grid.

figure
boxplot(10000.^m)
ylabel('Exponentiated test-sample margins')
xlabel('Lambda indices')

Several values of Lambda yield k-fold margin distributions that are compacted near 10000. Higher values of Lambda lead to predictor variable sparsity, which is a good quality of a classifier. Choose the regularization strength that occurs just before the centers of the k-fold margin distributions start decreasing.

LambdaFinal = Lambda(5);

Train a linear classification model using the entire data set and specify the desired regularization strength.

MdlFinal = fitclinear(X,Ystats,'ObservationsIn','columns', ...
    'Learner','logistic','Solver','sparsa','Regularization','lasso', ...
    'Lambda',LambdaFinal);

To estimate labels for new observations, pass MdlFinal and the new data to predict.


More About

Classification Margin

The classification margin for binary classification is, for each observation, the difference between the classification score for the true class and the classification score for the false class.

The software defines the classification margin for binary classification as

m = 2yf(x).

x is an observation. If the true label of x is the positive class, then y is 1, and –1 otherwise. f(x) is the positive-class classification score for the observation x. The classification margin is commonly defined as m = yf(x).

If the margins are on the same scale, then they serve as a classification confidence measure. Among multiple classifiers, those that yield greater margins are better.

Classification Score

For linear classification models, the raw classification score for classifying the observation x, a row vector, into the positive class is defined by

f_j(x) = xβ_j + b_j.

For the model with regularization strength j, β_j is the estimated column vector of coefficients (the model property Beta(:,j)) and b_j is the estimated, scalar bias (the model property Bias(j)).

The raw classification score for classifying x into the negative class is –f(x). The software classifies observations into the class that yields the positive score.

If the linear classification model consists of logistic regression learners, then the software applies the 'logit' score transformation to the raw classification scores (see ScoreTransform).
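The score and margin definitions above reduce to simple arithmetic. The sketch below checks them on made-up numbers (beta, b, x, and y are illustrative values, not model output):

```python
# Check of the linear-model definitions above:
#   raw score  f(x) = x*beta + b
#   margin     m    = 2*y*f(x), with y = +1 for the positive class, -1 otherwise
beta = [0.5, -0.25]   # hypothetical estimated coefficients (Beta(:,j))
b = 0.1               # hypothetical estimated bias (Bias(j))
x = [2.0, 1.0]        # one observation, as a row vector

f = sum(xi * bi for xi, bi in zip(x, beta)) + b
y = 1                 # true label is the positive class

margin = 2 * y * f
print(f)        # 0.85
print(margin)   # 1.7
```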

See Also

ClassificationLinear | ClassificationPartitionedLinear | kfoldEdge | kfoldPredict | margin

Introduced in R2016a


kfoldMargin
Classification margins for observations not used in training

Syntax

m = kfoldMargin(CVMdl)
m = kfoldMargin(CVMdl,Name,Value)

Description

m = kfoldMargin(CVMdl) returns the cross-validated classification margins on page 32-2809 obtained by CVMdl, which is a cross-validated, error-correcting output codes (ECOC) model composed of linear classification models. That is, for every fold, kfoldMargin estimates the classification margins for observations that it holds out when it trains using all other observations.

m contains classification margins for each regularization strength in the linear classification models that comprise CVMdl.

m = kfoldMargin(CVMdl,Name,Value) uses additional options specified by one or more Name,Value pair arguments. For example, specify a decoding scheme or verbosity level.

Input Arguments

CVMdl — Cross-validated, ECOC model composed of linear classification models
ClassificationPartitionedLinearECOC model object

Cross-validated, ECOC model composed of linear classification models, specified as a ClassificationPartitionedLinearECOC model object. You can create a ClassificationPartitionedLinearECOC model using fitcecoc and by:

1. Specifying any one of the cross-validation, name-value pair arguments, for example, CrossVal

2. Setting the name-value pair argument Learners to 'linear' or a linear classification model template returned by templateLinear

To obtain estimates, kfoldMargin applies the same data used to cross-validate the ECOC model (X and Y).

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

BinaryLoss — Binary learner loss function
'hamming' | 'linear' | 'logit' | 'exponential' | 'binodeviance' | 'hinge' | 'quadratic' | function handle

Binary learner loss function, specified as the comma-separated pair consisting of 'BinaryLoss' and a built-in, loss-function name or function handle.


• This table contains names and descriptions of the built-in functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj) is the binary loss formula.

  Value            Description         Score Domain       g(yj,sj)
  'binodeviance'   Binomial deviance   (–∞,∞)             log[1 + exp(–2yjsj)]/[2log(2)]
  'exponential'    Exponential         (–∞,∞)             exp(–yjsj)/2
  'hamming'        Hamming             [0,1] or (–∞,∞)    [1 – sign(yjsj)]/2
  'hinge'          Hinge               (–∞,∞)             max(0,1 – yjsj)/2
  'linear'         Linear              (–∞,∞)             (1 – yjsj)/2
  'logit'          Logistic            (–∞,∞)             log[1 + exp(–yjsj)]/[2log(2)]
  'quadratic'      Quadratic           [0,1]              [1 – yj(2sj – 1)]^2/2

The software normalizes the binary losses such that the loss is 0.5 when yj = 0. Also, the software calculates the mean binary loss for each class.

• For a custom binary loss function, e.g., customFunction, specify its function handle 'BinaryLoss',@customFunction. customFunction should have this form:

  bLoss = customFunction(M,s)

  where:

  • M is the K-by-L coding matrix stored in Mdl.CodingMatrix.
  • s is the 1-by-L row vector of classification scores.
  • bLoss is the classification loss. This scalar aggregates the binary losses for every learner in a particular class. For example, you can use the mean binary loss to aggregate the loss over the learners for each class.
  • K is the number of classes.
  • L is the number of binary learners.

For an example of passing a custom binary loss function, see "Predict Test-Sample Labels of ECOC Model Using Custom Binary Loss Function" on page 32-4169.

By default, if all binary learners are linear classification models using:

• SVM, then BinaryLoss is 'hinge'
• Logistic regression, then BinaryLoss is 'quadratic'

Example: 'BinaryLoss','binodeviance'
Data Types: char | string | function_handle

Decoding — Decoding scheme
'lossweighted' (default) | 'lossbased'
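The built-in losses in the table above are plain formulas and can be exercised numerically. This Python sketch re-implements them (as an illustration; these are not toolbox functions) and confirms the stated normalization: every loss equals 0.5 when yj = 0.

```python
import math

# Sketches of the built-in binary loss formulas g(y, s) from the table
# above; y is a coding value in {-1, 0, 1}, s is a binary-learner score.
losses = {
    'binodeviance': lambda y, s: math.log(1 + math.exp(-2 * y * s)) / (2 * math.log(2)),
    'exponential':  lambda y, s: math.exp(-y * s) / 2,
    'hamming':      lambda y, s: (1 - (1 if y * s > 0 else -1 if y * s < 0 else 0)) / 2,
    'hinge':        lambda y, s: max(0, 1 - y * s) / 2,
    'linear':       lambda y, s: (1 - y * s) / 2,
    'logit':        lambda y, s: math.log(1 + math.exp(-y * s)) / (2 * math.log(2)),
    'quadratic':    lambda y, s: (1 - y * (2 * s - 1)) ** 2 / 2,
}

# Normalization check: the loss is 0.5 whenever y = 0, for any score s.
for name, g in losses.items():
    print(name, g(0, 0.3))   # each prints ... 0.5
```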


Decoding scheme that aggregates the binary losses, specified as the comma-separated pair consisting of 'Decoding' and 'lossweighted' or 'lossbased'. For more information, see "Binary Loss" on page 32-2824.

Example: 'Decoding','lossbased'

Options — Estimation options
[] (default) | structure array returned by statset

Estimation options, specified as the comma-separated pair consisting of 'Options' and a structure array returned by statset. To invoke parallel computing:

• You need a Parallel Computing Toolbox license.
• Specify 'Options',statset('UseParallel',true).

Verbose — Verbosity level
0 (default) | 1

Verbosity level, specified as the comma-separated pair consisting of 'Verbose' and 0 or 1. Verbose controls the number of diagnostic messages that the software displays in the Command Window. If Verbose is 0, then the software does not display diagnostic messages. Otherwise, the software displays diagnostic messages.

Example: 'Verbose',1
Data Types: single | double

Output Arguments

m — Cross-validated classification margins
numeric vector | numeric matrix

Cross-validated classification margins on page 32-2809, returned as a numeric vector or matrix.

m is n-by-L, where n is the number of observations in X and L is the number of regularization strengths in Mdl (that is, numel(Mdl.Lambda)). m(i,j) is the cross-validated classification margin of observation i using the ECOC model, composed of linear classification models, that has regularization strength Mdl.Lambda(j).

Examples

Estimate k-Fold Cross-Validation Margins

Load the NLP data set.

load nlpdata

X is a sparse matrix of predictor data, and Y is a categorical vector of class labels. For simplicity, use the label 'others' for all observations in Y that are not 'simulink', 'dsp', or 'comm'.


Y(~(ismember(Y,{'simulink','dsp','comm'}))) = 'others';

Cross-validate a multiclass, linear classification model.

rng(1); % For reproducibility
CVMdl = fitcecoc(X,Y,'Learner','linear','CrossVal','on');

CVMdl is a ClassificationPartitionedLinearECOC model. By default, the software implements 10-fold cross-validation. You can alter the number of folds using the 'KFold' name-value pair argument.

Estimate the k-fold margins.

m = kfoldMargin(CVMdl);
size(m)

ans = 1×2

       31572           1

m is a 31572-by-1 vector. m(j) is the average of the out-of-fold margins for observation j.

Plot the k-fold margins using box plots.

figure;
boxplot(m);
h = gca;
h.YLim = [-5 5];
title('Distribution of Cross-Validated Margins')


Feature Selection Using k-fold Margins

One way to perform feature selection is to compare k-fold margins from multiple models. Based solely on this criterion, the classifier with the larger margins is the better classifier.

Load the NLP data set. Preprocess the data as in "Estimate k-Fold Cross-Validation Margins" on page 32-2802, and orient the predictor data so that observations correspond to columns.

load nlpdata
Y(~(ismember(Y,{'simulink','dsp','comm'}))) = 'others';
X = X';

Create these two data sets:

• fullX contains all predictors.
• partX contains 1/2 of the predictors chosen at random.

rng(1); % For reproducibility
p = size(X,1); % Number of predictors
halfPredIdx = randsample(p,ceil(0.5*p));
fullX = X;
partX = X(halfPredIdx,:);

Create a linear classification model template that specifies optimizing the objective function using SpaRSA.


t = templateLinear('Solver','sparsa');

Cross-validate two ECOC models composed of binary, linear classification models: one that uses all of the predictors and one that uses half of the predictors. Indicate that observations correspond to columns.

CVMdl = fitcecoc(fullX,Y,'Learners',t,'CrossVal','on',...
    'ObservationsIn','columns');
PCVMdl = fitcecoc(partX,Y,'Learners',t,'CrossVal','on',...
    'ObservationsIn','columns');

CVMdl and PCVMdl are ClassificationPartitionedLinearECOC models.

Estimate the k-fold margins for each classifier. Plot the distribution of the k-fold margin sets using box plots.

fullMargins = kfoldMargin(CVMdl);
partMargins = kfoldMargin(PCVMdl);
figure;
boxplot([fullMargins partMargins],'Labels',...
    {'All Predictors','Half of the Predictors'});
h = gca;
h.YLim = [-1 1];
title('Distribution of Cross-Validated Margins')

The distributions of the k-fold margins of the two classifiers are similar.


Find Good Lasso Penalty Using k-fold Margins

To determine a good lasso-penalty strength for a linear classification model that uses a logistic regression learner, compare distributions of k-fold margins.

Load the NLP data set. Preprocess the data as in "Feature Selection Using k-fold Margins" on page 32-2804.

load nlpdata
Y(~(ismember(Y,{'simulink','dsp','comm'}))) = 'others';
X = X';

Create a set of 11 logarithmically spaced regularization strengths from 10^-8 through 10^1.

Lambda = logspace(-8,1,11);

Create a linear classification model template that specifies using logistic regression with a lasso penalty, using each of the regularization strengths, optimizing the objective function using SpaRSA, and reducing the tolerance on the gradient of the objective function to 1e-8.

t = templateLinear('Learner','logistic','Solver','sparsa',...
    'Regularization','lasso','Lambda',Lambda,'GradientTolerance',1e-8);

Cross-validate an ECOC model composed of binary, linear classification models using 5-fold cross-validation.

rng(10); % For reproducibility
CVMdl = fitcecoc(X,Y,'Learners',t,'ObservationsIn','columns','KFold',5)

CVMdl = 
  classreg.learning.partition.ClassificationPartitionedLinearECOC
    CrossValidatedModel: 'Linear'
           ResponseName: 'Y'
        NumObservations: 31572
                  KFold: 5
              Partition: [1×1 cvpartition]
             ClassNames: [comm dsp simulink others]
         ScoreTransform: 'none'

  Properties, Methods

CVMdl is a ClassificationPartitionedLinearECOC model.

Estimate the k-fold margins for each regularization strength. The scores for logistic regression are in [0,1]. Apply the quadratic binary loss.

m = kfoldMargin(CVMdl,'BinaryLoss','quadratic');
size(m)

ans = 1×2

       31572          11

m is a 31572-by-11 matrix of cross-validated margins for each observation. The columns correspond to the regularization strengths.

Plot the k-fold margins for each regularization strength.

figure;
boxplot(m)
ylabel('Cross-validated margins')
xlabel('Lambda indices')

Several values of Lambda yield similarly high margin distribution centers with low spreads. Higher values of Lambda lead to predictor variable sparsity, which is a good quality of a classifier.

Choose the regularization strength that occurs just before the margin distribution center starts decreasing and the spread starts increasing.

LambdaFinal = Lambda(5);

Train an ECOC model composed of linear classification models using the entire data set and specify the regularization strength LambdaFinal.

t = templateLinear('Learner','logistic','Solver','sparsa',...
    'Regularization','lasso','Lambda',Lambda(5),'GradientTolerance',1e-8);
MdlFinal = fitcecoc(X,Y,'Learners',t,'ObservationsIn','columns');

To estimate labels for new observations, pass MdlFinal and the new data to predict.


More About

Binary Loss

A binary loss is a function of the class and classification score that determines how well a binary learner classifies an observation into the class.

Suppose the following:

• mkj is element (k,j) of the coding design matrix M (that is, the code corresponding to class k of binary learner j).
• sj is the score of binary learner j for an observation.
• g is the binary loss function.
• k̂ is the predicted class for the observation.

In loss-based decoding [Escalera et al.] on page 18-189, the class producing the minimum sum of the binary losses over binary learners determines the predicted class of an observation, that is,

  k̂ = argmin_k Σ_{j=1}^{L} |mkj| g(mkj,sj).

In loss-weighted decoding [Escalera et al.] on page 18-189, the class producing the minimum average of the binary losses over binary learners determines the predicted class of an observation, that is,

  k̂ = argmin_k [ Σ_{j=1}^{L} |mkj| g(mkj,sj) ] / [ Σ_{j=1}^{L} |mkj| ].
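Both decoding rules are easy to trace on a toy problem. In the sketch below (illustration only, not toolbox code), the linear binary loss g(m,s) = (1 - m*s)/2 stands in for g, and the coding matrix and scores are made-up values:

```python
# Toy trace of loss-based vs. loss-weighted decoding for 3 classes and
# 3 binary learners. M and s are made-up values for illustration.
M = [[ 1,  1,  0],   # coding row for class 1
     [-1,  0,  1],   # coding row for class 2
     [ 0, -1, -1]]   # coding row for class 3
s = [0.9, 0.2, -0.8] # binary-learner scores for one observation

def g(m, sj):
    return (1 - m * sj) / 2   # linear binary loss

def loss_based(k):
    # sum over learners of |m_kj| * g(m_kj, s_j)
    return sum(abs(m) * g(m, sj) for m, sj in zip(M[k], s))

def loss_weighted(k):
    # the same sum, divided by sum over learners of |m_kj|
    return loss_based(k) / sum(abs(m) for m in M[k])

print(min(range(3), key=loss_based))     # 0 (class 1 wins)
print(min(range(3), key=loss_weighted))  # 0 (class 1 wins here too)
```

With a fixed number of nonzero codes per row the two rules agree, but for sparse designs the weighting keeps classes with different numbers of learners comparable.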

Allwein et al. on page 18-189 suggest that loss-weighted decoding improves classification accuracy by keeping loss values for all classes in the same dynamic range.

This table summarizes the supported loss functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj) is the binary loss formula.

  Value            Description         Score Domain       g(yj,sj)
  'binodeviance'   Binomial deviance   (–∞,∞)             log[1 + exp(–2yjsj)]/[2log(2)]
  'exponential'    Exponential         (–∞,∞)             exp(–yjsj)/2
  'hamming'        Hamming             [0,1] or (–∞,∞)    [1 – sign(yjsj)]/2
  'hinge'          Hinge               (–∞,∞)             max(0,1 – yjsj)/2
  'linear'         Linear              (–∞,∞)             (1 – yjsj)/2
  'logit'          Logistic            (–∞,∞)             log[1 + exp(–yjsj)]/[2log(2)]
  'quadratic'      Quadratic           [0,1]              [1 – yj(2sj – 1)]^2/2

The software normalizes binary losses such that the loss is 0.5 when yj = 0, and aggregates using the average of the binary learners [Allwein et al.] on page 18-189.


Do not confuse the binary loss with the overall classification loss (specified by the 'LossFun' name-value pair argument of the loss and predict object functions), which measures how well an ECOC classifier performs as a whole.

Classification Margin

The classification margin is, for each observation, the difference between the negative loss for the true class and the maximal negative loss among the false classes. If the margins are on the same scale, then they serve as a classification confidence measure. Among multiple classifiers, those that yield greater margins are better.
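This definition takes one row of negated losses per class. A minimal numeric sketch (the negated-loss values below are made up for illustration):

```python
# ECOC classification margin: negated loss of the true class minus the
# largest negated loss among the other (false) classes.
neg_loss = [0.37, 2.13, -4.00]   # hypothetical negated losses, one per class
true_class = 1                   # index of the observation's true class

others = [v for k, v in enumerate(neg_loss) if k != true_class]
margin = neg_loss[true_class] - max(others)
print(round(margin, 2))   # 1.76
```

A positive margin means the true class already has the largest negated loss, so the observation is classified correctly.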

References

[1] Allwein, E., R. Schapire, and Y. Singer. "Reducing multiclass to binary: A unifying approach for margin classifiers." Journal of Machine Learning Research. Vol. 1, 2000, pp. 113–141.

[2] Escalera, S., O. Pujol, and P. Radeva. "On the decoding process in ternary error-correcting output codes." IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 32, Issue 7, 2010, pp. 120–134.

[3] Escalera, S., O. Pujol, and P. Radeva. "Separability of ternary codes for sparse designs of error-correcting output codes." Pattern Recognition Letters. Vol. 30, Issue 3, 2009, pp. 285–297.

Extended Capabilities

Automatic Parallel Support

Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, set the 'UseParallel' option to true. Set the 'UseParallel' field of the options structure to true using statset and specify the 'Options' name-value pair argument in the call to this function. For example: 'Options',statset('UseParallel',true).

For more information, see the 'Options' name-value pair argument. For more general information about parallel computing, see "Run MATLAB Functions with Automatic Parallel Support" (Parallel Computing Toolbox).

See Also

ClassificationLinear | ClassificationPartitionedLinearECOC | kfoldEdge | kfoldPredict | margin

Topics

"Quick Start Parallel Computing for Statistics and Machine Learning Toolbox" on page 30-2
"Reproducibility in Parallel Statistical Computations" on page 30-13
"Concepts of Parallel Computing in Statistics and Machine Learning Toolbox" on page 30-8

Introduced in R2016a


kfoldMargin
Classification margins for observations not used for training

Syntax

M = kfoldMargin(obj)

Description

M = kfoldMargin(obj) returns classification margins obtained by the cross-validated classification model obj. For every fold, this method computes classification margins for in-fold observations using a model trained on out-of-fold observations.

Input Arguments

obj
A partitioned classification model of type ClassificationPartitionedModel or ClassificationPartitionedEnsemble.

Output Arguments

M
The classification margin.

Examples

Estimate the k-fold Margins of a Classifier

Find the k-fold margins for an ensemble that classifies the ionosphere data.

Load the ionosphere data set.

load ionosphere

Create a template tree stump.

t = templateTree('MaxNumSplits',1);

Train a classification ensemble of decision trees. Specify t as the weak learner.

Mdl = fitcensemble(X,Y,'Method','AdaBoostM1','Learners',t);

Cross-validate the classifier using 10-fold cross-validation.

cvens = crossval(Mdl);

Compute the k-fold margins. Display summary statistics for the margins.


m = kfoldMargin(cvens);
marginStats = table(min(m),mean(m),max(m),...
    'VariableNames',{'Min','Mean','Max'})

marginStats=1×3 table
      Min       Mean      Max  
    _______    ______    ______
    -11.312    7.3236    23.517

More About

Margin

The classification margin is the difference between the classification score for the true class and the maximal classification score for the false classes. The classification margin is a column vector with the same number of rows as in the matrix X. A high value of margin indicates a more reliable prediction than a low value.

Score

For discriminant analysis, the score of a classification is the posterior probability of the classification. For the definition of posterior probability in discriminant analysis, see "Posterior Probability" on page 20-6.

For ensembles, a classification score represents the confidence of a classification into a class. The higher the score, the higher the confidence. Different ensemble algorithms have different definitions for their scores. Furthermore, the range of scores depends on ensemble type. For example:

• AdaBoostM1 scores range from –∞ to ∞.
• Bag scores range from 0 to 1.

For trees, the score of a classification of a leaf node is the posterior probability of the classification at that node. The posterior probability of the classification at a node is the number of training sequences that lead to that node with the classification, divided by the number of training sequences that lead to that node.

For example, consider classifying a predictor X as true when X < 0.15 or X > 0.95, and X is false otherwise. Generate 100 random points and classify them:

rng(0,'twister') % for reproducibility
X = rand(100,1);
Y = (abs(X - .55) > .4);
tree = fitctree(X,Y);
view(tree,'Mode','Graph')


Prune the tree:

tree1 = prune(tree,'Level',1);
view(tree1,'Mode','Graph')


The pruned tree correctly classifies observations that are less than 0.15 as true. It also correctly classifies observations from .15 to .94 as false. However, it incorrectly classifies observations that are greater than .94 as false. Therefore, the score for observations that are greater than .15 should be about .05/.85=.06 for true, and about .8/.85=.94 for false.

Compute the prediction scores for the first 10 rows of X:

[~,score] = predict(tree1,X(1:10));
[score X(1:10,:)]

ans = 10×3

    0.9059    0.0941    0.8147
    0.9059    0.0941    0.9058
         0    1.0000    0.1270
    0.9059    0.0941    0.9134
    0.9059    0.0941    0.6324
         0    1.0000    0.0975
    0.9059    0.0941    0.2785
    0.9059    0.0941    0.5469
    0.9059    0.0941    0.9575
    0.9059    0.0941    0.9649

Indeed, every value of X (the right-most column) that is less than 0.15 has associated scores (the left and center columns) of 0 and 1, while the other values of X have associated scores of 0.91 and 0.09. The difference (score 0.09 instead of the expected .06) is due to a statistical fluctuation: there are 8 observations in X in the range (.95,1) instead of the expected 5 observations.
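The posterior arithmetic quoted above (.05/.85 and .8/.85) can be verified directly:

```python
# Leaf-posterior arithmetic from the paragraph above: of the part of the
# unit interval classified false by the pruned tree, about 0.05/0.85 of
# the mass is actually true and 0.80/0.85 is false.
p_true = 0.05 / 0.85
p_false = 0.80 / 0.85
print(round(p_true, 2))    # 0.06
print(round(p_false, 2))   # 0.94
```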

See Also

ClassificationPartitionedModel | crossval | kfoldEdge | kfoldLoss | kfoldPredict | kfoldfun


kfoldPredict
Package: classreg.learning.partition

Classify observations in cross-validated ECOC model

Syntax

label = kfoldPredict(CVMdl)
label = kfoldPredict(CVMdl,Name,Value)
[label,NegLoss,PBScore] = kfoldPredict(___)
[label,NegLoss,PBScore,Posterior] = kfoldPredict(___)

Description

label = kfoldPredict(CVMdl) returns class labels predicted by the cross-validated ECOC model (ClassificationPartitionedECOC) CVMdl. For every fold, kfoldPredict predicts class labels for observations that it holds out during training. CVMdl.X contains both sets of observations.

The software predicts the classification of an observation by assigning the observation to the class yielding the largest negated average binary loss (or, equivalently, the smallest average binary loss).

label = kfoldPredict(CVMdl,Name,Value) returns predicted class labels with additional options specified by one or more name-value pair arguments. For example, specify the posterior probability estimation method, decoding scheme, or verbosity level.

[label,NegLoss,PBScore] = kfoldPredict(___) additionally returns negated values of the average binary loss per class (NegLoss) for validation-fold observations and positive-class scores (PBScore) for validation-fold observations classified by each binary learner, using any of the input argument combinations in the previous syntaxes. If the coding matrix varies across folds (that is, the coding scheme is sparserandom or denserandom), then PBScore is empty ([]).

[label,NegLoss,PBScore,Posterior] = kfoldPredict(___) additionally returns posterior class probability estimates for validation-fold observations (Posterior). To obtain posterior class probabilities, you must set 'FitPosterior',1 when training the cross-validated ECOC model using fitcecoc. Otherwise, kfoldPredict throws an error.

Examples

Predict k-Fold Cross-Validation Labels

Load Fisher's iris data set. Specify the predictor data X, the response data Y, and the order of the classes in Y.

load fisheriris
X = meas;
Y = categorical(species);
classOrder = unique(Y);
rng(1); % For reproducibility

Train and cross-validate an ECOC model using support vector machine (SVM) binary classifiers. Standardize the predictor data using an SVM template, and specify the class order.

t = templateSVM('Standardize',1);
CVMdl = fitcecoc(X,Y,'CrossVal','on','Learners',t,'ClassNames',classOrder);

CVMdl is a ClassificationPartitionedECOC model. By default, the software implements 10-fold cross-validation. You can specify a different number of folds using the 'KFold' name-value pair argument.

Predict the validation-fold labels. Print a random subset of true and predicted labels.

labels = kfoldPredict(CVMdl);
idx = randsample(numel(labels),10);
table(Y(idx),labels(idx),...
    'VariableNames',{'TrueLabels','PredictedLabels'})

ans=10×2 table
    TrueLabels    PredictedLabels
    __________    _______________
    setosa        setosa
    versicolor    versicolor
    setosa        setosa
    virginica     virginica
    versicolor    versicolor
    setosa        setosa
    virginica     virginica
    virginica     virginica
    setosa        setosa
    setosa        setosa

CVMdl correctly labels the validation-fold observations with indices idx.

Predict Cross-Validation Labels Using Custom Binary Loss Function

Load Fisher's iris data set. Specify the predictor data X, the response data Y, and the order of the classes in Y.

load fisheriris
X = meas;
Y = categorical(species);
classOrder = unique(Y); % Class order
K = numel(classOrder);  % Number of classes
rng(1); % For reproducibility

Train and cross-validate an ECOC model using SVM binary classifiers. Standardize the predictor data using an SVM template, and specify the class order.

t = templateSVM('Standardize',1);
CVMdl = fitcecoc(X,Y,'CrossVal','on','Learners',t,'ClassNames',classOrder);

CVMdl is a ClassificationPartitionedECOC model. By default, the software implements 10-fold cross-validation. You can specify a different number of folds using the 'KFold' name-value pair argument.

SVM scores are signed distances from the observation to the decision boundary. Therefore, the domain is (−∞,∞). Create a custom binary loss function that:

• Maps the coding design matrix (M) and positive-class classification scores (s) for each learner to the binary loss for each observation
• Uses linear loss
• Aggregates the binary learner loss using the median

You can create a separate function for the binary loss function, and then save it on the MATLAB® path. Alternatively, you can specify an anonymous binary loss function. In this case, create a function handle (customBL) to an anonymous binary loss function.

customBL = @(M,s)nanmedian(1 - bsxfun(@times,M,s),2)/2;

Predict cross-validation labels and estimate the median binary loss per class. Print the median negative binary losses per class for a random set of 10 validation-fold observations.

[label,NegLoss] = kfoldPredict(CVMdl,'BinaryLoss',customBL);
idx = randsample(numel(label),10);
classOrder

classOrder = 3x1 categorical
     setosa 
     versicolor 
     virginica 

table(Y(idx),label(idx),NegLoss(idx,:),'VariableNames',...
    {'TrueLabel','PredictedLabel','NegLoss'})

ans=10×3 table
    TrueLabel     PredictedLabel              NegLoss
    __________    ______________    ______________________________
    setosa        versicolor        0.37141      2.1292     -4.0006
    versicolor    versicolor        -1.2167      0.3669    -0.65017
    setosa        versicolor        0.23927        2.08     -3.8193
    virginica     virginica         -1.9154    -0.19947      0.6149
    versicolor    versicolor        -1.3746     0.45535    -0.58076
    setosa        versicolor        0.20061      2.2774     -3.9781
    virginica     versicolor        -1.4928    0.090689   -0.097935
    virginica     virginica         -1.7669    -0.13464      0.4015
    setosa        versicolor        0.19999      1.9113     -3.6113
    setosa        versicolor        0.16108      1.9684     -3.6295

The order of the columns corresponds to the elements of classOrder. The software predicts the label based on the maximum negated loss. The results indicate that the median of the linear losses might not perform as well as other losses.
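For readers working outside MATLAB, the anonymous loss above (the median over binary learners of the linear loss, with NaN products skipped as nanmedian does) can be sketched in Python; the coding row and scores below are made-up illustrative values:

```python
import math
from statistics import median

# Python sketch of the custom binary loss above: the median over binary
# learners of the linear loss (1 - m*s)/2, skipping NaN products the way
# nanmedian does. m_row and s are hypothetical values.
def custom_bl(m_row, s):
    vals = [(1 - m * sj) / 2
            for m, sj in zip(m_row, s)
            if not math.isnan(m * sj)]
    return median(vals)

m_row = [1, -1, 0]               # one coding-matrix row (hypothetical)
s = [0.9, -0.4, float('nan')]    # binary-learner scores (hypothetical)

print(round(custom_bl(m_row, s), 3))   # 0.175
```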


Estimate Cross-Validation Posterior Probabilities

Load Fisher's iris data set. Use the petal dimensions as the predictor data X. Specify the response data Y and the order of the classes in Y.

load fisheriris
X = meas(:,3:4);
Y = categorical(species);
classOrder = unique(Y);
rng(1); % For reproducibility

Create an SVM template. Standardize the predictors, and specify the Gaussian kernel.

t = templateSVM('Standardize',1,'KernelFunction','gaussian');

t is an SVM template. Most of its properties are empty. When training the ECOC classifier, the software sets the applicable properties to their default values.

Train and cross-validate an ECOC classifier using the SVM template. Transform classification scores to class posterior probabilities (returned by kfoldPredict) using the 'FitPosterior' name-value pair argument. Specify the class order.

CVMdl = fitcecoc(X,Y,'Learners',t,'CrossVal','on','FitPosterior',true,...
    'ClassNames',classOrder);

CVMdl is a ClassificationPartitionedECOC model. By default, the software uses 10-fold cross-validation.

Predict the validation-fold class posterior probabilities. Use 10 random initial values for the Kullback-Leibler algorithm.

[label,~,~,Posterior] = kfoldPredict(CVMdl,'NumKLInitializations',10);

The software assigns an observation to the class that yields the smallest average binary loss. Because all the binary learners compute posterior probabilities, the binary loss function is quadratic.

Display a random set of results.

idx = randsample(size(X,1),10);
CVMdl.ClassNames

ans = 3x1 categorical
     setosa 
     versicolor 
     virginica 

table(Y(idx),label(idx),Posterior(idx,:),...
    'VariableNames',{'TrueLabel','PredLabel','Posterior'})

ans=10×3 table
    TrueLabel     PredLabel                    Posterior
    __________    __________    ______________________________________
    versicolor    versicolor     0.0086394       0.98243     0.0089291
    versicolor    virginica     2.2197e-14       0.12447       0.87553
    setosa        setosa             0.999    0.00022837    0.00076884
    versicolor    versicolor    2.2194e-14       0.98916      0.010839
    virginica     virginica       0.012318      0.012925       0.97476
    virginica     virginica      0.0015571     0.0015638       0.99688
    virginica     virginica      0.0042895     0.0043556       0.99135
    setosa        setosa             0.999    0.00028329    0.00071382
    virginica     virginica      0.0094721     0.0098231        0.9807
    setosa        setosa             0.999    0.00013562    0.00086192

The columns of Posterior correspond to the class order of CVMdl.ClassNames.

Estimate Cross-Validation Posterior Probabilities Using Parallel Computing

Train a multiclass ECOC model and estimate the posterior probabilities using parallel computing.

Load the arrhythmia data set. Examine the response data Y.

load arrhythmia
Y = categorical(Y);
tabulate(Y)

  Value    Count    Percent
      1      245     54.20%
      2       44      9.73%
      3       15      3.32%
      4       15      3.32%
      5       13      2.88%
      6       25      5.53%
      7        3      0.66%
      8        2      0.44%
      9        9      1.99%
     10       50     11.06%
     14        4      0.88%
     15        5      1.11%
     16       22      4.87%

n = numel(Y);
K = numel(unique(Y));

Several classes are not represented in the data, and many of the other classes have low relative frequencies.

Specify an ensemble learning template that uses the GentleBoost method and 50 weak classification tree learners.

t = templateEnsemble('GentleBoost',50,'Tree');

t is a template object. Most of the options are empty ([]). The software uses default values for all empty options during training.

Because the response variable contains many classes, specify a sparse random coding design.

rng(1); % For reproducibility
Coding = designecoc(K,'sparserandom');

Train and cross-validate an ECOC model using parallel computing. Fit posterior probabilities (returned by kfoldPredict).


pool = parpool; % Invokes workers

Starting parallel pool (parpool) using the 'local' profile ... connected to 6 workers.

options = statset('UseParallel',1);
CVMdl = fitcecoc(X,Y,'Learner',t,'Options',options,'Coding',Coding,...
    'FitPosterior',1,'CrossVal','on');

Warning: One or more folds do not contain points from all the groups.

CVMdl is a ClassificationPartitionedECOC model. By default, the software implements 10-fold cross-validation. You can specify a different number of folds using the 'KFold' name-value pair argument. The pool invokes six workers, although the number of workers might vary among systems. Because some classes have low relative frequency, one or more folds most likely do not contain observations from all classes.

Estimate posterior probabilities, and display the posterior probability of being classified as not having arrhythmia (class 1) given the data for a random set of validation-fold observations.

[~,~,~,posterior] = kfoldPredict(CVMdl,'Options',options);
idx = randsample(n,10);
table(idx,Y(idx),posterior(idx,1),...
    'VariableNames',{'OOFSampleIndex','TrueLabel','PosteriorNoArrhythmia'})

ans=10×3 table
    OOFSampleIndex    TrueLabel    PosteriorNoArrhythmia
    ______________    _________    _____________________
         171              1               0.33654
         221              1               0.85135
          72             16                0.9174
           3             10              0.025649
         202              1                0.8438
         243              1                0.9435
          18              1               0.81198
          49              6              0.090154
         234              1               0.61625
         315              1               0.97187

Input Arguments

CVMdl — Cross-validated ECOC model
ClassificationPartitionedECOC model

Cross-validated ECOC model, specified as a ClassificationPartitionedECOC model. You can create a ClassificationPartitionedECOC model in two ways:

• Pass a trained ECOC model (ClassificationECOC) to crossval.
• Train an ECOC model using fitcecoc and specify any one of these cross-validation name-value pair arguments: 'CrossVal', 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.

kfoldPredict

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: kfoldPredict(CVMdl,'PosteriorMethod','qp') specifies to estimate multiclass posterior probabilities by solving a least-squares problem using quadratic programming.

BinaryLoss — Binary learner loss function
'hamming' | 'linear' | 'logit' | 'exponential' | 'binodeviance' | 'hinge' | 'quadratic' | function handle

Binary learner loss function, specified as the comma-separated pair consisting of 'BinaryLoss' and a built-in loss function name or function handle.

• This table describes the built-in functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj) is the binary loss formula.

  Value            Description          Score Domain       g(yj,sj)
  'binodeviance'   Binomial deviance    (–∞,∞)             log[1 + exp(–2yjsj)]/[2log(2)]
  'exponential'    Exponential          (–∞,∞)             exp(–yjsj)/2
  'hamming'        Hamming              [0,1] or (–∞,∞)    [1 – sign(yjsj)]/2
  'hinge'          Hinge                (–∞,∞)             max(0,1 – yjsj)/2
  'linear'         Linear               (–∞,∞)             (1 – yjsj)/2
  'logit'          Logistic             (–∞,∞)             log[1 + exp(–yjsj)]/[2log(2)]
  'quadratic'      Quadratic            [0,1]              [1 – yj(2sj – 1)]^2/2

  The software normalizes binary losses so that the loss is 0.5 when yj = 0. Also, the software calculates the mean binary loss for each class.

• For a custom binary loss function, for example customFunction, specify its function handle 'BinaryLoss',@customFunction. customFunction has this form:

  bLoss = customFunction(M,s)

  where:
  • M is the K-by-L coding matrix stored in Mdl.CodingMatrix.
  • s is the 1-by-L row vector of classification scores.
  • bLoss is the classification loss. This scalar aggregates the binary losses for every learner in a particular class. For example, you can use the mean binary loss to aggregate the loss over the learners for each class.
  • K is the number of classes.
  • L is the number of binary learners.

  For an example of passing a custom binary loss function, see “Predict Test-Sample Labels of ECOC Model Using Custom Binary Loss Function” on page 32-4169.
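The built-in binary losses in the table above are simple scalar formulas. The following sketch — written in Python/NumPy purely for illustration, not MATLAB code — evaluates each formula and checks the normalization that the text describes (every loss equals 0.5 when yj = 0):

```python
import numpy as np

# Illustrative sketch of the built-in binary loss formulas from the table.
# y is a code in {-1, 0, +1}; s is a raw classifier score.
binary_losses = {
    'binodeviance': lambda y, s: np.log1p(np.exp(-2 * y * s)) / (2 * np.log(2)),
    'exponential':  lambda y, s: np.exp(-y * s) / 2,
    'hamming':      lambda y, s: (1 - np.sign(y * s)) / 2,
    'hinge':        lambda y, s: np.maximum(0, 1 - y * s) / 2,
    'linear':       lambda y, s: (1 - y * s) / 2,
    'logit':        lambda y, s: np.log1p(np.exp(-y * s)) / (2 * np.log(2)),
    'quadratic':    lambda y, s: (1 - y * (2 * s - 1)) ** 2 / 2,
}

# Normalization check: every built-in loss is 0.5 when y = 0.
for name, g in binary_losses.items():
    assert np.isclose(g(0, 0.7), 0.5), name
```

Note that 'hamming' and 'quadratic' assume scores in [0,1] (posterior-like scores), while the margin-based losses assume scores in (–∞,∞), as the Score Domain column indicates.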


The default BinaryLoss value depends on the score ranges returned by the binary learners. This table describes some default BinaryLoss values based on the given assumptions.

  Assumption                                                             Default Value
  All binary learners are SVMs, or linear or kernel classification       'hinge'
  models of SVM learners.
  All binary learners are ensembles trained by AdaBoostM1 or             'exponential'
  GentleBoost.
  All binary learners are ensembles trained by LogitBoost.               'binodeviance'
  All binary learners are linear or kernel classification models of      'quadratic'
  logistic regression learners, or you specify to predict class
  posterior probabilities by setting 'FitPosterior',true in fitcecoc.

To check the default value, use dot notation to display the BinaryLoss property of the trained model at the command line.

Example: 'BinaryLoss','binodeviance'
Data Types: char | string | function_handle

Decoding — Decoding scheme
'lossweighted' (default) | 'lossbased'

Decoding scheme that aggregates the binary losses, specified as the comma-separated pair consisting of 'Decoding' and 'lossweighted' or 'lossbased'. For more information, see “Binary Loss” on page 32-2824.

Example: 'Decoding','lossbased'

NumKLInitializations — Number of random initial values
0 (default) | nonnegative integer scalar

Number of random initial values for fitting posterior probabilities by Kullback-Leibler divergence minimization, specified as the comma-separated pair consisting of 'NumKLInitializations' and a nonnegative integer scalar.

If you do not request the fourth output argument (Posterior) and set 'PosteriorMethod','kl' (the default), then the software ignores the value of NumKLInitializations.

For more details, see “Posterior Estimation Using Kullback-Leibler Divergence” on page 32-2825.

Example: 'NumKLInitializations',5
Data Types: single | double

Options — Estimation options
[] (default) | structure array returned by statset

Estimation options, specified as the comma-separated pair consisting of 'Options' and a structure array returned by statset.

To invoke parallel computing:
• You need a Parallel Computing Toolbox license.


• Specify 'Options',statset('UseParallel',true).

PosteriorMethod — Posterior probability estimation method
'kl' (default) | 'qp'

Posterior probability estimation method, specified as the comma-separated pair consisting of 'PosteriorMethod' and 'kl' or 'qp'.

• If PosteriorMethod is 'kl', then the software estimates multiclass posterior probabilities by minimizing the Kullback-Leibler divergence between the predicted and expected posterior probabilities returned by binary learners. For details, see “Posterior Estimation Using Kullback-Leibler Divergence” on page 32-2825.
• If PosteriorMethod is 'qp', then the software estimates multiclass posterior probabilities by solving a least-squares problem using quadratic programming. You need an Optimization Toolbox license to use this option. For details, see “Posterior Estimation Using Quadratic Programming” on page 32-2826.
• If you do not request the fourth output argument (Posterior), then the software ignores the value of PosteriorMethod.

Example: 'PosteriorMethod','qp'

Verbose — Verbosity level
0 (default) | 1

Verbosity level, specified as the comma-separated pair consisting of 'Verbose' and 0 or 1. Verbose controls the number of diagnostic messages that the software displays in the Command Window. If Verbose is 0, then the software does not display diagnostic messages. Otherwise, the software displays diagnostic messages.

Example: 'Verbose',1
Data Types: single | double

Output Arguments

label — Predicted class labels
categorical array | character array | logical vector | numeric vector | cell array of character vectors

Predicted class labels, returned as a categorical or character array, logical or numeric vector, or cell array of character vectors. label has the same data type and number of rows as CVMdl.Y. The software predicts the classification of an observation by assigning the observation to the class yielding the largest negated average binary loss (or, equivalently, the smallest average binary loss).

NegLoss — Negated average binary losses
numeric matrix

Negated average binary losses, returned as a numeric matrix. NegLoss is an n-by-K matrix, where n is the number of observations (size(CVMdl.X,1)) and K is the number of unique classes (size(CVMdl.ClassNames,1)).


PBScore — Positive-class scores
numeric matrix

Positive-class scores for each binary learner, returned as a numeric matrix. PBScore is an n-by-L matrix, where n is the number of observations (size(CVMdl.X,1)) and L is the number of binary learners (size(CVMdl.CodingMatrix,2)). If the coding matrix varies across folds (that is, the coding scheme is sparserandom or denserandom), then PBScore is empty ([]).

Posterior — Posterior class probabilities
numeric matrix

Posterior class probabilities, returned as a numeric matrix. Posterior is an n-by-K matrix, where n is the number of observations (size(CVMdl.X,1)) and K is the number of unique classes (size(CVMdl.ClassNames,1)).

You must set 'FitPosterior',1 when training the cross-validated ECOC model using fitcecoc in order to request Posterior. Otherwise, the software throws an error.

More About

Binary Loss

A binary loss is a function of the class and classification score that determines how well a binary learner classifies an observation into the class.

Suppose the following:

• mkj is element (k,j) of the coding design matrix M (that is, the code corresponding to class k of binary learner j).
• sj is the score of binary learner j for an observation.
• g is the binary loss function.
• k̂ is the predicted class for the observation.

In loss-based decoding [Escalera et al.] on page 18-189, the class producing the minimum sum of the binary losses over binary learners determines the predicted class of an observation, that is,

$$\hat{k} = \underset{k}{\operatorname{argmin}} \sum_{j=1}^{L} \left| m_{kj} \right| g(m_{kj}, s_j).$$

In loss-weighted decoding [Escalera et al.] on page 18-189, the class producing the minimum average of the binary losses over binary learners determines the predicted class of an observation, that is,

$$\hat{k} = \underset{k}{\operatorname{argmin}} \frac{\sum_{j=1}^{L} \left| m_{kj} \right| g(m_{kj}, s_j)}{\sum_{j=1}^{L} \left| m_{kj} \right|}.$$

Allwein et al. on page 18-189 suggest that loss-weighted decoding improves classification accuracy by keeping loss values for all classes in the same dynamic range.
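The loss-weighted decoding rule can be sketched compactly. The following is an illustrative Python/NumPy translation of the formula (not MATLAB code); the one-vs-one-style coding matrix M and the scores are hypothetical values chosen for the example:

```python
import numpy as np

def loss_weighted_decode(M, s, g):
    """Pick the class minimizing the weighted average binary loss.

    M : K-by-L coding matrix with entries in {-1, 0, +1} (hypothetical here).
    s : length-L vector of binary-learner scores.
    g : vectorized binary loss function g(y, s).
    """
    # |m_kj| * g(m_kj, s_j); entries with m_kj = 0 contribute nothing.
    losses = np.abs(M) * g(M, s)
    avg = losses.sum(axis=1) / np.abs(M).sum(axis=1)
    return int(np.argmin(avg))

# Hypothetical one-vs-one style coding for K = 3 classes, L = 3 learners.
M = np.array([[ 1,  1,  0],
              [-1,  0,  1],
              [ 0, -1, -1]])
hinge = lambda y, s: np.maximum(0, 1 - y * s) / 2

# Scores that favor class 0 in both learners it participates in.
predicted = loss_weighted_decode(M, np.array([2.0, 1.5, -0.5]), hinge)
```

Dividing by the sum of |m_kj| per row is what keeps each class's aggregated loss on the same scale, regardless of how many binary learners involve that class.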


This table summarizes the supported loss functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj) is the binary loss formula.

  Value            Description          Score Domain       g(yj,sj)
  'binodeviance'   Binomial deviance    (–∞,∞)             log[1 + exp(–2yjsj)]/[2log(2)]
  'exponential'    Exponential          (–∞,∞)             exp(–yjsj)/2
  'hamming'        Hamming              [0,1] or (–∞,∞)    [1 – sign(yjsj)]/2
  'hinge'          Hinge                (–∞,∞)             max(0,1 – yjsj)/2
  'linear'         Linear               (–∞,∞)             (1 – yjsj)/2
  'logit'          Logistic             (–∞,∞)             log[1 + exp(–yjsj)]/[2log(2)]
  'quadratic'      Quadratic            [0,1]              [1 – yj(2sj – 1)]^2/2

The software normalizes binary losses such that the loss is 0.5 when yj = 0, and aggregates using the average of the binary learners [Allwein et al.] on page 18-189.

Do not confuse the binary loss with the overall classification loss (specified by the 'LossFun' name-value pair argument of the loss and predict object functions), which measures how well an ECOC classifier performs as a whole.

Algorithms

The software can estimate class posterior probabilities by minimizing the Kullback-Leibler divergence or by using quadratic programming. For the following descriptions of the posterior estimation algorithms, assume that:

• mkj is the element (k,j) of the coding design matrix M.
• I is the indicator function.
• p̂k is the class posterior probability estimate for class k of an observation, k = 1,...,K.
• rj is the positive-class posterior probability for binary learner j. That is, rj is the probability that binary learner j classifies an observation into the positive class, given the training data.

Posterior Estimation Using Kullback-Leibler Divergence

By default, the software minimizes the Kullback-Leibler divergence to estimate class posterior probabilities. The Kullback-Leibler divergence between the expected and observed positive-class posterior probabilities is

$$\Delta(r, \hat{r}) = \sum_{j=1}^{L} w_j \left[ r_j \log \frac{r_j}{\hat{r}_j} + (1 - r_j) \log \frac{1 - r_j}{1 - \hat{r}_j} \right],$$

where $w_j = \sum_{i \in S_j} w_i^{\ast}$ is the weight for binary learner j.

• Sj is the set of observation indices on which binary learner j is trained.
• wi* is the weight of observation i.


The software minimizes the divergence iteratively. The first step is to choose initial values $\hat{p}_k^{(0)}$; k = 1,...,K for the class posterior probabilities.

• If you do not specify 'NumKLIterations', then the software tries both sets of deterministic initial values described next, and selects the set that minimizes Δ.
  • $\hat{p}_k^{(0)} = 1/K$; k = 1,...,K.
  • $\hat{p}_k^{(0)}$; k = 1,...,K is the solution of the system $M_{01}\hat{p}^{(0)} = r$, where M01 is M with all mkj = –1 replaced with 0, and r is a vector of positive-class posterior probabilities returned by the L binary learners [Dietterich et al.] on page 18-189. The software uses lsqnonneg to solve the system.
• If you specify 'NumKLIterations',c, where c is a natural number, then the software does the following to choose the set $\hat{p}_k^{(0)}$; k = 1,...,K, and selects the set that minimizes Δ.
  • The software tries both sets of deterministic initial values as described previously.
  • The software randomly generates c vectors of length K using rand, and then normalizes each vector to sum to 1.

At iteration t, the software completes these steps:

1. Compute

$$\hat{r}_j^{(t)} = \frac{\sum_{k=1}^{K} \hat{p}_k^{(t)} I(m_{kj} = +1)}{\sum_{k=1}^{K} \hat{p}_k^{(t)} I(m_{kj} = +1 \cup m_{kj} = -1)}.$$

2. Estimate the next class posterior probability using

$$\hat{p}_k^{(t+1)} = \hat{p}_k^{(t)} \frac{\sum_{j=1}^{L} w_j \left[ r_j I(m_{kj} = +1) + (1 - r_j) I(m_{kj} = -1) \right]}{\sum_{j=1}^{L} w_j \left[ \hat{r}_j^{(t)} I(m_{kj} = +1) + (1 - \hat{r}_j^{(t)}) I(m_{kj} = -1) \right]}.$$

3. Normalize $\hat{p}_k^{(t+1)}$; k = 1,...,K so that they sum to 1.

4. Check for convergence.
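The iteration above can be sketched in a few lines. This is an illustrative Python/NumPy translation of the update equations (not the Toolbox implementation), simplified to a fixed iteration count, a single uniform starting point, and default unit learner weights:

```python
import numpy as np

def kl_posterior(M, r, w=None, n_iter=100):
    """Iterative KL-divergence posterior estimation, simplified sketch.

    M : K-by-L coding matrix with entries in {-1, 0, +1}.
    r : length-L positive-class probabilities from the binary learners.
    w : length-L learner weights (uniform if omitted).
    """
    K, L = M.shape
    w = np.ones(L) if w is None else w
    pos = (M == 1).astype(float)
    neg = (M == -1).astype(float)
    p = np.full(K, 1.0 / K)                  # uniform deterministic start
    for _ in range(n_iter):
        # Expected positive-class probability r_hat_j at the current p.
        r_hat = (pos.T @ p) / (pos.T @ p + neg.T @ p)
        # Multiplicative update from step 2 of the algorithm.
        num = pos @ (w * r) + neg @ (w * (1 - r))
        den = pos @ (w * r_hat) + neg @ (w * (1 - r_hat))
        p = p * num / den
        p /= p.sum()                          # step 3: renormalize
    return p
```

For a two-class problem with a single binary learner coded [+1; –1], the fixed point is simply p̂ = (r1, 1 – r1), which makes the update easy to sanity-check.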

For more details, see [Hastie et al.] on page 18-190 and [Zadrozny] on page 18-191.

Posterior Estimation Using Quadratic Programming

Posterior probability estimation using quadratic programming requires an Optimization Toolbox license. To estimate posterior probabilities for an observation using this method, the software completes these steps:

1. Estimate the positive-class posterior probabilities, rj, for binary learners j = 1,...,L.

2. Using the relationship between rj and p̂k [Wu et al.] on page 18-191, minimize

$$\sum_{j=1}^{L} \left[ -r_j \sum_{k=1}^{K} \hat{p}_k I(m_{kj} = -1) + (1 - r_j) \sum_{k=1}^{K} \hat{p}_k I(m_{kj} = +1) \right]^2$$

with respect to $\hat{p}_k$ and the restrictions

$$0 \le \hat{p}_k \le 1, \quad \sum_k \hat{p}_k = 1.$$

The software performs minimization using quadprog.
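The least-squares problem above is small (K variables, one equality constraint, box bounds), so it can be reproduced with any constrained solver. The following Python/SciPy sketch stands in for Optimization Toolbox's quadprog purely for illustration; it is not the Toolbox code path:

```python
import numpy as np
from scipy.optimize import minimize

def qp_posterior(M, r):
    """Least-squares posterior estimation (the 'qp' method), sketched
    with a general constrained solver instead of quadprog."""
    K, L = M.shape
    pos = (M == 1).astype(float)
    neg = (M == -1).astype(float)

    def objective(p):
        # sum_j [ -r_j * sum_k p_k I(m_kj=-1) + (1-r_j) * sum_k p_k I(m_kj=+1) ]^2
        resid = -r * (neg.T @ p) + (1 - r) * (pos.T @ p)
        return np.sum(resid ** 2)

    res = minimize(objective, np.full(K, 1.0 / K),
                   bounds=[(0, 1)] * K,
                   constraints={'type': 'eq', 'fun': lambda p: p.sum() - 1})
    return res.x
```

For a one-vs-all coding, each residual reduces to p̂_j − r_j on the probability simplex, so consistent learner outputs are recovered exactly; this is a convenient correctness check for the sketch.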

References

[1] Allwein, E., R. Schapire, and Y. Singer. “Reducing multiclass to binary: A unifying approach for margin classifiers.” Journal of Machine Learning Research. Vol. 1, 2000, pp. 113–141.

[2] Dietterich, T., and G. Bakiri. “Solving Multiclass Learning Problems Via Error-Correcting Output Codes.” Journal of Artificial Intelligence Research. Vol. 2, 1995, pp. 263–286.

[3] Escalera, S., O. Pujol, and P. Radeva. “On the decoding process in ternary error-correcting output codes.” IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 32, Issue 7, 2010, pp. 120–134.

[4] Escalera, S., O. Pujol, and P. Radeva. “Separability of ternary codes for sparse designs of error-correcting output codes.” Pattern Recognition Letters. Vol. 30, Issue 3, 2009, pp. 285–297.

[5] Hastie, T., and R. Tibshirani. “Classification by Pairwise Coupling.” Annals of Statistics. Vol. 26, Issue 2, 1998, pp. 451–471.

[6] Wu, T. F., C. J. Lin, and R. Weng. “Probability Estimates for Multi-Class Classification by Pairwise Coupling.” Journal of Machine Learning Research. Vol. 5, 2004, pp. 975–1005.

[7] Zadrozny, B. “Reducing Multiclass to Binary by Coupling Probability Estimates.” NIPS 2001: Proceedings of Advances in Neural Information Processing Systems 14, 2001, pp. 1041–1048.

Extended Capabilities

Automatic Parallel Support

Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, set the 'UseParallel' option to true: set the 'UseParallel' field of the options structure to true using statset, and specify the 'Options' name-value pair argument in the call to this function. For example: 'Options',statset('UseParallel',true)

For more information, see the 'Options' name-value pair argument.


For more general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).

See Also
ClassificationECOC | ClassificationPartitionedECOC | edge | fitcecoc | predict | quadprog | statset

Topics
“Quick Start Parallel Computing for Statistics and Machine Learning Toolbox” on page 30-2
“Reproducibility in Parallel Statistical Computations” on page 30-13
“Concepts of Parallel Computing in Statistics and Machine Learning Toolbox” on page 30-8

Introduced in R2014b


kfoldPredict

Package: classreg.learning.partition

Classify observations in cross-validated kernel classification model

Syntax label = kfoldPredict(CVMdl) [label,score] = kfoldPredict(CVMdl)

Description label = kfoldPredict(CVMdl) returns class labels predicted by the cross-validated, binary kernel model (ClassificationPartitionedKernel) CVMdl. For every fold, kfoldPredict predicts class labels for validation-fold observations using a model trained on training-fold observations. [label,score] = kfoldPredict(CVMdl) also returns classification scores on page 32-2833 for both classes.

Examples

Classify Observations Using Cross-Validation

Classify observations using a cross-validated, binary kernel classifier, and display the confusion matrix for the resulting classification.

Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, which are labeled either bad ('b') or good ('g').

load ionosphere

Cross-validate a binary kernel classification model using the data.

rng(1); % For reproducibility
CVMdl = fitckernel(X,Y,'Crossval','on')

CVMdl = 
  classreg.learning.partition.ClassificationPartitionedKernel
    CrossValidatedModel: 'Kernel'
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'

  Properties, Methods


CVMdl is a ClassificationPartitionedKernel model. By default, the software implements 10-fold cross-validation. To specify a different number of folds, use the 'KFold' name-value pair argument instead of 'Crossval'.

Classify the observations that fitckernel does not use in training the folds.

label = kfoldPredict(CVMdl);

Construct a confusion matrix to compare the true classes of the observations to their predicted labels.

C = confusionchart(Y,label);

The CVMdl model misclassifies 32 good ('g') radar returns as being bad ('b') and misclassifies 7 bad radar returns as being good.

Estimate k-Fold Cross-Validation Posterior Class Probabilities

Estimate posterior class probabilities using a cross-validated, binary kernel classifier, and determine the quality of the model by plotting a receiver operating characteristic (ROC) curve. Cross-validated kernel classification models return posterior probabilities for logistic regression learners only.

Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, which are labeled either bad ('b') or good ('g').


load ionosphere

Cross-validate a binary kernel classification model using the data. Specify the class order, and fit logistic regression learners.

rng(1); % For reproducibility
CVMdl = fitckernel(X,Y,'Crossval','on', ...
    'ClassNames',{'b','g'},'Learner','logistic')

CVMdl = 
  classreg.learning.partition.ClassificationPartitionedKernel
    CrossValidatedModel: 'Kernel'
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'

  Properties, Methods

CVMdl is a ClassificationPartitionedKernel model. By default, the software implements 10-fold cross-validation. To specify a different number of folds, use the 'KFold' name-value pair argument instead of 'Crossval'.

Predict the posterior class probabilities for the observations that fitckernel does not use in training the folds.

[~,posterior] = kfoldPredict(CVMdl);

The output posterior is a matrix with two columns and n rows, where n is the number of observations. Column i contains posterior probabilities of CVMdl.ClassNames(i) given a particular observation.

Obtain false and true positive rates, and estimate the area under the curve (AUC). Specify that the second class is the positive class.

[fpr,tpr,~,auc] = perfcurve(Y,posterior(:,2),CVMdl.ClassNames(2));
auc

auc = 0.9441

The AUC is close to 1, which indicates that the model predicts labels well.

Plot an ROC curve.

plot(fpr,tpr)
xlabel('False positive rate')
ylabel('True positive rate')
title('ROC Curve')


Input Arguments

CVMdl — Cross-validated, binary kernel classification model
ClassificationPartitionedKernel model object

Cross-validated, binary kernel classification model, specified as a ClassificationPartitionedKernel model object. You can create a ClassificationPartitionedKernel model by using fitckernel and specifying any one of the cross-validation name-value pair arguments.

To obtain estimates, kfoldPredict applies the same data used to cross-validate the kernel classification model (X and Y).

Output Arguments

label — Predicted class labels
categorical array | character array | logical matrix | numeric matrix | cell array of character vectors

Predicted class labels, returned as a categorical or character array, logical or numeric matrix, or cell array of character vectors. label has n rows, where n is the number of observations in X, and has the same data type as the observed class labels (Y) used to train CVMdl. (The software treats string arrays as cell arrays of character vectors.)


kfoldPredict classifies observations into the class yielding the highest score.

score — Classification scores
numeric array

Classification scores on page 32-2833, returned as an n-by-2 numeric array, where n is the number of observations in X. score(i,j) is the score for classifying observation i into class j. The order of the classes is stored in CVMdl.ClassNames.

If CVMdl.Trained{1}.Learner is 'logistic', then classification scores are posterior probabilities.

More About

Classification Score

For kernel classification models, the raw classification score for classifying the observation x, a row vector, into the positive class is defined by

$$f(x) = T(x)\beta + b.$$

• T(·) is a transformation of an observation for feature expansion.
• β is the estimated column vector of coefficients.
• b is the estimated scalar bias.

The raw classification score for classifying x into the negative class is −f(x). The software classifies observations into the class that yields a positive score.

If the kernel classification model consists of logistic regression learners, then the software applies the 'logit' score transformation to the raw classification scores (see ScoreTransform).
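The score formula f(x) = T(x)β + b can be made concrete with a toy feature expansion. The sketch below uses random Fourier features in Python/NumPy as a stand-in for T(·); the dimensions, weights, and coefficients are all hypothetical illustration values, not fitckernel's actual internals:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: d raw predictors expanded to m random features.
d, m = 4, 8
W = rng.standard_normal((d, m))          # random projection (assumed)
phase = rng.uniform(0, 2 * np.pi, m)     # random phases (assumed)

def T(x):
    # Random Fourier feature expansion of one observation x (row vector);
    # each feature is bounded by sqrt(2/m) in absolute value.
    return np.sqrt(2.0 / m) * np.cos(x @ W + phase)

beta = rng.standard_normal(m)            # stand-in for fitted coefficients
bias = 0.1                               # stand-in for the fitted scalar bias

x = rng.standard_normal(d)
f = T(x) @ beta + bias                   # raw positive-class score
# The negative-class score is -f; classify as positive when f > 0.
```

The point of the sketch is only the shape of the computation: expand, take an inner product with β, add b, and read the sign.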

See Also ClassificationPartitionedKernel | fitckernel Introduced in R2018b


kfoldPredict

Package: classreg.learning.partition

Classify observations in cross-validated kernel ECOC model

Syntax label = kfoldPredict(CVMdl) label = kfoldPredict(CVMdl,Name,Value) [label,NegLoss,PBScore] = kfoldPredict( ___ ) [label,NegLoss,PBScore,Posterior] = kfoldPredict( ___ )

Description label = kfoldPredict(CVMdl) returns class labels predicted by the cross-validated kernel ECOC model (ClassificationPartitionedKernelECOC) CVMdl. For every fold, kfoldPredict predicts class labels for validation-fold observations using a model trained on training-fold observations. kfoldPredict applies the same data used to create CVMdl (see fitcecoc). The software predicts the classification of an observation by assigning the observation to the class yielding the largest negated average binary loss (or, equivalently, the smallest average binary loss). label = kfoldPredict(CVMdl,Name,Value) returns predicted class labels with additional options specified by one or more name-value pair arguments. For example, specify the posterior probability estimation method, decoding scheme, or verbosity level. [label,NegLoss,PBScore] = kfoldPredict( ___ ) additionally returns negated values of the average binary loss per class (NegLoss) for validation-fold observations and positive-class scores (PBScore) for validation-fold observations classified by each binary learner, using any of the input argument combinations in the previous syntaxes. If the coding matrix varies across folds (that is, the coding scheme is sparserandom or denserandom), then PBScore is empty ([]). [label,NegLoss,PBScore,Posterior] = kfoldPredict( ___ ) additionally returns posterior class probability estimates for validation-fold observations (Posterior). To obtain posterior class probabilities, the kernel classification binary learners must be logistic regression models. Otherwise, kfoldPredict throws an error.

Examples

Classify Observations Using Cross-Validation

Classify observations using a cross-validated, multiclass kernel ECOC classifier, and display the confusion matrix for the resulting classification.

Load Fisher's iris data set. X contains flower measurements, and Y contains the names of flower species.


load fisheriris
X = meas;
Y = species;

Cross-validate an ECOC model composed of kernel binary learners.

rng(1); % For reproducibility
CVMdl = fitcecoc(X,Y,'Learners','kernel','CrossVal','on')

CVMdl = 
  classreg.learning.partition.ClassificationPartitionedKernelECOC
    CrossValidatedModel: 'KernelECOC'
           ResponseName: 'Y'
        NumObservations: 150
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'setosa'  'versicolor'  'virginica'}
         ScoreTransform: 'none'

  Properties, Methods

CVMdl is a ClassificationPartitionedKernelECOC model. By default, the software implements 10-fold cross-validation. To specify a different number of folds, use the 'KFold' name-value pair argument instead of 'Crossval'.

Classify the observations that fitcecoc does not use in training the folds.

label = kfoldPredict(CVMdl);

Construct a confusion matrix to compare the true classes of the observations to their predicted labels.

C = confusionchart(Y,label);


The CVMdl model misclassifies four 'versicolor' irises as 'virginica' irises and misclassifies one 'virginica' iris as a 'versicolor' iris.

Predict Cross-Validation Labels Using Custom Binary Loss

Load Fisher's iris data set. X contains flower measurements, and Y contains the names of flower species.

load fisheriris
X = meas;
Y = species;

Cross-validate an ECOC model of kernel classification models using 5-fold cross-validation.

rng(1); % For reproducibility
CVMdl = fitcecoc(X,Y,'Learners','kernel','KFold',5)

CVMdl = 
  classreg.learning.partition.ClassificationPartitionedKernelECOC
    CrossValidatedModel: 'KernelECOC'
           ResponseName: 'Y'
        NumObservations: 150
                  KFold: 5
              Partition: [1x1 cvpartition]
             ClassNames: {'setosa'  'versicolor'  'virginica'}
         ScoreTransform: 'none'

  Properties, Methods

CVMdl is a ClassificationPartitionedKernelECOC model. It contains the property Trained, which is a 5-by-1 cell array of CompactClassificationECOC models.

By default, the kernel classification models that compose the CompactClassificationECOC models use SVMs. SVM scores are signed distances from the observation to the decision boundary. Therefore, the domain is (−∞,∞). Create a custom binary loss function that:

• Maps the coding design matrix (M) and positive-class classification scores (s) for each learner to the binary loss for each observation
• Uses linear loss
• Aggregates the binary learner loss using the median

You can create a separate function for the binary loss function, and then save it on the MATLAB® path. Or, you can specify an anonymous binary loss function. In this case, create a function handle (customBL) to an anonymous binary loss function.

customBL = @(M,s)nanmedian(1 - bsxfun(@times,M,s),2)/2;

Predict cross-validation labels and estimate the median binary loss per class. Print the median negative binary losses per class for a random set of 10 observations.

[label,NegLoss] = kfoldPredict(CVMdl,'BinaryLoss',customBL);
idx = randsample(numel(label),10);
table(Y(idx),label(idx),NegLoss(idx,1),NegLoss(idx,2),NegLoss(idx,3),...
    'VariableNames',[{'True'};{'Predicted'};...
    unique(CVMdl.ClassNames)])

ans=10×5 table
         True            Predicted        setosa     versicolor    virginica
    ______________    ______________    ________    __________    _________
    {'setosa'    }    {'setosa'    }     0.20926     -0.84572      -0.86354
    {'setosa'    }    {'setosa'    }     0.16144     -0.90572      -0.75572
    {'virginica' }    {'versicolor'}    -0.83532     -0.12157      -0.54311
    {'virginica' }    {'virginica' }    -0.97235     -0.69759       0.16994
    {'virginica' }    {'virginica' }    -0.89441     -0.69937      0.093778
    {'virginica' }    {'virginica' }    -0.86774     -0.47297      -0.15929
    {'setosa'    }    {'setosa'    }     -0.1026     -0.69671      -0.70069
    {'setosa'    }    {'setosa'    }      0.1001     -0.89163      -0.70848
    {'virginica' }    {'virginica' }     -1.0106     -0.52919      0.039829
    {'versicolor'}    {'versicolor'}     -1.0298     0.027354      -0.49757

The cross-validated model correctly predicts the labels for 9 of the 10 random observations.


Estimate k-Fold Cross-Validation Posterior Class Probabilities

Estimate posterior class probabilities using a cross-validated, multiclass kernel ECOC classification model. Kernel classification models return posterior probabilities for logistic regression learners only.

Load Fisher's iris data set. X contains flower measurements, and Y contains the names of flower species.

load fisheriris
X = meas;
Y = species;

Create a kernel template for the binary kernel classification models. Specify to fit logistic regression learners.

t = templateKernel('Learner','logistic')

t = 
Fit template for classification Kernel.

             BetaTolerance: []
                 BlockSize: []
             BoxConstraint: []
                   Epsilon: []
    NumExpansionDimensions: []
         GradientTolerance: []
        HessianHistorySize: []
            IterationLimit: []
               KernelScale: []
                    Lambda: []
                   Learner: 'logistic'
              LossFunction: []
                    Stream: []
            VerbosityLevel: []
                   Version: 1
                    Method: 'Kernel'
                      Type: 'classification'

t is a kernel template. Most of its properties are empty. When training an ECOC classifier using the template, the software sets the applicable properties to their default values.

Cross-validate an ECOC model using the kernel template.

rng('default'); % For reproducibility
CVMdl = fitcecoc(X,Y,'Learners',t,'CrossVal','on')

CVMdl = 
  classreg.learning.partition.ClassificationPartitionedKernelECOC
    CrossValidatedModel: 'KernelECOC'
           ResponseName: 'Y'
        NumObservations: 150
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'setosa'  'versicolor'  'virginica'}
         ScoreTransform: 'none'


  Properties, Methods

CVMdl is a ClassificationPartitionedKernelECOC model. By default, the software uses 10-fold cross-validation.

Predict the validation-fold class posterior probabilities.

[label,~,~,Posterior] = kfoldPredict(CVMdl);

The software assigns an observation to the class that yields the smallest average binary loss. Because all binary learners are computing posterior probabilities, the binary loss function is quadratic.

Display the posterior probabilities for 10 randomly selected observations.

idx = randsample(size(X,1),10);
CVMdl.ClassNames

ans = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }

table(Y(idx),label(idx),Posterior(idx,:),...
    'VariableNames',{'TrueLabel','PredLabel','Posterior'})

ans=10×3 table
      TrueLabel         PredLabel                   Posterior
    ______________    ______________    ________________________________
    {'setosa'    }    {'setosa'    }     0.68216     0.18546     0.13238
    {'virginica' }    {'virginica' }      0.1581     0.14405     0.69785
    {'virginica' }    {'virginica' }    0.071807    0.093291      0.8349
    {'setosa'    }    {'setosa'    }     0.74918     0.11434     0.13648
    {'versicolor'}    {'versicolor'}     0.09375     0.67149     0.23476
    {'versicolor'}    {'versicolor'}    0.036202     0.85544     0.10836
    {'versicolor'}    {'versicolor'}      0.2252     0.50473     0.27007
    {'virginica' }    {'virginica' }    0.061562     0.11086     0.82758
    {'setosa'    }    {'setosa'    }     0.42448     0.21181     0.36371
    {'virginica' }    {'virginica' }    0.082705      0.1428      0.7745

The columns of Posterior correspond to the class order of CVMdl.ClassNames.

Input Arguments

CVMdl — Cross-validated kernel ECOC model
ClassificationPartitionedKernelECOC model

Cross-validated kernel ECOC model, specified as a ClassificationPartitionedKernelECOC model. You can create a ClassificationPartitionedKernelECOC model by training an ECOC model using fitcecoc and specifying these name-value pair arguments:

• 'Learners' – Set the value to 'kernel', a template object returned by templateKernel, or a cell array of such template objects.


• One of the arguments 'CrossVal', 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: kfoldPredict(CVMdl,'PosteriorMethod','qp') specifies to estimate multiclass posterior probabilities by solving a least-squares problem using quadratic programming.

BinaryLoss — Binary learner loss function
'hamming' | 'linear' | 'logit' | 'exponential' | 'binodeviance' | 'hinge' | 'quadratic' | function handle

Binary learner loss function, specified as the comma-separated pair consisting of 'BinaryLoss' and a built-in loss function name or function handle.

• This table contains names and descriptions of the built-in functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj) is the binary loss formula.

  Value            Description         Score Domain       g(yj,sj)
  'binodeviance'   Binomial deviance   (–∞,∞)             log[1 + exp(–2yjsj)]/[2log(2)]
  'exponential'    Exponential         (–∞,∞)             exp(–yjsj)/2
  'hamming'        Hamming             [0,1] or (–∞,∞)    [1 – sign(yjsj)]/2
  'hinge'          Hinge               (–∞,∞)             max(0,1 – yjsj)/2
  'linear'         Linear              (–∞,∞)             (1 – yjsj)/2
  'logit'          Logistic            (–∞,∞)             log[1 + exp(–yjsj)]/[2log(2)]
  'quadratic'      Quadratic           [0,1]              [1 – yj(2sj – 1)]²/2
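As a cross-check of the table above, the built-in binary losses can be sketched in a few lines. This Python sketch is illustrative only (the function names are ours, not toolbox identifiers); note that every loss evaluates to 0.5 at yj = 0, which is exactly the normalization described next.

```python
import math

# Illustrative sketches of the built-in binary losses g(y, s), where
# y is a binary-learner class label in {-1, 0, 1} and s is a score.

def sign(x):
    return (x > 0) - (x < 0)

def binodeviance(y, s):
    return math.log(1 + math.exp(-2 * y * s)) / (2 * math.log(2))

def exponential(y, s):
    return math.exp(-y * s) / 2

def hamming(y, s):
    return (1 - sign(y * s)) / 2

def hinge(y, s):
    return max(0.0, 1 - y * s) / 2

def linear(y, s):
    return (1 - y * s) / 2

def logit(y, s):
    return math.log(1 + math.exp(-y * s)) / (2 * math.log(2))

def quadratic(y, s):
    return (1 - y * (2 * s - 1)) ** 2 / 2

# Normalization check: every loss is 0.5 when y = 0, for any score s.
losses = [binodeviance, exponential, hamming, hinge, linear, logit, quadratic]
assert all(abs(g(0, 0.7) - 0.5) < 1e-12 for g in losses)
```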

The software normalizes binary losses such that the loss is 0.5 when yj = 0. Also, the software calculates the mean binary loss for each class.

• For a custom binary loss function, for example, customFunction, specify its function handle 'BinaryLoss',@customFunction. customFunction has this form:

  bLoss = customFunction(M,s)

  where:
  • M is the K-by-L coding matrix stored in Mdl.CodingMatrix.
  • s is the 1-by-L row vector of classification scores.
  • bLoss is the classification loss. This scalar aggregates the binary losses for every learner in a particular class. For example, you can use the mean binary loss to aggregate the loss over the learners for each class.
  • K is the number of classes.
  • L is the number of binary learners.

By default, if all binary learners are kernel classification models using SVM, then BinaryLoss is 'hinge'. If all binary learners are kernel classification models using logistic regression, then BinaryLoss is 'quadratic'.

Example: 'BinaryLoss','binodeviance'
Data Types: char | string | function_handle

Decoding — Decoding scheme
'lossweighted' (default) | 'lossbased'

Decoding scheme that aggregates the binary losses, specified as the comma-separated pair consisting of 'Decoding' and 'lossweighted' or 'lossbased'. For more information, see “Binary Loss” on page 32-2843.

Example: 'Decoding','lossbased'

NumKLInitializations — Number of random initial values
0 (default) | nonnegative integer scalar

Number of random initial values for fitting posterior probabilities by Kullback-Leibler divergence minimization, specified as the comma-separated pair consisting of 'NumKLInitializations' and a nonnegative integer scalar.

If you do not request the fourth output argument (Posterior) and set 'PosteriorMethod','kl' (the default), then the software ignores the value of NumKLInitializations. For more details, see “Posterior Estimation Using Kullback-Leibler Divergence” on page 32-2844.

Example: 'NumKLInitializations',5
Data Types: single | double

Options — Estimation options
[] (default) | structure array returned by statset

Estimation options, specified as the comma-separated pair consisting of 'Options' and a structure array returned by statset.

To invoke parallel computing:
• You need a Parallel Computing Toolbox license.
• Specify 'Options',statset('UseParallel',true).

PosteriorMethod — Posterior probability estimation method
'kl' (default) | 'qp'

Posterior probability estimation method, specified as the comma-separated pair consisting of 'PosteriorMethod' and 'kl' or 'qp'.
• If PosteriorMethod is 'kl', then the software estimates multiclass posterior probabilities by minimizing the Kullback-Leibler divergence between the predicted and expected posterior probabilities returned by binary learners. For details, see “Posterior Estimation Using Kullback-Leibler Divergence” on page 32-2844.
• If PosteriorMethod is 'qp', then the software estimates multiclass posterior probabilities by solving a least-squares problem using quadratic programming. You need an Optimization Toolbox license to use this option. For details, see “Posterior Estimation Using Quadratic Programming” on page 32-2845.
• If you do not request the fourth output argument (Posterior), then the software ignores the value of PosteriorMethod.

Example: 'PosteriorMethod','qp'

Verbose — Verbosity level
0 (default) | 1

Verbosity level, specified as the comma-separated pair consisting of 'Verbose' and 0 or 1. Verbose controls the number of diagnostic messages that the software displays in the Command Window. If Verbose is 0, then the software does not display diagnostic messages. Otherwise, the software displays diagnostic messages.

Example: 'Verbose',1
Data Types: single | double

Output Arguments

label — Predicted class labels
categorical array | character array | logical vector | numeric vector | cell array of character vectors

Predicted class labels, returned as a categorical or character array, logical or numeric vector, or cell array of character vectors. label has the same data type and number of rows as CVMdl.Y. The software predicts the classification of an observation by assigning the observation to the class yielding the largest negated average binary loss (or, equivalently, the smallest average binary loss).

NegLoss — Negated average binary losses
numeric matrix

Negated average binary losses, returned as a numeric matrix. NegLoss is an n-by-K matrix, where n is the number of observations (size(CVMdl.Y,1)) and K is the number of unique classes (size(CVMdl.ClassNames,1)).

PBScore — Positive-class scores
numeric matrix

Positive-class scores for each binary learner, returned as a numeric matrix. PBScore is an n-by-L matrix, where n is the number of observations (size(CVMdl.Y,1)) and L is the number of binary learners (size(CVMdl.CodingMatrix,2)). If the coding matrix varies across folds (that is, the coding scheme is sparserandom or denserandom), then PBScore is empty ([]).

Posterior — Posterior class probabilities
numeric matrix


Posterior class probabilities, returned as a numeric matrix. Posterior is an n-by-K matrix, where n is the number of observations (size(CVMdl.Y,1)) and K is the number of unique classes (size(CVMdl.ClassNames,1)). To return posterior probabilities, each kernel classification binary learner must have its Learner property set to 'logistic'. Otherwise, the software throws an error.

More About

Binary Loss

A binary loss is a function of the class and classification score that determines how well a binary learner classifies an observation into the class.

Suppose the following:
• mkj is element (k,j) of the coding design matrix M (that is, the code corresponding to class k of binary learner j).
• sj is the score of binary learner j for an observation.
• g is the binary loss function.
• k̂ is the predicted class for the observation.

In loss-based decoding [Escalera et al.] on page 18-189, the class producing the minimum sum of the binary losses over binary learners determines the predicted class of an observation, that is,

    k̂ = argmin_k Σ_{j=1}^{L} |mkj| g(mkj,sj).

In loss-weighted decoding [Escalera et al.] on page 18-189, the class producing the minimum average of the binary losses over binary learners determines the predicted class of an observation, that is,

    k̂ = argmin_k [ Σ_{j=1}^{L} |mkj| g(mkj,sj) ] / [ Σ_{j=1}^{L} |mkj| ].
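The two decoding rules can be sketched as follows. This Python sketch is illustrative, not toolbox code; the coding matrix M, the scores, and the helper names are assumptions made for the example.

```python
# M is a K-by-L coding matrix with entries in {-1, 0, 1}; scores is the
# length-L list of binary-learner scores; g(y, s) is a binary loss function.

def loss_based(M, scores, g):
    # Pick the class minimizing the SUM of |m_kj| * g(m_kj, s_j) over learners.
    totals = [sum(abs(m) * g(m, s) for m, s in zip(row, scores)) for row in M]
    return min(range(len(M)), key=lambda k: totals[k])

def loss_weighted(M, scores, g):
    # Pick the class minimizing the AVERAGE binary loss; dividing by
    # sum_j |m_kj| keeps loss values for all classes in one dynamic range.
    def avg(row):
        num = sum(abs(m) * g(m, s) for m, s in zip(row, scores))
        return num / sum(abs(m) for m in row)
    return min(range(len(M)), key=lambda k: avg(M[k]))

def hinge(y, s):
    return max(0.0, 1 - y * s) / 2

# One-vs-one design for 3 classes (columns: 1-vs-2, 1-vs-3, 2-vs-3).
M = [[1, 1, 0],
     [-1, 0, 1],
     [0, -1, -1]]
pred = loss_weighted(M, [2.0, 1.5, -0.3], hinge)
```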

Allwein et al. on page 18-189 suggest that loss-weighted decoding improves classification accuracy by keeping loss values for all classes in the same dynamic range.

This table summarizes the supported loss functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj) is the binary loss formula.

Value            Description         Score Domain       g(yj,sj)
'binodeviance'   Binomial deviance   (–∞,∞)             log[1 + exp(–2yjsj)]/[2log(2)]
'exponential'    Exponential         (–∞,∞)             exp(–yjsj)/2
'hamming'        Hamming             [0,1] or (–∞,∞)    [1 – sign(yjsj)]/2
'hinge'          Hinge               (–∞,∞)             max(0,1 – yjsj)/2
'linear'         Linear              (–∞,∞)             (1 – yjsj)/2
'logit'          Logistic            (–∞,∞)             log[1 + exp(–yjsj)]/[2log(2)]
'quadratic'      Quadratic           [0,1]              [1 – yj(2sj – 1)]²/2

The software normalizes binary losses such that the loss is 0.5 when yj = 0, and aggregates using the average of the binary losses over the binary learners [Allwein et al.] on page 18-189.

Do not confuse the binary loss with the overall classification loss (specified by the 'LossFun' name-value pair argument of the loss and predict object functions), which measures how well an ECOC classifier performs as a whole.

Algorithms

The software can estimate class posterior probabilities by minimizing the Kullback-Leibler divergence or by using quadratic programming. For the following descriptions of the posterior estimation algorithms, assume that:

• mkj is the element (k,j) of the coding design matrix M.
• I is the indicator function.
• p̂k is the class posterior probability estimate for class k of an observation, k = 1,...,K.
• rj is the positive-class posterior probability for binary learner j. That is, rj is the probability that binary learner j classifies an observation into the positive class, given the training data.

Posterior Estimation Using Kullback-Leibler Divergence

By default, the software minimizes the Kullback-Leibler divergence to estimate class posterior probabilities. The Kullback-Leibler divergence between the expected and observed positive-class posterior probabilities is

    Δ(r,r̂) = Σ_{j=1}^{L} wj [ rj log(rj/r̂j) + (1 − rj) log((1 − rj)/(1 − r̂j)) ],

where wj = Σ_{i∈Sj} wi* is the weight for binary learner j:

• Sj is the set of observation indices on which binary learner j is trained.
• wi* is the weight of observation i.

The software minimizes the divergence iteratively. The first step is to choose initial values p̂k^(0), k = 1,...,K, for the class posterior probabilities.

• If you do not specify 'NumKLInitializations', then the software tries both sets of deterministic initial values described next, and selects the set that minimizes Δ.
  • p̂k^(0) = 1/K, k = 1,...,K.
  • p̂k^(0), k = 1,...,K, is the solution of the system

        M01 p̂^(0) = r,

    where M01 is M with all mkj = –1 replaced with 0, and r is a vector of positive-class posterior probabilities returned by the L binary learners [Dietterich et al.] on page 18-189. The software uses lsqnonneg to solve the system.
• If you specify 'NumKLInitializations',c, where c is a natural number, then the software does the following to choose the set p̂k^(0), k = 1,...,K, and selects the set that minimizes Δ.
  • The software tries both sets of deterministic initial values as described previously.
  • The software randomly generates c vectors of length K using rand, and then normalizes each vector to sum to 1.

At iteration t, the software completes these steps:

1  Compute

       r̂j^(t) = [ Σ_{k=1}^{K} p̂k^(t) I(mkj = +1) ] / [ Σ_{k=1}^{K} p̂k^(t) I(mkj = +1 ∪ mkj = −1) ].

2  Estimate the next class posterior probability using

       p̂k^(t+1) = p̂k^(t) · [ Σ_{j=1}^{L} wj ( rj I(mkj = +1) + (1 − rj) I(mkj = −1) ) ] / [ Σ_{j=1}^{L} wj ( r̂j^(t) I(mkj = +1) + (1 − r̂j^(t)) I(mkj = −1) ) ].

3  Normalize p̂k^(t+1), k = 1,...,K, so that they sum to 1.

4  Check for convergence.
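The four steps above can be sketched as follows. This Python sketch is illustrative only, not the toolbox implementation; the uniform initialization and fixed iteration count are simplifying assumptions (the software also tries other initial values and checks convergence).

```python
# M: K-by-L coding matrix with entries in {-1, 0, 1}.
# r: positive-class posterior probabilities from the L binary learners.
# w: per-learner weights.

def kl_posteriors(M, r, w, iters=200):
    K, L = len(M), len(r)
    p = [1.0 / K] * K                       # uniform initial values (a simplification)
    for _ in range(iters):
        # Step 1: expected positive-class probability for each learner.
        r_hat = []
        for j in range(L):
            num = sum(p[k] for k in range(K) if M[k][j] == 1)
            den = sum(p[k] for k in range(K) if M[k][j] != 0)
            r_hat.append(num / den)
        # Step 2: multiplicative update of each class posterior.
        for k in range(K):
            num = sum(w[j] * (r[j] if M[k][j] == 1 else 1 - r[j])
                      for j in range(L) if M[k][j] != 0)
            den = sum(w[j] * (r_hat[j] if M[k][j] == 1 else 1 - r_hat[j])
                      for j in range(L) if M[k][j] != 0)
            p[k] *= num / den
        # Step 3: renormalize so the posteriors sum to 1.
        total = sum(p)
        p = [pk / total for pk in p]
        # Step 4 (convergence check) is omitted in this sketch.
    return p

# One-vs-one design for 3 classes; r says class 0 usually wins its pairings.
M = [[1, 1, 0], [-1, 0, 1], [0, -1, -1]]
p = kl_posteriors(M, r=[0.8, 0.9, 0.6], w=[1.0, 1.0, 1.0])
```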

For more details, see [Hastie et al.] on page 18-190 and [Zadrozny] on page 18-191.

Posterior Estimation Using Quadratic Programming

Posterior probability estimation using quadratic programming requires an Optimization Toolbox license. To estimate posterior probabilities for an observation using this method, the software completes these steps:

1  Estimate the positive-class posterior probabilities, rj, for binary learners j = 1,...,L.

2  Using the relationship between rj and p̂k [Wu et al.] on page 18-191, minimize

       Σ_{j=1}^{L} [ −rj Σ_{k=1}^{K} p̂k I(mkj = −1) + (1 − rj) Σ_{k=1}^{K} p̂k I(mkj = +1) ]²

   with respect to p̂k and the restrictions

       0 ≤ p̂k ≤ 1,  Σ_k p̂k = 1.

The software performs minimization using quadprog.
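To make the objective concrete, this illustrative Python sketch evaluates it and minimizes it by brute-force search over a probability grid. The grid search is a stand-in for quadprog, purely for exposition; M, r, and the function names are assumptions for the example.

```python
# Objective: sum_j ( -r_j * sum_k p_k I(m_kj = -1)
#                    + (1 - r_j) * sum_k p_k I(m_kj = +1) )^2
def qp_objective(p, M, r):
    total = 0.0
    for j in range(len(r)):
        neg = sum(p[k] for k in range(len(p)) if M[k][j] == -1)
        pos = sum(p[k] for k in range(len(p)) if M[k][j] == 1)
        total += (-r[j] * neg + (1 - r[j]) * pos) ** 2
    return total

M = [[1, 1, 0], [-1, 0, 1], [0, -1, -1]]   # one-vs-one coding design, K = 3
r = [0.8, 0.9, 0.6]

# Brute-force minimization over a 0.01-spaced grid on the probability simplex,
# which enforces the constraints 0 <= p_k <= 1 and sum_k p_k = 1 directly.
step = 0.01
best = min(
    ((a * step, b * step, 1 - (a + b) * step)
     for a in range(101) for b in range(101 - a)),
    key=lambda p: qp_objective(p, M, r),
)
```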

References

[1] Allwein, E., R. Schapire, and Y. Singer. “Reducing multiclass to binary: A unifying approach for margin classifiers.” Journal of Machine Learning Research. Vol. 1, 2000, pp. 113–141.

[2] Dietterich, T., and G. Bakiri. “Solving Multiclass Learning Problems Via Error-Correcting Output Codes.” Journal of Artificial Intelligence Research. Vol. 2, 1995, pp. 263–286.

[3] Escalera, S., O. Pujol, and P. Radeva. “On the decoding process in ternary error-correcting output codes.” IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 32, Issue 7, 2010, pp. 120–134.

[4] Escalera, S., O. Pujol, and P. Radeva. “Separability of ternary codes for sparse designs of error-correcting output codes.” Pattern Recogn. Vol. 30, Issue 3, 2009, pp. 285–297.

[5] Hastie, T., and R. Tibshirani. “Classification by Pairwise Coupling.” Annals of Statistics. Vol. 26, Issue 2, 1998, pp. 451–471.

[6] Wu, T. F., C. J. Lin, and R. Weng. “Probability Estimates for Multi-Class Classification by Pairwise Coupling.” Journal of Machine Learning Research. Vol. 5, 2004, pp. 975–1005.

[7] Zadrozny, B. “Reducing Multiclass to Binary by Coupling Probability Estimates.” NIPS 2001: Proceedings of Advances in Neural Information Processing Systems 14, 2001, pp. 1041–1048.

See Also

ClassificationPartitionedKernelECOC | fitcecoc

Introduced in R2018b



kfoldPredict

Predict labels for observations not used for training

Syntax

Label = kfoldPredict(CVMdl)
[Label,Score] = kfoldPredict(CVMdl)

Description

Label = kfoldPredict(CVMdl) returns cross-validated class labels predicted by the cross-validated, binary, linear classification model CVMdl. That is, for every fold, kfoldPredict predicts class labels for observations that it holds out when it trains using all other observations. Label contains predicted class labels for each regularization strength in the linear classification models that compose CVMdl.

[Label,Score] = kfoldPredict(CVMdl) also returns cross-validated classification scores on page 32-2854 for both classes. Score contains classification scores for each regularization strength in CVMdl.

Input Arguments

CVMdl — Cross-validated, binary, linear classification model
ClassificationPartitionedLinear model object

Cross-validated, binary, linear classification model, specified as a ClassificationPartitionedLinear model object. You can create a ClassificationPartitionedLinear model using fitclinear and specifying any one of the cross-validation, name-value pair arguments, for example, CrossVal.

To obtain estimates, kfoldPredict applies the same data used to cross-validate the linear classification model (X and Y).

Output Arguments

Label — Cross-validated, predicted class labels
categorical array | character array | logical matrix | numeric matrix | cell array of character vectors

Cross-validated, predicted class labels, returned as a categorical or character array, logical or numeric matrix, or cell array of character vectors. In most cases, Label is an n-by-L array of the same data type as the observed class labels (see Y) used to create CVMdl. (The software treats string arrays as cell arrays of character vectors.) n is the number of observations in the predictor data (see X) and L is the number of regularization strengths in CVMdl.Trained{1}.Lambda. That is, Label(i,j) is the predicted class label for observation i using the linear classification model that has regularization strength CVMdl.Trained{1}.Lambda(j).


If Y is a character array and L > 1, then Label is a cell array of class labels.

Score — Cross-validated classification scores
numeric array

Cross-validated classification scores on page 32-2854, returned as an n-by-2-by-L numeric array. n is the number of observations in the predictor data that created CVMdl (see X) and L is the number of regularization strengths in CVMdl.Trained{1}.Lambda. Score(i,k,j) is the score for classifying observation i into class k using the linear classification model that has regularization strength CVMdl.Trained{1}.Lambda(j). CVMdl.ClassNames stores the order of the classes.

If CVMdl.Trained{1}.Learner is 'logistic', then classification scores are posterior probabilities.

Examples

Predict k-fold Cross-Validation Labels

Load the NLP data set.

load nlpdata

X is a sparse matrix of predictor data, and Y is a categorical vector of class labels. There are more than two classes in the data.

The models should identify whether the word counts in a web page are from the Statistics and Machine Learning Toolbox™ documentation. So, identify the labels that correspond to the Statistics and Machine Learning Toolbox™ documentation web pages.

Ystats = Y == 'stats';

Cross-validate a binary, linear classification model using the entire data set, which can identify whether the word counts in a documentation web page are from the Statistics and Machine Learning Toolbox™ documentation.

rng(1); % For reproducibility
CVMdl = fitclinear(X,Ystats,'CrossVal','on');
Mdl1 = CVMdl.Trained{1}

Mdl1 =
  ClassificationLinear
      ResponseName: 'Y'
        ClassNames: [0 1]
    ScoreTransform: 'none'
              Beta: [34023x1 double]
              Bias: -1.0008
            Lambda: 3.5193e-05
           Learner: 'svm'

  Properties, Methods

CVMdl is a ClassificationPartitionedLinear model. By default, the software implements 10-fold cross-validation. You can alter the number of folds using the 'KFold' name-value pair argument.


Predict labels for the observations that fitclinear did not use in training the folds.

label = kfoldPredict(CVMdl);

Because there is one regularization strength in Mdl1, label is a column vector of predictions containing as many rows as observations in X.

Construct a confusion matrix.

ConfusionTrain = confusionchart(Ystats,label);

The model misclassifies 15 'stats' documentation pages as being outside of the Statistics and Machine Learning Toolbox documentation, and misclassifies nine pages as 'stats' pages.

Estimate k-fold Cross-Validation Posterior Class Probabilities

Linear classification models return posterior probabilities for logistic regression learners only.

Load the NLP data set and preprocess it as in “Predict k-fold Cross-Validation Labels” on page 32-2848. Transpose the predictor data matrix.

load nlpdata
Ystats = Y == 'stats';
X = X';


Cross-validate binary, linear classification models using 5-fold cross-validation. Optimize the objective function using SpaRSA. Lower the tolerance on the gradient of the objective function to 1e-8.

rng(10); % For reproducibility
CVMdl = fitclinear(X,Ystats,'ObservationsIn','columns',...
    'KFold',5,'Learner','logistic','Solver','sparsa',...
    'Regularization','lasso','GradientTolerance',1e-8);

Predict the posterior class probabilities for observations not used to train each fold.

[~,posterior] = kfoldPredict(CVMdl);
CVMdl.ClassNames

ans = 2x1 logical array
   0
   1

Because there is one regularization strength in CVMdl, posterior is a matrix with 2 columns and rows equal to the number of observations. Column i contains posterior probabilities of CVMdl.ClassNames(i) given a particular observation.

Obtain false and true positive rates, and estimate the AUC. Specify that the second class is the positive class.

[fpr,tpr,~,auc] = perfcurve(Ystats,posterior(:,2),CVMdl.ClassNames(2));
auc

auc = 0.9990

The AUC is 0.9990, which indicates a model that predicts well.

Plot an ROC curve.

figure;
plot(fpr,tpr)
h = gca;
h.XLim(1) = -0.1;
h.YLim(2) = 1.1;
xlabel('False positive rate')
ylabel('True positive rate')
title('ROC Curve')


The ROC curve indicates that the model classifies almost perfectly.

Find Good Lasso Penalty Using Cross-Validated AUC

To determine a good lasso-penalty strength for a linear classification model that uses a logistic regression learner, compare cross-validated AUC values.

Load the NLP data set. Preprocess the data as in “Estimate k-fold Cross-Validation Posterior Class Probabilities” on page 32-2849.

load nlpdata
Ystats = Y == 'stats';
X = X';

There are 9471 observations in the test sample.

Create a set of 11 logarithmically-spaced regularization strengths from 10^−6 through 10^−0.5.

Lambda = logspace(-6,-0.5,11);

Cross-validate binary, linear classification models that use each of the regularization strengths and 5-fold cross-validation. Optimize the objective function using SpaRSA. Lower the tolerance on the gradient of the objective function to 1e-8.

rng(10); % For reproducibility
CVMdl = fitclinear(X,Ystats,'ObservationsIn','columns',...
    'KFold',5,'Learner','logistic','Solver','sparsa',...
    'Regularization','lasso','Lambda',Lambda,'GradientTolerance',1e-8)

CVMdl =
  classreg.learning.partition.ClassificationPartitionedLinear
    CrossValidatedModel: 'Linear'
           ResponseName: 'Y'
        NumObservations: 31572
                  KFold: 5
              Partition: [1×1 cvpartition]
             ClassNames: [0 1]
         ScoreTransform: 'none'

  Properties, Methods

Mdl1 = CVMdl.Trained{1}

Mdl1 =
  ClassificationLinear
      ResponseName: 'Y'
        ClassNames: [0 1]
    ScoreTransform: 'logit'
              Beta: [34023×11 double]
              Bias: [-13.2904 -13.2904 -13.2904 -13.2904 -9.9357 -7.0782 -5.4335 -4.5473 -3.4223
            Lambda: [1.0000e-06 3.5481e-06 1.2589e-05 4.4668e-05 1.5849e-04 5.6234e-04 0.0020 0.0
           Learner: 'logistic'

  Properties, Methods

Mdl1 is a ClassificationLinear model object. Because Lambda is a sequence of regularization strengths, you can think of Mdl1 as 11 models, one for each regularization strength in Lambda.

Predict the cross-validated labels and posterior class probabilities.

[label,posterior] = kfoldPredict(CVMdl);
CVMdl.ClassNames;
[n,K,L] = size(posterior)

n = 31572
K = 2
L = 11

posterior(3,1,5)

ans = 1.0000

label is a 31572-by-11 matrix of predicted labels. Each column corresponds to the predicted labels of the model trained using the corresponding regularization strength. posterior is a 31572-by-2-by-11 matrix of posterior class probabilities. Columns correspond to classes and pages correspond to regularization strengths. For example, posterior(3,1,5) indicates that the posterior probability that the first class (label 0) is assigned to observation 3 by the model that uses Lambda(5) as a regularization strength is 1.0000.

For each model, compute the AUC. Designate the second class as the positive class.

auc = 1:numel(Lambda); % Preallocation
for j = 1:numel(Lambda)
    [~,~,~,auc(j)] = perfcurve(Ystats,posterior(:,2,j),CVMdl.ClassNames(2));
end

Higher values of Lambda lead to predictor variable sparsity, which is a good quality of a classifier. For each regularization strength, train a linear classification model using the entire data set and the same options as when you trained the model. Determine the number of nonzero coefficients per model.

Mdl = fitclinear(X,Ystats,'ObservationsIn','columns',...
    'Learner','logistic','Solver','sparsa','Regularization','lasso',...
    'Lambda',Lambda,'GradientTolerance',1e-8);
numNZCoeff = sum(Mdl.Beta~=0);

In the same figure, plot the test-sample error rates and frequency of nonzero coefficients for each regularization strength. Plot all variables on the log scale.

figure;
[h,hL1,hL2] = plotyy(log10(Lambda),log10(auc),...
    log10(Lambda),log10(numNZCoeff + 1));
hL1.Marker = 'o';
hL2.Marker = 'o';
ylabel(h(1),'log_{10} AUC')
ylabel(h(2),'log_{10} nonzero-coefficient frequency')
xlabel('log_{10} Lambda')
title('Cross-Validated Statistics')
hold off


Choose the index of the regularization strength that balances predictor variable sparsity and high AUC. In this case, a value between 10^−3 and 10^−1 should suffice.

idxFinal = 9;

Select the model from Mdl with the chosen regularization strength.

MdlFinal = selectModels(Mdl,idxFinal);

MdlFinal is a ClassificationLinear model containing one regularization strength. To estimate labels for new observations, pass MdlFinal and the new data to predict.

More About

Classification Score

For linear classification models, the raw classification score for classifying the observation x, a row vector, into the positive class is defined by

    fj(x) = xβj + bj.

For the model with regularization strength j, βj is the estimated column vector of coefficients (the model property Beta(:,j)) and bj is the estimated, scalar bias (the model property Bias(j)).

The raw classification score for classifying x into the negative class is –f(x). The software classifies observations into the class that yields the positive score.

If the linear classification model consists of logistic regression learners, then the software applies the 'logit' score transformation to the raw classification scores (see ScoreTransform).
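The score and transform can be sketched as follows. This Python sketch is illustrative only, not toolbox code; the vectors and helper names are assumptions for the example.

```python
import math

# Raw linear classification score f(x) = x*beta + bias for a
# row-vector observation x. (Names are illustrative.)
def raw_score(x, beta, bias):
    return sum(xi * bi for xi, bi in zip(x, beta)) + bias

# The 'logit' transform maps a raw logistic-regression score to a
# posterior probability via the sigmoid function.
def logit_transform(f):
    return 1.0 / (1.0 + math.exp(-f))

f = raw_score([1.0, 2.0], [0.5, -0.25], 0.1)   # 0.5 - 0.5 + 0.1 = 0.1
pos = logit_transform(f)     # posterior for the positive class
neg = logit_transform(-f)    # the negative-class score is -f(x)
```

By sigmoid symmetry, the two transformed scores sum to 1, so they behave as posterior probabilities for the two classes.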

See Also

ClassificationLinear | ClassificationPartitionedLinear | confusionchart | perfcurve | predict | testcholdout

Introduced in R2016a


kfoldPredict

Predict labels for observations not used for training

Syntax

Label = kfoldPredict(CVMdl)
Label = kfoldPredict(CVMdl,Name,Value)
[Label,NegLoss,PBScore] = kfoldPredict( ___ )
[Label,NegLoss,PBScore,Posterior] = kfoldPredict( ___ )

Description

Label = kfoldPredict(CVMdl) returns class labels predicted by the cross-validated ECOC model composed of linear classification models CVMdl. That is, for every fold, kfoldPredict predicts class labels for observations that it holds out when it trains using all other observations. kfoldPredict applies the same data used to create CVMdl (see fitcecoc). Also, Label contains class labels for each regularization strength in the linear classification models that compose CVMdl.

Label = kfoldPredict(CVMdl,Name,Value) returns predicted class labels with additional options specified by one or more Name,Value pair arguments. For example, specify the posterior probability estimation method, decoding scheme, or verbosity level.

[Label,NegLoss,PBScore] = kfoldPredict( ___ ) additionally returns, for held-out observations and each regularization strength:
• Negated values of the average binary loss per class (NegLoss).
• Positive-class scores (PBScore) for each binary learner.

[Label,NegLoss,PBScore,Posterior] = kfoldPredict( ___ ) additionally returns posterior class probability estimates for held-out observations and for each regularization strength. To return posterior probabilities, the linear classification model learners must be logistic regression models.

Input Arguments

CVMdl — Cross-validated, ECOC model composed of linear classification models
ClassificationPartitionedLinearECOC model object

Cross-validated, ECOC model composed of linear classification models, specified as a ClassificationPartitionedLinearECOC model object. You can create a ClassificationPartitionedLinearECOC model using fitcecoc and by:

1  Specifying any one of the cross-validation, name-value pair arguments, for example, CrossVal

2  Setting the name-value pair argument Learners to 'linear' or a linear classification model template returned by templateLinear

To obtain estimates, kfoldPredict applies the same data used to cross-validate the ECOC model (X and Y).


Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

BinaryLoss — Binary learner loss function
'hamming' | 'linear' | 'logit' | 'exponential' | 'binodeviance' | 'hinge' | 'quadratic' | function handle

Binary learner loss function, specified as the comma-separated pair consisting of 'BinaryLoss' and a built-in loss function name or function handle.

• This table contains names and descriptions of the built-in functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj) is the binary loss formula.

  Value            Description         Score Domain       g(yj,sj)
  'binodeviance'   Binomial deviance   (–∞,∞)             log[1 + exp(–2yjsj)]/[2log(2)]
  'exponential'    Exponential         (–∞,∞)             exp(–yjsj)/2
  'hamming'        Hamming             [0,1] or (–∞,∞)    [1 – sign(yjsj)]/2
  'hinge'          Hinge               (–∞,∞)             max(0,1 – yjsj)/2
  'linear'         Linear              (–∞,∞)             (1 – yjsj)/2
  'logit'          Logistic            (–∞,∞)             log[1 + exp(–yjsj)]/[2log(2)]
  'quadratic'      Quadratic           [0,1]              [1 – yj(2sj – 1)]²/2

The software normalizes the binary losses such that the loss is 0.5 when yj = 0. Also, the software calculates the mean binary loss for each class.

• For a custom binary loss function, for example, customFunction, specify its function handle 'BinaryLoss',@customFunction. customFunction has this form:

  bLoss = customFunction(M,s)

  where:
  • M is the K-by-L coding matrix stored in Mdl.CodingMatrix.
  • s is the 1-by-L row vector of classification scores.
  • bLoss is the classification loss. This scalar aggregates the binary losses for every learner in a particular class. For example, you can use the mean binary loss to aggregate the loss over the learners for each class.
  • K is the number of classes.
  • L is the number of binary learners.

For an example of passing a custom binary loss function, see “Predict Test-Sample Labels of ECOC Model Using Custom Binary Loss Function” on page 32-4169.

By default, if all binary learners are linear classification models using:
• SVM, then BinaryLoss is 'hinge'
• Logistic regression, then BinaryLoss is 'quadratic'

Example: 'BinaryLoss','binodeviance'
Data Types: char | string | function_handle

Decoding — Decoding scheme
'lossweighted' (default) | 'lossbased'

Decoding scheme that aggregates the binary losses, specified as the comma-separated pair consisting of 'Decoding' and 'lossweighted' or 'lossbased'. For more information, see “Binary Loss” on page 32-2824.

Example: 'Decoding','lossbased'

NumKLInitializations — Number of random initial values
0 (default) | nonnegative integer

Number of random initial values for fitting posterior probabilities by Kullback-Leibler divergence minimization, specified as the comma-separated pair consisting of 'NumKLInitializations' and a nonnegative integer.

To use this option:
• You must return the fourth output argument (Posterior).
• The linear classification models that compose the ECOC models must use logistic regression learners (that is, CVMdl.Trained{1}.BinaryLearners{1}.Learner must be 'logistic').
• PosteriorMethod must be 'kl'.

For more details, see “Posterior Estimation Using Kullback-Leibler Divergence” on page 32-2866.

Example: 'NumKLInitializations',5
Data Types: single | double

Options — Estimation options
[] (default) | structure array returned by statset

Estimation options, specified as the comma-separated pair consisting of 'Options' and a structure array returned by statset.

To invoke parallel computing:
• You need a Parallel Computing Toolbox license.
• Specify 'Options',statset('UseParallel',true).

PosteriorMethod — Posterior probability estimation method
'kl' (default) | 'qp'

Posterior probability estimation method, specified as the comma-separated pair consisting of 'PosteriorMethod' and 'kl' or 'qp'.
• To use this option, you must return the fourth output argument (Posterior), and the linear classification models that compose the ECOC models must use logistic regression learners (that is, CVMdl.Trained{1}.BinaryLearners{1}.Learner must be 'logistic').
• If PosteriorMethod is 'kl', then the software estimates multiclass posterior probabilities by minimizing the Kullback-Leibler divergence between the predicted and expected posterior probabilities returned by binary learners. For details, see “Posterior Estimation Using Kullback-Leibler Divergence” on page 32-2866.
• If PosteriorMethod is 'qp', then the software estimates multiclass posterior probabilities by solving a least-squares problem using quadratic programming. You need an Optimization Toolbox license to use this option. For details, see “Posterior Estimation Using Quadratic Programming” on page 32-2867.

Example: 'PosteriorMethod','qp'

Verbose — Verbosity level
0 (default) | 1

Verbosity level, specified as the comma-separated pair consisting of 'Verbose' and 0 or 1. Verbose controls the number of diagnostic messages that the software displays in the Command Window. If Verbose is 0, then the software does not display diagnostic messages. Otherwise, the software displays diagnostic messages.

Example: 'Verbose',1
Data Types: single | double

Output Arguments

Label — Cross-validated, predicted class labels
categorical array | character array | logical matrix | numeric matrix | cell array of character vectors

Cross-validated, predicted class labels, returned as a categorical or character array, logical or numeric matrix, or cell array of character vectors. In most cases, Label is an n-by-L array of the same data type as the observed class labels (Y) used to create CVMdl. (The software treats string arrays as cell arrays of character vectors.) n is the number of observations in the predictor data (X) and L is the number of regularization strengths in the linear classification models that compose the cross-validated ECOC model. That is, Label(i,j) is the predicted class label for observation i using the ECOC model of linear classification models that has regularization strength CVMdl.Trained{1}.BinaryLearners{1}.Lambda(j).

If Y is a character array and L > 1, then Label is a cell array of class labels.

The software assigns the predicted label corresponding to the class with the largest, negated, average binary loss (NegLoss), or, equivalently, the smallest average binary loss.

NegLoss — Cross-validated, negated, average binary losses
numeric array

Cross-validated, negated, average binary losses, returned as an n-by-K-by-L numeric matrix or array. K is the number of distinct classes in the training data and columns correspond to the classes in CVMdl.ClassNames. For n and L, see Label. NegLoss(i,k,j) is the negated, average binary loss for classifying observation i into class k using the linear classification model that has regularization strength CVMdl.Trained{1}.BinaryLearners{1}.Lambda(j).

PBScore — Cross-validated, positive-class scores
numeric array


Cross-validated, positive-class scores, returned as an n-by-B-by-L numeric array. B is the number of binary learners in the cross-validated ECOC model and columns correspond to the binary learners in CVMdl.Trained{1}.BinaryLearners. For n and L, see Label. PBScore(i,b,j) is the positive-class score of binary learner b for classifying observation i into its positive class, using the linear classification model that has regularization strength CVMdl.Trained{1}.BinaryLearners{1}.Lambda(j).

If the coding matrix varies across folds (that is, if the coding scheme is sparserandom or denserandom), then PBScore is empty ([]).

Posterior — Cross-validated posterior class probabilities
numeric array

Cross-validated posterior class probabilities, returned as an n-by-K-by-L numeric array. For dimension definitions, see NegLoss. Posterior(i,k,j) is the posterior probability for classifying observation i into class k using the linear classification model that has regularization strength CVMdl.Trained{1}.BinaryLearners{1}.Lambda(j).

To return posterior probabilities, CVMdl.Trained{1}.BinaryLearners{1}.Learner must be 'logistic'.

Examples

Predict k-fold Cross-Validation Labels

Load the NLP data set.

load nlpdata

X is a sparse matrix of predictor data, and Y is a categorical vector of class labels.

Cross-validate an ECOC model of linear classification models.

rng(1); % For reproducibility
CVMdl = fitcecoc(X,Y,'Learner','linear','CrossVal','on');

CVMdl is a ClassificationPartitionedLinearECOC model. By default, the software implements 10-fold cross-validation.

Predict labels for the observations that fitcecoc did not use in training the folds.

label = kfoldPredict(CVMdl);

Because there is one regularization strength in CVMdl, label is a column vector of predictions containing as many rows as observations in X.

Construct a confusion matrix.

cm = confusionchart(Y,label);


Specify Custom Binary Loss

Load the NLP data set. Transpose the predictor data.

load nlpdata
X = X';

For simplicity, use the label 'others' for all observations in Y that are not 'simulink', 'dsp', or 'comm'.

Y(~(ismember(Y,{'simulink','dsp','comm'}))) = 'others';

Create a linear classification model template that specifies optimizing the objective function using SpaRSA.

t = templateLinear('Solver','sparsa');

Cross-validate an ECOC model of linear classification models using 5-fold cross-validation. Specify that the predictor observations correspond to columns.

rng(1); % For reproducibility
CVMdl = fitcecoc(X,Y,'Learners',t,'KFold',5,'ObservationsIn','columns');
CMdl1 = CVMdl.Trained{1}

CMdl1 = 
  classreg.learning.classif.CompactClassificationECOC
      ResponseName: 'Y'
        ClassNames: [comm    dsp    simulink    others]
    ScoreTransform: 'none'
    BinaryLearners: {6x1 cell}
      CodingMatrix: [4x6 double]

  Properties, Methods

CVMdl is a ClassificationPartitionedLinearECOC model. It contains the property Trained, which is a 5-by-1 cell array holding the CompactClassificationECOC models that the software trained using the training set of each fold.

By default, the linear classification models that compose the ECOC models use SVMs. SVM scores are signed distances from the observation to the decision boundary. Therefore, the domain is (–∞,∞). Create a custom binary loss function that:

• Maps the coding design matrix (M) and positive-class classification scores (s) for each learner to the binary loss for each observation
• Uses linear loss
• Aggregates the binary learner losses using the median

You can create a separate function for the binary loss function, and then save it on the MATLAB® path. Or, you can specify an anonymous binary loss function.

customBL = @(M,s)nanmedian(1 - bsxfun(@times,M,s),2)/2;

Predict cross-validation labels and estimate the median binary loss per class. Print the median negated binary losses per class for a random set of 10 out-of-fold observations.

[label,NegLoss] = kfoldPredict(CVMdl,'BinaryLoss',customBL);
idx = randsample(numel(label),10);
table(Y(idx),label(idx),NegLoss(idx,1),NegLoss(idx,2),NegLoss(idx,3),...
    NegLoss(idx,4),'VariableNames',[{'True'};{'Predicted'};...
    categories(CVMdl.ClassNames)])

ans=10×6 table
      True      Predicted       comm         dsp       simulink    others 
    ________    _________    _________    ________    ________    _______
    others      others         -1.2319     -1.0488    0.048758     1.6175
    simulink    simulink       -16.407     -12.218      21.531     11.218
    dsp         dsp            -0.7387    -0.11534    -0.88466    -0.2613
    others      others         -0.1251     -0.8749    -0.99766    0.14517
    dsp         dsp             2.5867      6.4187     -3.5867    -4.4165
    others      others       -0.025358     -1.2287    -0.97464    0.19747
    others      others         -2.6725    -0.56708    -0.51092     2.7453
    others      others         -1.1605    -0.88321    -0.11679    0.43504
    others      others         -1.9511     -1.3175     0.24735    0.95111
    simulink    others          -7.848     -5.8203      4.8203     6.8457

The software predicts the label based on the maximum negated loss.
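The decision rule just described can be checked directly from NegLoss. The following is a sketch reusing label, NegLoss, and CVMdl from this example; ties aside, labelCheck should match label.

```matlab
% Sketch: recover the predicted labels as the per-row maximum of NegLoss.
[~,cls] = max(NegLoss,[],2);          % column index of the largest negated loss
labelCheck = CVMdl.ClassNames(cls);   % map column indices back to class names
```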


Estimate Posterior Class Probabilities

ECOC models composed of linear classification models return posterior probabilities for logistic regression learners only. This example requires Parallel Computing Toolbox™ and Optimization Toolbox™.

Load the NLP data set and preprocess the data as in “Specify Custom Binary Loss” on page 32-2861.

load nlpdata
X = X';
Y(~(ismember(Y,{'simulink','dsp','comm'}))) = 'others';

Create a set of 5 logarithmically-spaced regularization strengths from 10^-6 through 10^-0.5.

Lambda = logspace(-6,-0.5,5);

Create a linear classification model template that specifies using logistic regression learners and optimizing the objective function using SpaRSA.

t = templateLinear('Solver','sparsa','Learner','logistic','Lambda',Lambda);

Cross-validate an ECOC model of linear classification models using 5-fold cross-validation. Specify that the predictor observations correspond to columns, and to use parallel computing.

rng(1); % For reproducibility
Options = statset('UseParallel',true);
CVMdl = fitcecoc(X,Y,'Learners',t,'KFold',5,'ObservationsIn','columns',...
    'Options',Options);

Starting parallel pool (parpool) using the 'local' profile ... connected to 6 workers.

Predict the cross-validated posterior class probabilities. Specify to use parallel computing and to estimate posterior probabilities using quadratic programming.

[label,~,~,Posterior] = kfoldPredict(CVMdl,'Options',Options,...
    'PosteriorMethod','qp');
size(label)

ans =

       31572           5

label(3,4)

ans = categorical
     others 

size(Posterior)

ans =

       31572           4           5

Posterior(3,:,4)

ans =

    0.0293    0.0373    0.1738    0.7596

Because there are five regularization strengths:

• label is a 31572-by-5 categorical array. label(3,4) is the predicted, cross-validated label for observation 3 using the model trained with regularization strength Lambda(4).
• Posterior is a 31572-by-4-by-5 matrix. Posterior(3,:,4) is the vector of all estimated, posterior class probabilities for observation 3 using the model trained with regularization strength Lambda(4). The order of the second dimension corresponds to CVMdl.ClassNames.

Display a random sample of 10 cross-validated labels and posterior probabilities for the model trained using Lambda(4).

idx = randsample(size(label,1),10);
table(Y(idx),label(idx,4),Posterior(idx,1,4),Posterior(idx,2,4),...
    Posterior(idx,3,4),Posterior(idx,4,4),...
    'VariableNames',[{'True'};{'Predicted'};categories(CVMdl.ClassNames)])

ans = 10×6 table
      True      Predicted       comm          dsp        simulink     others  
    ________    _________    __________    __________    ________    _________
    others      others         0.030309      0.022454     0.10401      0.84323
    simulink    simulink     3.5104e-05    4.3154e-05     0.99877    0.0011543
    dsp         others          0.15837       0.25784     0.18567      0.39811
    others      others         0.093212      0.063752     0.12927      0.71376
    dsp         dsp           0.0057401       0.89678    0.014939     0.082538
    others      others         0.085715      0.054451    0.083765      0.77607
    others      others        0.0061121     0.0057884     0.02409      0.96401
    others      others         0.066741      0.074103       0.168      0.69115
    others      others          0.05236      0.025631     0.13245      0.78956
    simulink    simulink     0.00039812    0.00045575     0.73724       0.2619

More About

Binary Loss

A binary loss is a function of the class and classification score that determines how well a binary learner classifies an observation into the class.

Suppose the following:

• mkj is element (k,j) of the coding design matrix M (that is, the code corresponding to class k of binary learner j).
• sj is the score of binary learner j for an observation.
• g is the binary loss function.
• k̂ is the predicted class for the observation.

In loss-based decoding [Escalera et al.] on page 18-189, the class producing the minimum sum of the binary losses over binary learners determines the predicted class of an observation, that is,

    k̂ = argmin_k Σ_{j=1}^{L} |m_kj| g(m_kj, s_j).

In loss-weighted decoding [Escalera et al.] on page 18-189, the class producing the minimum average of the binary losses over binary learners determines the predicted class of an observation, that is,

    k̂ = argmin_k [ Σ_{j=1}^{L} |m_kj| g(m_kj, s_j) ] / [ Σ_{j=1}^{L} |m_kj| ].
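The loss-weighted decoding rule can be sketched directly in MATLAB. The coding matrix, scores, and choice of hinge loss below are hypothetical illustrative values, not output from the examples in this section.

```matlab
% Sketch of loss-weighted decoding (hypothetical values).
M = [1 -1 0; -1 1 1; 0 1 -1];      % K-by-L coding matrix (K = 3 classes, L = 3 learners)
s = [0.5 -1.2 0.3];                % positive-class scores from the L binary learners
g = @(y,x) max(0, 1 - y.*x)/2;     % hinge binary loss

% Average binary loss per class, weighted by |m_kj| (entries with m_kj = 0 drop out).
G = g(M, repmat(s, size(M,1), 1));
avgLoss = sum(abs(M).*G, 2) ./ sum(abs(M), 2);

[~,khat] = min(avgLoss);           % index of the predicted class
```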

Allwein et al. on page 18-189 suggest that loss-weighted decoding improves classification accuracy by keeping loss values for all classes in the same dynamic range.

This table summarizes the supported loss functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj) is the binary loss formula.

Value             Description         Score Domain       g(yj,sj)
'binodeviance'    Binomial deviance   (–∞,∞)             log[1 + exp(–2yjsj)]/[2log(2)]
'exponential'     Exponential         (–∞,∞)             exp(–yjsj)/2
'hamming'         Hamming             [0,1] or (–∞,∞)    [1 – sign(yjsj)]/2
'hinge'           Hinge               (–∞,∞)             max(0,1 – yjsj)/2
'linear'          Linear              (–∞,∞)             (1 – yjsj)/2
'logit'           Logistic            (–∞,∞)             log[1 + exp(–yjsj)]/[2log(2)]
'quadratic'       Quadratic           [0,1]              [1 – yj(2sj – 1)]²/2

The software normalizes binary losses such that the loss is 0.5 when yj = 0, and aggregates using the average of the binary learners [Allwein et al.] on page 18-189.

Do not confuse the binary loss with the overall classification loss (specified by the 'LossFun' name-value pair argument of the loss and predict object functions), which measures how well an ECOC classifier performs as a whole.
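As a quick sanity check of the table, a few of the formulas can be written as anonymous functions; each evaluates to 0.5 when yj = 0, consistent with the normalization described above (a sketch, not part of the toolbox API):

```matlab
% A few of the table's loss formulas as anonymous functions (sketch).
gLinear = @(y,s) (1 - y.*s)/2;
gHinge  = @(y,s) max(0, 1 - y.*s)/2;
gLogit  = @(y,s) log(1 + exp(-y.*s))/(2*log(2));

% Each loss is 0.5 when the class label y is 0:
gLinear(0,2)   % 0.5
gHinge(0,2)    % 0.5
gLogit(0,2)    % 0.5
```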

Algorithms

The software can estimate class posterior probabilities by minimizing the Kullback-Leibler divergence or by using quadratic programming. For the following descriptions of the posterior estimation algorithms, assume that:


• mkj is the element (k,j) of the coding design matrix M.
• I is the indicator function.
• p̂k is the class posterior probability estimate for class k of an observation, k = 1,...,K.
• rj is the positive-class posterior probability for binary learner j. That is, rj is the probability that binary learner j classifies an observation into the positive class, given the training data.

Posterior Estimation Using Kullback-Leibler Divergence

By default, the software minimizes the Kullback-Leibler divergence to estimate class posterior probabilities. The Kullback-Leibler divergence between the expected and observed positive-class posterior probabilities is

    Δ(r,r̂) = Σ_{j=1}^{L} w_j [ r_j log(r_j/r̂_j) + (1 − r_j) log((1 − r_j)/(1 − r̂_j)) ],

where w_j = Σ_{i∈S_j} w_i* is the weight for binary learner j.

• Sj is the set of observation indices on which binary learner j is trained.
• wi* is the weight of observation i.

The software minimizes the divergence iteratively. The first step is to choose initial values p̂k^(0); k = 1,...,K for the class posterior probabilities.

• If you do not specify 'NumKLIterations', then the software tries both sets of deterministic initial values described next, and selects the set that minimizes Δ.
  • p̂k^(0) = 1/K; k = 1,...,K.
  • p̂k^(0); k = 1,...,K is the solution of the system

        M01 p̂^(0) = r,

    where M01 is M with all mkj = –1 replaced with 0, and r is a vector of positive-class posterior probabilities returned by the L binary learners [Dietterich et al.] on page 18-189. The software uses lsqnonneg to solve the system.

• If you specify 'NumKLIterations',c, where c is a natural number, then the software does the following to choose the set p̂k^(0); k = 1,...,K, and selects the set that minimizes Δ.
  • The software tries both sets of deterministic initial values as described previously.
  • The software randomly generates c vectors of length K using rand, and then normalizes each vector to sum to 1.

At iteration t, the software completes these steps:

1. Compute

       r̂_j^(t) = Σ_{k=1}^{K} p̂_k^(t) I(m_kj = +1) / Σ_{k=1}^{K} p̂_k^(t) I(m_kj = +1 ∪ m_kj = −1).

2. Estimate the next class posterior probability using

       p̂_k^(t+1) = p̂_k^(t) · [ Σ_{j=1}^{L} w_j ( r_j I(m_kj = +1) + (1 − r_j) I(m_kj = −1) ) ] / [ Σ_{j=1}^{L} w_j ( r̂_j^(t) I(m_kj = +1) + (1 − r̂_j^(t)) I(m_kj = −1) ) ].

3. Normalize p̂_k^(t+1); k = 1,...,K so that the probabilities sum to 1.

4. Check for convergence.
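The iterative update can be sketched in a few lines of MATLAB. The coding matrix, learner posteriors, and weights below are hypothetical illustrative values; the toolbox performs this computation internally.

```matlab
% Sketch of the iterative KL-divergence posterior update (hypothetical values).
M = [1 -1 0; -1 1 1; 0 1 -1];   % K-by-L coding matrix
r = [0.8; 0.3; 0.6];            % positive-class posteriors from the L binary learners
w = ones(size(r));              % binary learner weights
K = size(M,1);
p = ones(K,1)/K;                % initial estimate p^(0) = 1/K

for t = 1:1000
    rt  = ((M==1)'*p) ./ ((M~=0)'*p);        % expected positive-class posteriors
    num = (M==1)*(w.*r)  + (M==-1)*(w.*(1-r));
    den = (M==1)*(w.*rt) + (M==-1)*(w.*(1-rt));
    pNew = p .* num ./ den;
    pNew = pNew / sum(pNew);                 % renormalize to sum to 1
    if max(abs(pNew - p)) < 1e-10, p = pNew; break; end
    p = pNew;
end
```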

For more details, see [Hastie et al.] on page 18-190 and [Zadrozny] on page 18-191.

Posterior Estimation Using Quadratic Programming

Posterior probability estimation using quadratic programming requires an Optimization Toolbox license. To estimate posterior probabilities for an observation using this method, the software completes these steps:

1. Estimate the positive-class posterior probabilities, rj, for binary learners j = 1,...,L.

2. Using the relationship between rj and p̂k [Wu et al.] on page 18-191, minimize

       Σ_{j=1}^{L} [ −r_j Σ_{k=1}^{K} p̂_k I(m_kj = −1) + (1 − r_j) Σ_{k=1}^{K} p̂_k I(m_kj = +1) ]²

   with respect to p̂k and subject to the restrictions

       0 ≤ p̂_k ≤ 1,   Σ_k p̂_k = 1.

The software performs minimization using quadprog.
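The objective above is a least-squares problem in the vector of class posteriors, so it maps directly onto quadprog. The sketch below uses hypothetical coding-matrix and learner-posterior values; it illustrates the formulation and is not the toolbox's internal implementation.

```matlab
% Sketch: posterior estimation as a quadratic program (hypothetical values).
M = [1 -1 0; -1 1 1; 0 1 -1];   % K-by-L coding matrix
r = [0.8; 0.3; 0.6];            % positive-class posteriors from the L binary learners
[K,L] = size(M);

% Row j of A encodes -r_j*I(m_kj = -1) + (1 - r_j)*I(m_kj = +1),
% so the objective is ||A*p||^2 = p'*(A'*A)*p.
A = zeros(L,K);
for j = 1:L
    A(j,:) = -r(j)*(M(:,j) == -1)' + (1 - r(j))*(M(:,j) == 1)';
end

H = 2*(A'*A);                   % quadprog minimizes (1/2)*p'*H*p + f'*p
f = zeros(K,1);
Aeq = ones(1,K); beq = 1;       % probabilities sum to 1
lb = zeros(K,1); ub = ones(K,1);

p = quadprog(H,f,[],[],Aeq,beq,lb,ub);   % estimated class posteriors
```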

References

[1] Allwein, E., R. Schapire, and Y. Singer. “Reducing multiclass to binary: A unifying approach for margin classifiers.” Journal of Machine Learning Research. Vol. 1, 2000, pp. 113–141.

[2] Dietterich, T., and G. Bakiri. “Solving Multiclass Learning Problems Via Error-Correcting Output Codes.” Journal of Artificial Intelligence Research. Vol. 2, 1995, pp. 263–286.

[3] Escalera, S., O. Pujol, and P. Radeva. “On the decoding process in ternary error-correcting output codes.” IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 32, Issue 7, 2010, pp. 120–134.

[4] Escalera, S., O. Pujol, and P. Radeva. “Separability of ternary codes for sparse designs of error-correcting output codes.” Pattern Recognition Letters. Vol. 30, Issue 3, 2009, pp. 285–297.

[5] Hastie, T., and R. Tibshirani. “Classification by Pairwise Coupling.” Annals of Statistics. Vol. 26, Issue 2, 1998, pp. 451–471.


[6] Wu, T. F., C. J. Lin, and R. Weng. “Probability Estimates for Multi-Class Classification by Pairwise Coupling.” Journal of Machine Learning Research. Vol. 5, 2004, pp. 975–1005. [7] Zadrozny, B. “Reducing Multiclass to Binary by Coupling Probability Estimates.” NIPS 2001: Proceedings of Advances in Neural Information Processing Systems 14, 2001, pp. 1041–1048.

Extended Capabilities

Automatic Parallel Support

Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, set the 'UseParallel' option to true. Set the 'UseParallel' field of the options structure to true using statset and specify the 'Options' name-value pair argument in the call to this function. For example: 'Options',statset('UseParallel',true). For more information, see the 'Options' name-value pair argument.

For more general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).

See Also

ClassificationECOC | ClassificationLinear | ClassificationPartitionedLinearECOC | confusionchart | fitcecoc | perfcurve | predict | statset | testcholdout

Topics

“Quick Start Parallel Computing for Statistics and Machine Learning Toolbox” on page 30-2
“Reproducibility in Parallel Statistical Computations” on page 30-13
“Concepts of Parallel Computing in Statistics and Machine Learning Toolbox” on page 30-8

Introduced in R2016a


kfoldPredict

Predict response for observations not used for training

Syntax

label = kfoldPredict(obj)
[label,score] = kfoldPredict(obj)
[label,score,cost] = kfoldPredict(obj)

Description

label = kfoldPredict(obj) returns class labels predicted by obj, a cross-validated classification model. For every fold, kfoldPredict predicts class labels for in-fold observations using a model trained on out-of-fold observations.

[label,score] = kfoldPredict(obj) returns the predicted classification scores for in-fold observations using a model trained on out-of-fold observations.

[label,score,cost] = kfoldPredict(obj) returns misclassification costs.

Input Arguments

obj

Object of class ClassificationPartitionedModel or ClassificationPartitionedEnsemble.

Output Arguments

label

Vector of class labels of the same type as the response data used in training obj. (The software treats string arrays as cell arrays of character vectors.) Each entry of label corresponds to a predicted class label for the corresponding row of X.

score

Numeric matrix of size N-by-K, where N is the number of observations (rows) in obj.X, and K is the number of classes (in obj.ClassNames). score(i,j) represents the confidence that row i of obj.X is of class j. For details, see “More About” on page 32-2871.

cost

Numeric matrix of misclassification costs of size N-by-K. cost(i,j) is the average misclassification cost of predicting that row i of obj.X is of class j.

Examples


Estimate Cross-Validation Predictions from an Ensemble

Find the cross-validation predictions for a model based on Fisher's iris data.

Load Fisher's iris data set.

load fisheriris

Train an ensemble of classification trees using AdaBoostM2. Specify tree stumps as the weak learners.

rng(1); % For reproducibility
t = templateTree('MaxNumSplits',1);
Mdl = fitcensemble(meas,species,'Method','AdaBoostM2','Learners',t);

Cross-validate the trained ensemble using 10-fold cross-validation.

CVMdl = crossval(Mdl);

Estimate cross-validation predicted labels and scores.

[elabel,escore] = kfoldPredict(CVMdl);

Display the maximum and minimum scores of each class.

max(escore)

ans = 1×3

    9.3862    8.9871   10.1866

min(escore)

ans = 1×3

    0.0017    3.8359    0.8981

Create Confusion Matrix Using Cross-Validation Predictions

Create a confusion matrix using the 10-fold cross-validation predictions of a discriminant analysis model.

Load the fisheriris data set. X contains flower measurements for 150 different flowers, and y lists the species, or class, for each flower. Create a variable order that specifies the order of the classes.

load fisheriris
X = meas;
y = species;
order = unique(y)

order = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }


Create a 10-fold cross-validated discriminant analysis model by using the fitcdiscr function. By default, fitcdiscr ensures that training and test sets have roughly the same proportions of flower species. Specify the order of the flower classes.

cvmdl = fitcdiscr(X,y,'KFold',10,'ClassNames',order);

Predict the species of the test set flowers.

predictedSpecies = kfoldPredict(cvmdl);

Create a confusion matrix that compares the true class values to the predicted class values.

confusionchart(y,predictedSpecies)

More About

Cost (discriminant analysis)

The average misclassification cost is the mean misclassification cost for predictions made by the cross-validated classifiers trained on out-of-fold observations. The matrix of expected costs per observation is defined in “Cost” on page 20-7.


Score

For discriminant analysis, the score of a classification is the posterior probability of the classification. For the definition of posterior probability in discriminant analysis, see “Posterior Probability” on page 20-6.

For ensembles, a classification score represents the confidence of a classification into a class. The higher the score, the higher the confidence. Different ensemble algorithms have different definitions for their scores. Furthermore, the range of scores depends on ensemble type. For example:

• AdaBoostM1 scores range from –∞ to ∞.
• Bag scores range from 0 to 1.

For trees, the score of a classification of a leaf node is the posterior probability of the classification at that node. The posterior probability of the classification at a node is the number of training sequences that lead to that node with the classification, divided by the number of training sequences that lead to that node.

For example, consider classifying a predictor X as true when X < 0.15 or X > 0.95, and X is false otherwise. Generate 100 random points and classify them:

rng(0,'twister') % for reproducibility
X = rand(100,1);
Y = (abs(X - .55) > .4);
tree = fitctree(X,Y);
view(tree,'Mode','Graph')


Prune the tree:

tree1 = prune(tree,'Level',1);
view(tree1,'Mode','Graph')


The pruned tree correctly classifies observations that are less than 0.15 as true. It also correctly classifies observations from .15 to .94 as false. However, it incorrectly classifies observations that are greater than .94 as false. Therefore, the score for observations that are greater than .15 should be about .05/.85=.06 for true, and about .8/.85=.94 for false.

Compute the prediction scores for the first 10 rows of X:

[~,score] = predict(tree1,X(1:10));
[score X(1:10,:)]

ans = 10×3

    0.9059    0.0941    0.8147
    0.9059    0.0941    0.9058
         0    1.0000    0.1270
    0.9059    0.0941    0.9134
    0.9059    0.0941    0.6324
         0    1.0000    0.0975
    0.9059    0.0941    0.2785
    0.9059    0.0941    0.5469
    0.9059    0.0941    0.9575
    0.9059    0.0941    0.9649

Indeed, every value of X (the right-most column) that is less than 0.15 has associated scores (the left and center columns) of 0 and 1, while the other values of X have associated scores of 0.91 and 0.09. The difference (score 0.09 instead of the expected .06) is due to a statistical fluctuation: there are 8 observations in X in the range (.95,1) instead of the expected 5 observations.

See Also

ClassificationPartitionedModel | crossval | kfoldEdge | kfoldLoss | kfoldMargin | kfoldfun


kfoldPredict

Predict responses for observations not used for training

Syntax

YHat = kfoldPredict(CVMdl)

Description

YHat = kfoldPredict(CVMdl) returns cross-validated predicted responses by the cross-validated linear regression model CVMdl. That is, for every fold, kfoldPredict predicts responses for observations that it holds out when it trains using all other observations. YHat contains predicted responses for each regularization strength in the linear regression models that compose CVMdl.

Input Arguments

CVMdl — Cross-validated, linear regression model
RegressionPartitionedLinear model object

Cross-validated, linear regression model, specified as a RegressionPartitionedLinear model object. You can create a RegressionPartitionedLinear model using fitrlinear and specifying one of the cross-validation name-value pair arguments, for example, CrossVal.

To obtain estimates, kfoldPredict applies the same data used to cross-validate the linear regression model (X and Y).

Output Arguments

YHat — Cross-validated predicted responses
numeric array

Cross-validated predicted responses, returned as an n-by-L numeric array. n is the number of observations in the predictor data that created CVMdl (see X) and L is the number of regularization strengths in CVMdl.Trained{1}.Lambda. YHat(i,j) is the predicted response for observation i using the linear regression model that has regularization strength CVMdl.Trained{1}.Lambda(j).

The predicted response using the model with regularization strength j is ŷj = xβj + bj.

• x is an observation from the predictor data matrix X, and is a row vector.
• βj is the estimated column vector of coefficients. The software stores this vector in Mdl.Beta(:,j).
• bj is the estimated, scalar bias, which the software stores in Mdl.Bias(j).

Examples


Predict Cross-Validated Responses

Simulate 10000 observations from this model

    y = x100 + 2x200 + e.

• X = {x1,...,x1000} is a 10000-by-1000 sparse matrix with 10% nonzero standard normal elements.
• e is random normal error with mean 0 and standard deviation 0.3.

rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);

Cross-validate a linear regression model.

CVMdl = fitrlinear(X,Y,'CrossVal','on')

CVMdl = 
  classreg.learning.partition.RegressionPartitionedLinear
    CrossValidatedModel: 'Linear'
           ResponseName: 'Y'
        NumObservations: 10000
                  KFold: 10
              Partition: [1x1 cvpartition]
      ResponseTransform: 'none'

  Properties, Methods

Mdl1 = CVMdl.Trained{1}

Mdl1 = 
  RegressionLinear
         ResponseName: 'Y'
    ResponseTransform: 'none'
                 Beta: [1000x1 double]
                 Bias: 0.0107
               Lambda: 1.1111e-04
              Learner: 'svm'

  Properties, Methods

By default, fitrlinear implements 10-fold cross-validation. CVMdl is a RegressionPartitionedLinear model. It contains the property Trained, which is a 10-by-1 cell array holding 10 RegressionLinear models that the software trained using the training set.

Predict responses for observations that fitrlinear did not use in training the folds.

yHat = kfoldPredict(CVMdl);

Because there is one regularization strength in CVMdl, yHat is a numeric vector.
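The formula ŷ = xβ + b from the Output Arguments section can be checked against the stored model coefficients. This is a sketch reusing CVMdl and X from this example; kfoldPredict pairs each observation with the model from the fold that held it out, so this reproduces yHat only for observations in fold 1's holdout set.

```matlab
% Sketch: form a prediction manually from one fold's stored coefficients.
Mdl1 = CVMdl.Trained{1};
x = X(1,:);                        % one observation (row vector)
yManual = x*Mdl1.Beta + Mdl1.Bias; % compare with yHat(1) if observation 1 is held out of fold 1
```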


Predict for Models Containing Several Regularization Strengths

Simulate 10000 observations as in “Predict Cross-Validated Responses” on page 32-2876.

rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);

Create a set of 15 logarithmically-spaced regularization strengths from 10^-5 through 10^-1.

Lambda = logspace(-5,-1,15);

Cross-validate the models. To increase execution speed, transpose the predictor data and specify that the observations are in columns. Specify using least squares with a lasso penalty and optimizing the objective function using SpaRSA.

X = X';
CVMdl = fitrlinear(X,Y,'ObservationsIn','columns','KFold',5,'Lambda',Lambda,...
    'Learner','leastsquares','Solver','sparsa','Regularization','lasso');

CVMdl is a RegressionPartitionedLinear model. Its Trained property contains a 5-by-1 cell array of trained RegressionLinear models, each holding out a different fold during training. Because fitrlinear trained using 15 regularization strengths, you can think of each RegressionLinear model as 15 models.

Predict cross-validated responses.

YHat = kfoldPredict(CVMdl);
size(YHat)

ans =

       10000          15

YHat(2,:)

ans =

   -1.7338   -1.7332   -1.7319   -1.7299   -1.7266   -1.7239   -1.7135   -1.7210   -1.7324   ...

YHat is a 10000-by-15 matrix. YHat(2,:) is the cross-validated response for observation 2 using the model regularized with all 15 regularization values.

See Also

RegressionLinear | RegressionPartitionedLinear | fitrlinear | predict

Introduced in R2016a

kfoldPredict

Predict response for observations not used for training

Syntax

yfit = kfoldPredict(obj)

Description

yfit = kfoldPredict(obj) returns the predicted values for the responses of the training data based on obj, an object trained on out-of-fold observations.

Input Arguments

obj

Object of class RegressionPartitionedModel. Create obj with fitrtree or fitrensemble along with one of the cross-validation options: 'crossval', 'kfold', 'holdout', 'leaveout', or 'cvpartition'. Alternatively, create obj from a regression tree or regression ensemble with crossval.

Output Arguments

yfit

A vector of predicted values for the response data based on a model trained on out-of-fold observations.

Examples

Construct a partitioned regression model, and examine the cross-validation loss. The cross-validation loss is the mean squared error between yfit and the true response data:

load carsmall
XX = [Cylinders Displacement Horsepower Weight];
YY = MPG;
tree = fitrtree(XX,YY);
cvmodel = crossval(tree);
L = kfoldLoss(cvmodel)

L = 26.5271

yfit = kfoldPredict(cvmodel);
mean( (yfit - tree.Y).^2 )

ans = 26.5271


See Also

fitrtree | kfoldLoss


kmeans

k-means clustering

Syntax

idx = kmeans(X,k)
idx = kmeans(X,k,Name,Value)
[idx,C] = kmeans( ___ )
[idx,C,sumd] = kmeans( ___ )
[idx,C,sumd,D] = kmeans( ___ )

Description

idx = kmeans(X,k) performs k-means clustering on page 32-2895 to partition the observations of the n-by-p data matrix X into k clusters, and returns an n-by-1 vector (idx) containing cluster indices of each observation. Rows of X correspond to points and columns correspond to variables. By default, kmeans uses the squared Euclidean distance metric and the k-means++ algorithm on page 32-2895 for cluster center initialization.

idx = kmeans(X,k,Name,Value) returns the cluster indices with additional options specified by one or more Name,Value pair arguments. For example, specify the cosine distance, the number of times to repeat the clustering using new initial values, or to use parallel computing.

[idx,C] = kmeans( ___ ) returns the k cluster centroid locations in the k-by-p matrix C.

[idx,C,sumd] = kmeans( ___ ) returns the within-cluster sums of point-to-centroid distances in the k-by-1 vector sumd.

[idx,C,sumd,D] = kmeans( ___ ) returns distances from each point to every centroid in the n-by-k matrix D.

Examples

Train a k-Means Clustering Algorithm

Cluster data using k-means clustering, then plot the cluster regions.

Load Fisher's iris data set. Use the petal lengths and widths as predictors.

load fisheriris
X = meas(:,3:4);
figure;
plot(X(:,1),X(:,2),'k*','MarkerSize',5);
title 'Fisher''s Iris Data';
xlabel 'Petal Lengths (cm)';
ylabel 'Petal Widths (cm)';


The larger cluster seems to be split into a lower variance region and a higher variance region. This might indicate that the larger cluster is two overlapping clusters.

Cluster the data. Specify k = 3 clusters.

rng(1); % For reproducibility
[idx,C] = kmeans(X,3);

kmeans uses the k-means++ algorithm for centroid initialization and squared Euclidean distance by default. It is good practice to search for lower, local minima by setting the 'Replicates' name-value pair argument.

idx is a vector of predicted cluster indices corresponding to the observations in X. C is a 3-by-2 matrix containing the final centroid locations.

Use kmeans to compute the distance from each centroid to points on a grid. To do this, pass the centroids (C) and points on a grid to kmeans, and implement one iteration of the algorithm.

x1 = min(X(:,1)):0.01:max(X(:,1));
x2 = min(X(:,2)):0.01:max(X(:,2));
[x1G,x2G] = meshgrid(x1,x2);
XGrid = [x1G(:),x2G(:)]; % Defines a fine grid on the plot

idx2Region = kmeans(XGrid,3,'MaxIter',1,'Start',C); % Assigns each node in the grid to the closest centroid

Warning: Failed to converge in 1 iterations.


kmeans displays a warning stating that the algorithm did not converge, which you should expect since the software only implemented one iteration.

Plot the cluster regions.

figure;
gscatter(XGrid(:,1),XGrid(:,2),idx2Region,...
    [0,0.75,0.75;0.75,0,0.75;0.75,0.75,0],'..');
hold on;
plot(X(:,1),X(:,2),'k*','MarkerSize',5);
title 'Fisher''s Iris Data';
xlabel 'Petal Lengths (cm)';
ylabel 'Petal Widths (cm)';
legend('Region 1','Region 2','Region 3','Data','Location','SouthEast');
hold off;

Partition Data into Two Clusters

Randomly generate the sample data.

rng default; % For reproducibility
X = [randn(100,2)*0.75+ones(100,2);
    randn(100,2)*0.5-ones(100,2)];
figure;


plot(X(:,1),X(:,2),'.');
title 'Randomly Generated Data';

There appear to be two clusters in the data. Partition the data into two clusters, and choose the best arrangement out of five initializations. Display the final output.

opts = statset('Display','final');
[idx,C] = kmeans(X,2,'Distance','cityblock',...
    'Replicates',5,'Options',opts);

Replicate 1, 3 iterations, total sum of distances = 201.533.
Replicate 2, 5 iterations, total sum of distances = 201.533.
Replicate 3, 3 iterations, total sum of distances = 201.533.
Replicate 4, 3 iterations, total sum of distances = 201.533.
Replicate 5, 2 iterations, total sum of distances = 201.533.
Best total sum of distances = 201.533

By default, the software initializes the replicates separately using k-means++.

Plot the clusters and the cluster centroids.

figure;
plot(X(idx==1,1),X(idx==1,2),'r.','MarkerSize',12)
hold on
plot(X(idx==2,1),X(idx==2,2),'b.','MarkerSize',12)
plot(C(:,1),C(:,2),'kx',...
    'MarkerSize',15,'LineWidth',3)
legend('Cluster 1','Cluster 2','Centroids',...
    'Location','NW')
title 'Cluster Assignments and Centroids'
hold off

You can determine how well separated the clusters are by passing idx to silhouette.

Cluster Data Using Parallel Computing

Clustering large data sets might take time, particularly if you use online updates (controlled by the 'OnlinePhase' name-value pair argument). If you have a Parallel Computing Toolbox™ license and you set the options for parallel computing, then kmeans runs each clustering task (or replicate) in parallel. If Replicates > 1, then parallel computing decreases the time to convergence.

Randomly generate a large data set from a Gaussian mixture model.

Mu = bsxfun(@times,ones(20,30),(1:20)'); % Gaussian mixture mean
rn30 = randn(30,30);
Sigma = rn30'*rn30; % Symmetric and positive-definite covariance
Mdl = gmdistribution(Mu,Sigma); % Define the Gaussian mixture distribution
rng(1); % For reproducibility
X = random(Mdl,10000);

Mdl is a 30-dimensional gmdistribution model with 20 components. X is a 10000-by-30 matrix of data generated from Mdl.

Specify the options for parallel computing.

stream = RandStream('mlfg6331_64'); % Random number stream
options = statset('UseParallel',1,'UseSubstreams',1,...
    'Streams',stream);

The input argument 'mlfg6331_64' of RandStream specifies to use the multiplicative lagged Fibonacci generator algorithm. options is a structure array with fields that specify options for controlling estimation.

Cluster the data using k-means clustering. Specify that there are k = 20 clusters in the data and increase the number of iterations. Typically, the objective function contains local minima. Specify 10 replicates to help find a lower, local minimum.

tic; % Start stopwatch timer
[idx,C,sumd,D] = kmeans(X,20,'Options',options,'MaxIter',10000,...
    'Display','final','Replicates',10);

Starting parallel pool (parpool) using the 'local' profile ... connected to 6 workers.
Replicate 5, 72 iterations, total sum of distances = 7.73161e+06.
Replicate 1, 64 iterations, total sum of distances = 7.72988e+06.
Replicate 3, 68 iterations, total sum of distances = 7.72576e+06.
Replicate 4, 84 iterations, total sum of distances = 7.72696e+06.
Replicate 6, 82 iterations, total sum of distances = 7.73006e+06.
Replicate 7, 40 iterations, total sum of distances = 7.73451e+06.
Replicate 2, 194 iterations, total sum of distances = 7.72953e+06.
Replicate 9, 105 iterations, total sum of distances = 7.72064e+06.
Replicate 10, 125 iterations, total sum of distances = 7.72816e+06.
Replicate 8, 70 iterations, total sum of distances = 7.73188e+06.
Best total sum of distances = 7.72064e+06

toc % Terminate stopwatch timer

Elapsed time is 61.915955 seconds.

The Command Window indicates that six workers are available. The number of workers might vary on your system. The Command Window displays the number of iterations and the terminal objective function value for each replicate. The output arguments contain the results of replicate 9 because it has the lowest total sum of distances.

Assign New Data to Existing Clusters and Generate C/C++ Code

kmeans performs k-means clustering to partition data into k clusters. When you have a new data set to cluster, you can create new clusters that include the existing data and the new data by using kmeans. The kmeans function supports C/C++ code generation, so you can generate code that accepts training data and returns clustering results, and then deploy the code to a device. In this workflow, you must pass training data, which can be of considerable size. To save memory on the device, you can separate training and prediction by using kmeans and pdist2, respectively.

Use kmeans to create clusters in MATLAB® and use pdist2 in the generated code to assign new data to existing clusters. For code generation, define an entry-point function that accepts the cluster centroid positions and the new data set, and returns the index of the nearest cluster. Then, generate code for the entry-point function. Generating C/C++ code requires MATLAB® Coder™.

Perform k-Means Clustering

Generate a training data set using three distributions.

rng('default') % For reproducibility
X = [randn(100,2)*0.75+ones(100,2);
    randn(100,2)*0.5-ones(100,2);
    randn(100,2)*0.75];

Partition the training data into three clusters by using kmeans.

[idx,C] = kmeans(X,3);

Plot the clusters and the cluster centroids.

figure
gscatter(X(:,1),X(:,2),idx,'bgm')
hold on
plot(C(:,1),C(:,2),'kx')
legend('Cluster 1','Cluster 2','Cluster 3','Cluster Centroid')

Assign New Data to Existing Clusters

Generate a test data set.

Xtest = [randn(10,2)*0.75+ones(10,2);
    randn(10,2)*0.5-ones(10,2);
    randn(10,2)*0.75];

Classify the test data set using the existing clusters. Find the nearest centroid from each test data point by using pdist2.

[~,idx_test] = pdist2(C,Xtest,'euclidean','Smallest',1);
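
Assigning a point to its nearest centroid is just a distance lookup, which is why pdist2 suffices at prediction time. As a language-neutral sketch of the same rule (illustrative only, not toolbox code; the helper name is our own), in Python with NumPy:

```python
import numpy as np

def assign_to_nearest(C, Xtest):
    """For each row of Xtest, return the index of the nearest row of C.

    Plays the role of pdist2(C,Xtest,'euclidean','Smallest',1) in the
    example above (0-based indices here; MATLAB uses 1-based indices).
    """
    # Pairwise Euclidean distances, shape (number of test points, k)
    diff = Xtest[:, None, :] - C[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=2))
    return dists.argmin(axis=1)

# Two centroids and three test points
C = np.array([[0.0, 0.0], [10.0, 10.0]])
Xtest = np.array([[1.0, 1.0], [9.0, 9.0], [0.5, -0.5]])
print(assign_to_nearest(C, Xtest))  # [0 1 0]
```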

Plot the test data and label the test data using idx_test by using gscatter.

gscatter(Xtest(:,1),Xtest(:,2),idx_test,'bgm','ooo')
legend('Cluster 1','Cluster 2','Cluster 3','Cluster Centroid', ...
    'Data classified to Cluster 1','Data classified to Cluster 2', ...
    'Data classified to Cluster 3')

Generate Code

Generate C code that assigns new data to the existing clusters. Note that generating C/C++ code requires MATLAB® Coder™.

Define an entry-point function named findNearestCentroid that accepts centroid positions and new data, and then finds the nearest cluster by using pdist2.

Add the %#codegen compiler directive (or pragma) to the entry-point function after the function signature to indicate that you intend to generate code for the MATLAB algorithm. Adding this directive instructs the MATLAB Code Analyzer to help you diagnose and fix violations that would cause errors during code generation.

type findNearestCentroid % Display contents of findNearestCentroid.m

function idx = findNearestCentroid(C,X) %#codegen
[~,idx] = pdist2(C,X,'euclidean','Smallest',1); % Find the nearest centroid

Generate code by using codegen. Because C and C++ are statically typed languages, you must determine the properties of all variables in the entry-point function at compile time. To specify the data type and array size of the inputs of findNearestCentroid, pass a MATLAB expression that represents the set of values with a certain data type and array size by using the -args option. For details, see “Specify Variable-Size Arguments for Code Generation” on page 31-38.

codegen findNearestCentroid -args {C,Xtest}

codegen generates the MEX function findNearestCentroid_mex with a platform-dependent extension.

Verify the generated code.

myIndx = findNearestCentroid(C,Xtest);
myIndex_mex = findNearestCentroid_mex(C,Xtest);
verifyMEX = isequal(idx_test,myIndx,myIndex_mex)

verifyMEX = logical
   1

isequal returns logical 1 (true), which means all the inputs are equal. The comparison confirms that the pdist2 function, the findNearestCentroid function, and the MEX function return the same index.

You can also generate optimized CUDA® code using GPU Coder™.

cfg = coder.gpuConfig('mex');
codegen -config cfg findNearestCentroid -args {C,Xtest}

For more information on code generation, see “General Code Generation Workflow” on page 31-5. For more information on GPU Coder, see “Get Started with GPU Coder” (GPU Coder) and “Supported Functions” (GPU Coder).

Input Arguments

X — Data
numeric matrix

Data, specified as a numeric matrix. The rows of X correspond to observations, and the columns correspond to variables.

If X is a numeric vector, then kmeans treats it as an n-by-1 data matrix, regardless of its orientation.

The software treats NaNs in X as missing data and removes any row of X that contains at least one NaN. Removing rows of X reduces the sample size. The kmeans function returns NaN for the corresponding value in the output argument idx.

Data Types: single | double

k — Number of clusters
positive integer

Number of clusters in the data, specified as a positive integer.

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Distance','cosine','Replicates',10,'Options',statset('UseParallel',1) specifies the cosine distance, 10 replicate clusters at different starting values, and to use parallel computing.

Display — Level of output to display
'off' (default) | 'final' | 'iter'

Level of output to display in the Command Window, specified as the comma-separated pair consisting of 'Display' and one of the following options:

• 'final' — Displays results of the final iteration
• 'iter' — Displays results of each iteration
• 'off' — Displays nothing

Example: 'Display','final'

Distance — Distance metric
'sqeuclidean' (default) | 'cityblock' | 'cosine' | 'correlation' | 'hamming'

Distance metric, in p-dimensional space, used for minimization, specified as the comma-separated pair consisting of 'Distance' and 'sqeuclidean', 'cityblock', 'cosine', 'correlation', or 'hamming'. kmeans computes centroid clusters differently for the supported distance metrics.

This table summarizes the available distance metrics. In the formulas, x is an observation (that is, a row of X) and c is a centroid (a row vector).

'sqeuclidean' — Squared Euclidean distance (default). Each centroid is the mean of the points in that cluster.
    d(x,c) = (x − c)(x − c)′

'cityblock' — Sum of absolute differences, that is, the L1 distance. Each centroid is the component-wise median of the points in that cluster.
    d(x,c) = ∑_{j=1}^{p} |x_j − c_j|

'cosine' — One minus the cosine of the included angle between points (treated as vectors). Each centroid is the mean of the points in that cluster, after normalizing those points to unit Euclidean length.
    d(x,c) = 1 − xc′ / (√(xx′) √(cc′))

'correlation' — One minus the sample correlation between points (treated as sequences of values). Each centroid is the component-wise mean of the points in that cluster, after centering and normalizing those points to zero mean and unit standard deviation.
    d(x,c) = 1 − (x − x̄)(c − c̄)′ / (√((x − x̄)(x − x̄)′) √((c − c̄)(c − c̄)′)),
    where x̄ = (1/p)(∑_{j=1}^{p} x_j) 1_p, c̄ = (1/p)(∑_{j=1}^{p} c_j) 1_p, and 1_p is a row vector of p ones.

'hamming' — This metric is suitable only for binary data. It is the proportion of bits that differ. Each centroid is the component-wise median of points in that cluster.
    d(x,y) = (1/p) ∑_{j=1}^{p} I(x_j ≠ y_j),
    where I is the indicator function.
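
Each of these formulas can be checked numerically outside MATLAB. A small Python/NumPy sketch (illustrative only; the function names are our own, and the toolbox computes these distances internally):

```python
import numpy as np

def sqeuclidean(x, c):
    return float(((x - c) ** 2).sum())            # (x - c)(x - c)'

def cityblock(x, c):
    return float(np.abs(x - c).sum())             # L1 distance

def cosine_dist(x, c):
    return 1.0 - float(x @ c) / (np.linalg.norm(x) * np.linalg.norm(c))

def correlation_dist(x, c):
    xc, cc = x - x.mean(), c - c.mean()           # center each vector
    return 1.0 - float(xc @ cc) / (np.linalg.norm(xc) * np.linalg.norm(cc))

def hamming(x, y):
    return float((x != y).mean())                 # proportion of entries that differ

x = np.array([1.0, 0.0, 2.0, 1.0])
c = np.array([1.0, 1.0, 2.0, 1.0])
print(sqeuclidean(x, c), cityblock(x, c), hamming(x, c))  # 1.0 1.0 0.25
```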

Example: 'Distance','cityblock'

EmptyAction — Action to take if cluster loses all member observations
'singleton' (default) | 'error' | 'drop'

Action to take if a cluster loses all its member observations, specified as the comma-separated pair consisting of 'EmptyAction' and one of the following options.

'error' — Treat an empty cluster as an error.
'drop' — Remove any clusters that become empty. kmeans sets the corresponding return values in C and D to NaN.
'singleton' (default) — Create a new cluster consisting of the one point furthest from its centroid.

Example: 'EmptyAction','error'

MaxIter — Maximum number of iterations
100 (default) | positive integer

Maximum number of iterations, specified as the comma-separated pair consisting of 'MaxIter' and a positive integer.

Example: 'MaxIter',1000
Data Types: double | single

OnlinePhase — Online update flag
'off' (default) | 'on'

Online update flag, specified as the comma-separated pair consisting of 'OnlinePhase' and 'off' or 'on'. If OnlinePhase is 'on', then kmeans performs an online update phase in addition to a batch update phase. The online phase can be time-consuming for large data sets, but guarantees a solution that is a local minimum of the distance criterion. In other words, the software finds a partition of the data in which moving any single point to a different cluster increases the total sum of distances.

Example: 'OnlinePhase','on'

Options — Options for controlling iterative algorithm for minimizing fitting criteria
[] (default) | structure array returned by statset

Options for controlling the iterative algorithm for minimizing the fitting criteria, specified as the comma-separated pair consisting of 'Options' and a structure array returned by statset. Supported fields of the structure array specify options for controlling the iterative algorithm. This table summarizes the supported fields. Note that the supported fields require Parallel Computing Toolbox.

'Streams' — A RandStream object or cell array of such objects. If you do not specify Streams, kmeans uses the default stream or streams. If you specify Streams, use a single object except when all of the following conditions exist:

• You have an open parallel pool.
• UseParallel is true.
• UseSubstreams is false.

In this case, use a cell array the same size as the parallel pool. If a parallel pool is not open, then Streams must supply a single random number stream.

'UseParallel' — If true and Replicates > 1, then kmeans implements the k-means algorithm on each replicate in parallel. If Parallel Computing Toolbox is not installed, then computation occurs in serial mode. The default is false, indicating serial computation.

'UseSubstreams' — Set to true to compute in parallel in a reproducible fashion. The default is false. To compute reproducibly, set Streams to a type allowing substreams: 'mlfg6331_64' or 'mrg32k3a'.

To ensure more predictable results, use parpool and explicitly create a parallel pool before invoking kmeans and setting 'Options',statset('UseParallel',1).

Example: 'Options',statset('UseParallel',1)
Data Types: struct

Replicates — Number of times to repeat clustering using new initial cluster centroid positions
1 (default) | positive integer

Number of times to repeat clustering using new initial cluster centroid positions, specified as the comma-separated pair consisting of 'Replicates' and a positive integer. kmeans returns the solution with the lowest sumd.

You can set 'Replicates' implicitly by supplying a 3-D array as the value for the 'Start' name-value pair argument.

Example: 'Replicates',5
Data Types: double | single

Start — Method for choosing initial cluster centroid positions
'plus' (default) | 'cluster' | 'sample' | 'uniform' | numeric matrix | numeric array

Method for choosing initial cluster centroid positions (or seeds), specified as the comma-separated pair consisting of 'Start' and 'cluster', 'plus', 'sample', 'uniform', a numeric matrix, or a numeric array. This table summarizes the available options for choosing seeds.

'cluster' — Perform a preliminary clustering phase on a random 10% subsample of X when the number of observations in the subsample is greater than k. This preliminary phase is itself initialized using 'sample'. If the number of observations in the random 10% subsample is less than k, then the software selects k observations from X at random.

'plus' (default) — Select k seeds by implementing the k-means++ algorithm on page 32-2895 for cluster center initialization.

'sample' — Select k observations from X at random.

'uniform' — Select k points uniformly at random from the range of X. Not valid with the Hamming distance.

numeric matrix — k-by-p matrix of centroid starting locations. The rows of Start correspond to seeds. The software infers k from the first dimension of Start, so you can pass in [] for k.

numeric array — k-by-p-by-r array of centroid starting locations. The rows of each page correspond to seeds. The third dimension invokes replication of the clustering routine. Page j contains the set of seeds for replicate j. The software infers the number of replicates (specified by the 'Replicates' name-value pair argument) from the size of the third dimension.

Example: 'Start','sample'
Data Types: char | string | double | single

Output Arguments

idx — Cluster indices
numeric column vector

Cluster indices, returned as a numeric column vector. idx has as many rows as X, and each row indicates the cluster assignment of the corresponding observation.

C — Cluster centroid locations
numeric matrix

Cluster centroid locations, returned as a numeric matrix. C is a k-by-p matrix, where row j is the centroid of cluster j.

sumd — Within-cluster sums of point-to-centroid distances
numeric column vector

Within-cluster sums of point-to-centroid distances, returned as a numeric column vector. sumd is a k-by-1 vector, where element j is the sum of point-to-centroid distances within cluster j.

D — Distances from each point to every centroid
numeric matrix

Distances from each point to every centroid, returned as a numeric matrix. D is an n-by-k matrix, where element (j,m) is the distance from observation j to centroid m.

More About

k-Means Clustering

k-means clustering, or Lloyd’s algorithm [2], is an iterative, data-partitioning algorithm that assigns n observations to exactly one of k clusters defined by centroids, where k is chosen before the algorithm starts. The algorithm proceeds as follows:

1. Choose k initial cluster centers (centroids). For example, choose k observations at random (by using 'Start','sample') or use the k-means++ algorithm on page 32-2895 for cluster center initialization (the default).

2. Compute point-to-cluster-centroid distances of all observations to each centroid.

3. There are two ways to proceed (specified by OnlinePhase):
   • Batch update — Assign each observation to the cluster with the closest centroid.
   • Online update — Individually reassign observations to a different centroid if the reassignment decreases the sum of the within-cluster, sum-of-squares point-to-cluster-centroid distances.
   For more details, see “Algorithms” on page 32-2896.

4. Compute the average of the observations in each cluster to obtain k new centroid locations.

5. Repeat steps 2 through 4 until cluster assignments do not change, or the maximum number of iterations is reached.
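
The five steps above can be sketched in a few lines. The following Python/NumPy illustration is a minimal batch-only (Lloyd) variant under simplifying assumptions we have added ourselves: deterministic initialization from the first k rows and no empty-cluster handling, unlike kmeans:

```python
import numpy as np

def lloyd(X, k, max_iter=100):
    """Minimal batch k-means; returns (idx, C) for an n-by-p array X."""
    # Step 1: initial centroids (first k rows here; kmeans defaults to k-means++)
    C = X[:k].astype(float).copy()
    idx = np.full(len(X), -1)
    for _ in range(max_iter):
        # Steps 2-3 (batch update): assign each observation to its nearest centroid
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        new_idx = d.argmin(axis=1)
        if np.array_equal(new_idx, idx):
            break                      # Step 5: assignments did not change
        idx = new_idx
        # Step 4: recompute each centroid as the mean of its members
        for j in range(k):
            if np.any(idx == j):
                C[j] = X[idx == j].mean(axis=0)
    return idx, C

X = np.array([[0., 0.], [0., 1.], [1., 0.],
              [10., 10.], [10., 11.], [11., 10.]])
idx, C = lloyd(X, 2)
print(idx)  # [0 0 0 1 1 1]
```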

k-means++ Algorithm

The k-means++ algorithm uses a heuristic to find centroid seeds for k-means clustering. According to Arthur and Vassilvitskii [1], k-means++ improves the running time of Lloyd’s algorithm, and the quality of the final solution.

The k-means++ algorithm chooses seeds as follows, assuming the number of clusters is k.

1. Select an observation uniformly at random from the data set, X. The chosen observation is the first centroid, and is denoted c1.

2. Compute distances from each observation to c1. Denote the distance between cj and the observation m as d(xm,cj).

3. Select the next centroid, c2, at random from X with probability

   d(xm,c1)² / ∑_{j=1}^{n} d(xj,c1)².

4. To choose center j:

   a. Compute the distances from each observation to each centroid, and assign each observation to its closest centroid.

   b. For m = 1,...,n and p = 1,...,j − 1, select centroid j at random from X with probability

      d(xm,cp)² / ∑_{h: xh ∈ Cp} d(xh,cp)²,

      where Cp is the set of all observations closest to centroid cp and xm belongs to Cp.

   That is, select each subsequent center with a probability proportional to the distance from itself to the closest center that you already chose.

5. Repeat step 4 until k centroids are chosen.

Arthur and Vassilvitskii [1] demonstrate, using a simulation study for several cluster orientations, that k-means++ achieves faster convergence to a lower sum of within-cluster, sum-of-squares point-to-cluster-centroid distances than Lloyd’s algorithm.
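
The squared-distance-weighted sampling in steps 3 and 4 can be sketched as follows in Python/NumPy (illustrative only; kmeanspp_seeds is our own hypothetical helper, not a toolbox function):

```python
import numpy as np

def kmeanspp_seeds(X, k, seed=None):
    """Choose k seed rows of X by k-means++ (squared-distance-weighted) sampling."""
    rng = np.random.default_rng(seed)
    n = len(X)
    seeds = [X[rng.integers(n)]]                 # step 1: uniform first centroid
    for _ in range(1, k):
        # squared distance from each point to its nearest already-chosen seed
        d2 = np.min([((X - s) ** 2).sum(axis=1) for s in seeds], axis=0)
        # steps 3-4: sample the next seed with probability proportional to d^2
        seeds.append(X[rng.choice(n, p=d2 / d2.sum())])
    return np.vstack(seeds)

# Two tight, well-separated groups: the two seeds always land in different groups,
# because every point in the first seed's group has d^2 = 0
X = np.vstack([np.zeros((50, 2)), np.full((50, 2), 100.0)])
S = kmeanspp_seeds(X, 2)
print(sorted(S[:, 0].tolist()))  # [0.0, 100.0]
```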

Algorithms

• kmeans uses a two-phase iterative algorithm to minimize the sum of point-to-centroid distances, summed over all k clusters.

1. The first phase uses batch updates, where each iteration consists of reassigning points to their nearest cluster centroid, all at once, followed by recalculation of cluster centroids. This phase occasionally does not converge to a solution that is a local minimum, that is, a partition of the data where moving any single point to a different cluster increases the total sum of distances. This is more likely for small data sets. The batch phase is fast, but potentially only approximates a solution as a starting point for the second phase.

2. The second phase uses online updates, where points are individually reassigned if doing so reduces the sum of distances, and cluster centroids are recomputed after each reassignment. Each iteration during this phase consists of one pass through all the points. This phase converges to a local minimum, although there might be other local minima with lower total sum of distances. In general, finding the global minimum requires an exhaustive choice of starting points, but using several replicates with random starting points typically results in a solution that is a global minimum.

• If Replicates = r > 1 and Start is 'plus' (the default), then the software selects r possibly different sets of seeds according to the k-means++ algorithm on page 32-2895.

• If you enable the UseParallel option in Options and Replicates > 1, then each worker selects seeds and clusters in parallel.

References

[1] Arthur, David, and Sergei Vassilvitskii. “k-means++: The Advantages of Careful Seeding.” SODA ’07: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms. 2007, pp. 1027–1035.

[2] Lloyd, Stuart P. “Least Squares Quantization in PCM.” IEEE Transactions on Information Theory. Vol. 28, 1982, pp. 129–137.

[3] Seber, G. A. F. Multivariate Observations. Hoboken, NJ: John Wiley & Sons, Inc., 1984.

[4] Spath, H. Cluster Dissection and Analysis: Theory, FORTRAN Programs, Examples. Translated by J. Goldschmidt. New York: Halsted Press, 1985.

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

Usage notes and limitations:

• Supported syntaxes are:
  • idx = kmeans(X,k)
  • [idx,C] = kmeans(X,k)
  • [idx,C,sumd] = kmeans(X,k)
  • [___] = kmeans(___,Name,Value)
• Supported name-value pair arguments, and any differences, are:
  • 'Display' — Default value is 'iter'.
  • 'MaxIter'
  • 'Options' — Supports only the 'TolFun' field of the structure array created by statset. The default value of 'TolFun' is 1e-4. The kmeans function uses the value of 'TolFun' as the termination tolerance for the within-cluster sums of point-to-centroid distances. For example, you can specify 'Options',statset('TolFun',1e-8).
  • 'Replicates'
  • 'Start' — Supports only 'plus', 'sample', and a numeric array.

For more information, see “Tall Arrays for Out-of-Memory Data” (MATLAB).

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

• If the Start method uses random selections, the initial centroid cluster positions might not match MATLAB.
• If the number of rows in X is fixed, code generation does not remove rows of X that contain a NaN.
• The cluster centroid locations in C can have a different order than in MATLAB. In this case, the cluster indices in idx have corresponding differences.

• If you provide Display, its value must be 'off'.
• If you provide Streams, it must be empty and UseSubstreams must be false.
• When you set the UseParallel option to true:
  • Some computations can execute in parallel even when Replicates is 1. For large data sets, when Replicates is 1, consider setting the UseParallel option to true.
  • kmeans uses parfor to create loops that run in parallel on supported shared-memory multicore platforms. Loops that run in parallel can be faster than loops that run on a single thread. If your compiler does not support the Open Multiprocessing (OpenMP) application interface or you disable the OpenMP library, MATLAB Coder treats the parfor-loops as for-loops. To find supported compilers, see https://www.mathworks.com/support/compilers/current_release/.
• To save memory on the device to which you deploy generated code, you can separate training and prediction by using kmeans and pdist2, respectively. Use kmeans to create clusters in MATLAB and use pdist2 in the generated code to assign new data to existing clusters. For code generation, define an entry-point function that accepts the cluster centroid positions and the new data set, and returns the index of the nearest cluster. Then, generate code for the entry-point function. For an example, see “Assign New Data to Existing Clusters and Generate C/C++ Code” on page 32-2886.
• Starting in R2020a, kmeans returns integer-type (int32) indices, rather than double-precision indices, in generated standalone C/C++ code. Therefore, the function allows for stricter single-precision support when you use single-precision inputs. For MEX code generation, the function still returns double-precision indices to match the MATLAB behavior.

For more information on code generation, see “Introduction to Code Generation” on page 31-2 and “General Code Generation Workflow” on page 31-5.

Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, set the 'UseParallel' option to true: set the 'UseParallel' field of the options structure to true using statset and specify the 'Options' name-value pair argument in the call to this function. For example: 'Options',statset('UseParallel',true). For more information, see the 'Options' name-value pair argument.

For more general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also

clusterdata | gmdistribution | kmedoids | linkage | parpool | silhouette | statset

Topics

“Compare k-Means Clustering Solutions” on page 16-33
“Introduction to k-Means Clustering” on page 16-33

Introduced before R2006a


kmedoids

k-medoids clustering

Syntax

idx = kmedoids(X,k)
idx = kmedoids(X,k,Name,Value)
[idx,C] = kmedoids(___)
[idx,C,sumd] = kmedoids(___)
[idx,C,sumd,D] = kmedoids(___)
[idx,C,sumd,D,midx] = kmedoids(___)
[idx,C,sumd,D,midx,info] = kmedoids(___)

Description

idx = kmedoids(X,k) performs “k-medoids Clustering” on page 32-2911 to partition the observations of the n-by-p matrix X into k clusters, and returns an n-by-1 vector idx containing cluster indices of each observation. Rows of X correspond to points and columns correspond to variables. By default, kmedoids uses the squared Euclidean distance metric and the k-means++ algorithm on page 32-2895 for choosing initial cluster medoid positions.

idx = kmedoids(X,k,Name,Value) uses additional options specified by one or more Name,Value pair arguments.

[idx,C] = kmedoids(___) returns the k cluster medoid locations in the k-by-p matrix C.

[idx,C,sumd] = kmedoids(___) returns the within-cluster sums of point-to-medoid distances in the k-by-1 vector sumd.

[idx,C,sumd,D] = kmedoids(___) returns distances from each point to every medoid in the n-by-k matrix D.

[idx,C,sumd,D,midx] = kmedoids(___) returns the indices midx such that C = X(midx,:). midx is a k-by-1 vector.

[idx,C,sumd,D,midx,info] = kmedoids(___) returns a structure info with information about the options used by the algorithm when executed.

Examples

Group Data into Two Clusters

Randomly generate data.

rng('default'); % For reproducibility
X = [randn(100,2)*0.75+ones(100,2);
    randn(100,2)*0.55-ones(100,2)];
figure;
plot(X(:,1),X(:,2),'.');
title('Randomly Generated Data');

Group data into two clusters using kmedoids. Use the cityblock distance metric.

opts = statset('Display','iter');
[idx,C,sumd,d,midx,info] = kmedoids(X,2,'Distance','cityblock','Options',opts);

  rep    iter         sum
    1       1     209.856
    1       2     209.856
Best total sum of distances = 209.856

info is a struct that contains information about how the algorithm was executed. For example, the bestReplicate field indicates the replicate that was used to produce the final solution. In this example, replicate number 1 was used, because the default number of replicates is 1 for the default algorithm, which is pam in this case.

info

info = struct with fields:
        algorithm: 'pam'
            start: 'plus'
         distance: 'cityblock'
       iterations: 2
    bestReplicate: 1

Plot the clusters and the cluster medoids.

figure;
plot(X(idx==1,1),X(idx==1,2),'r.','MarkerSize',7)
hold on
plot(X(idx==2,1),X(idx==2,2),'b.','MarkerSize',7)
plot(C(:,1),C(:,2),'co',...
    'MarkerSize',7,'LineWidth',1.5)
legend('Cluster 1','Cluster 2','Medoids',...
    'Location','NW');
title('Cluster Assignments and Medoids');
hold off

Cluster Categorical Data Using k-Medoids

This example uses the "Mushroom" data set [3][4][5][6][7] from the UCI machine learning archive [7], described in http://archive.ics.uci.edu/ml/datasets/Mushroom. The data set includes 22 predictors for 8,124 observations of various mushrooms. The predictors are categorical data types. For example, cap shape is categorized with features of 'b' for bell-shaped cap and 'c' for conical. Mushroom color is also categorized with features of 'n' for brown, and 'p' for pink. The data set also includes a classification for each mushroom of either edible or poisonous.

Since the features of the mushroom data set are categorical, it is not possible to define the mean of several data points, and therefore the widely used k-means clustering algorithm cannot be meaningfully applied to this data set. k-medoids is a related algorithm that partitions data into k distinct clusters by finding medoids that minimize the sum of dissimilarities between points in the data and their nearest medoid.

The medoid of a set is a member of that set whose average dissimilarity with the other members of the set is the smallest. Similarity can be defined for many types of data that do not allow a mean to be calculated, allowing k-medoids to be used for a broader range of problems than k-means.

Using k-medoids, this example clusters the mushrooms into two groups, based on the predictors provided. It then explores the relationship between those clusters and the classifications of the mushrooms as either edible or poisonous.

This example assumes that you have downloaded the "Mushroom" data set from the UCI database (http://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/) and saved it in your current directory as a text file named agaricus-lepiota.txt. There are no column headers in the data, so readtable uses the default variable names.

clear all
data = readtable('agaricus-lepiota.txt','ReadVariableNames',false);
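
The medoid definition in the paragraph above translates directly into code. A Python sketch for intuition (medoid_index is our own hypothetical helper; kmedoids performs this kind of search internally over candidate medoids):

```python
import numpy as np

def medoid_index(X, dist):
    """Index of the member of X whose total dissimilarity to the others is smallest."""
    n = len(X)
    totals = [sum(dist(X[i], X[j]) for j in range(n)) for i in range(n)]
    return int(np.argmin(totals))

# Hamming distance works on categorical codes, where a mean would be meaningless
hamming = lambda a, b: float(np.mean(a != b))

X = np.array([[1, 0, 2],
              [1, 1, 1],
              [1, 0, 0],
              [0, 1, 0]])
print(medoid_index(X, hamming))  # 2
```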

Display the first 5 mushrooms with their first few features.

data(1:5,1:10)

ans =

    Var1    Var2    Var3    Var4    Var5    Var6    Var7    Var8    Var9    Var10
    ____    ____    ____    ____    ____    ____    ____    ____    ____    _____

    'p'     'x'     's'     'n'     't'     'p'     'f'     'c'     'n'     'k'
    'e'     'x'     's'     'y'     't'     'a'     'f'     'c'     'b'     'k'
    'e'     'b'     's'     'w'     't'     'l'     'f'     'c'     'b'     'n'
    'p'     'x'     'y'     'w'     't'     'p'     'f'     'c'     'n'     'n'
    'e'     'x'     's'     'g'     'f'     'n'     'f'     'w'     'b'     'k'
Extract the first column, which contains the labels for the edible and poisonous groups. Then delete the column from the table.

labels = data(:,1);
labels = categorical(labels{:,:});
data(:,1) = [];

Store the names of the predictors (features), which are described in http://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.names.

VarNames = {'cap_shape' 'cap_surface' 'cap_color' 'bruises' 'odor' ...
    'gill_attachment' 'gill_spacing' 'gill_size' 'gill_color' ...
    'stalk_shape' 'stalk_root' 'stalk_surface_above_ring' ...
    'stalk_surface_below_ring' 'stalk_color_above_ring' ...
    'stalk_color_below_ring' 'veil_type' 'veil_color' 'ring_number' ...
    'ring_type' 'spore_print_color' 'population' 'habitat'};

Set the variable names.

data.Properties.VariableNames = VarNames;

There are a total of 2480 missing values, denoted as '?'.

sum(char(data{:,:}) == '?')

ans = 2480


32

Functions

Based on inspection of the data set and its description, the missing values belong only to the 11th variable (stalk_root). Remove that column from the table.

data(:,11) = [];

kmedoids accepts only numeric data, so you must cast the categories into a numeric type. The distance function you will use to define the dissimilarity of the data is based on the double representation of the categorical data.

cats = categorical(data{:,:});
data = double(cats);

kmedoids can use any distance metric supported by pdist2 to cluster. For this example, you will cluster the data using the Hamming distance, because it is an appropriate distance metric for categorical data, as illustrated below. The Hamming distance between two vectors is the percentage of the vector components that differ. For instance, consider these two vectors.

v1 = [1 0 2 1];
v2 = [1 1 2 1];

They are equal in the 1st, 3rd, and 4th coordinates. Because 1 of the 4 coordinates differs, the Hamming distance between these two vectors is 0.25.

You can use the function pdist2 to measure the Hamming distance between the first and second rows of data, the numeric representation of the categorical mushroom data. The value 0.2857 means that 6 of the 21 features of the mushroom differ.

pdist2(data(1,:),data(2,:),'hamming')

ans = 0.2857

In this example, you cluster the mushroom data into two clusters, based on the features, to see whether the clustering corresponds to edibility. The kmedoids function is guaranteed to converge to a local minimum of the clustering criterion; however, this may not be a global minimum for the problem. It is a good idea to cluster the problem a few times using the 'Replicates' parameter. When 'Replicates' is set to a value n greater than 1, the k-medoids algorithm is run n times, and the best result is returned.

To run kmedoids to cluster the data into 2 clusters, based on the Hamming distance, and to return the best result of 3 replicates, run the following.

rng('default'); % For reproducibility
[IDX, C, SUMD, D, MIDX, INFO] = kmedoids(data,2,'distance','hamming','replicates',3);

Assume that mushrooms in the predicted group 1 are poisonous and that those in group 2 are all edible. To determine the performance of the clustering results, calculate how many mushrooms in group 1 are indeed poisonous and how many in group 2 are edible, based on the known labels. In other words, calculate the number of false positives and false negatives, as well as true positives and true negatives.

Construct a confusion matrix (or matching matrix), where the diagonal elements represent the number of true positives and true negatives, respectively. The off-diagonal elements represent false negatives and false positives, respectively. For convenience, use the confusionmat function, which calculates a confusion matrix given known labels and predicted labels. Get the predicted label information from the IDX variable. IDX contains values of 1 and 2 for each data point, representing the poisonous and edible groups, respectively.

predLabels = labels; % Initialize a vector for predicted labels.
predLabels(IDX==1) = categorical({'p'}); % Assign group 1 to be poisonous.
predLabels(IDX==2) = categorical({'e'}); % Assign group 2 to be edible.
confMatrix = confusionmat(labels,predLabels)

confMatrix =

        4176          32
         816        3100

Out of 4208 edible mushrooms, 4176 were correctly predicted to be in group 2 (the edible group), and 32 were incorrectly predicted to be in group 1 (the poisonous group). Similarly, out of 3916 poisonous mushrooms, 3100 were correctly predicted to be in group 1, and 816 were incorrectly predicted to be in group 2.

Given this confusion matrix, calculate the accuracy, which is the proportion of true results (both true positives and true negatives) in the overall data, and the precision, which is the proportion of true positives among all positive results (true positives and false positives).

accuracy = (confMatrix(1,1)+confMatrix(2,2))/(sum(sum(confMatrix)))

accuracy = 0.8956

precision = confMatrix(1,1) / (confMatrix(1,1)+confMatrix(2,1))

precision = 0.8365

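As a complementary check, not part of the original example, you can also compute the recall, the proportion of true positives among all actual positives, from the same confusion matrix. A minimal sketch, restating the printed values so that it runs on its own:

```matlab
% Recall for the edible class. Row 1 of confMatrix corresponds to the
% edible ('e') label, so column 2 of that row holds its false negatives.
confMatrix = [4176 32; 816 3100];   % values printed above
recall = confMatrix(1,1) / (confMatrix(1,1)+confMatrix(1,2))
% recall = 0.9924, i.e., 4176 of the 4208 edible mushrooms land in the edible cluster
```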
The results indicate that applying the k-medoids algorithm to the categorical features of the mushrooms produces clusters that are associated with edibility.

Input Arguments

X — Data
numeric matrix

Data, specified as a numeric matrix. The rows of X correspond to observations, and the columns correspond to variables.

k — Number of medoids
positive integer

Number of medoids in the data, specified as a positive integer.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.


Example: 'Distance','euclidean','Replicates',3,'Options',statset('UseParallel',1) specifies Euclidean distance, three replicate medoids at different starting values, and the use of parallel computing.

Algorithm — Algorithm to find medoids
'pam' | 'small' | 'clara' | 'large'

Algorithm to find medoids, specified as the comma-separated pair consisting of 'Algorithm' and 'pam', 'small', 'clara', or 'large'. The default algorithm depends on the number of rows of X.

• If the number of rows of X is less than 3000, 'pam' is the default algorithm.
• If the number of rows is between 3000 and 10000, 'small' is the default algorithm.
• For all other cases, 'large' is the default algorithm.

You can override the default choice by explicitly stating the algorithm. This table summarizes the available algorithms.

'pam' — Partitioning Around Medoids (PAM) is the classical algorithm for solving the k-medoids problem described in [1]. After applying the initialization function to select initial medoid positions, the program performs the swap-step of the PAM algorithm, that is, it searches over all possible swaps between medoids and non-medoids to see if the sum of medoid to cluster member distances goes down. You can specify which initialization function to use via the 'Start' name-value pair argument. The algorithm proceeds as follows.

1 Build-step: Each of the k clusters is associated with a potential medoid. This assignment is performed using a technique specified by the 'Start' name-value pair argument.

2 Swap-step: Within each cluster, each point is tested as a potential medoid by checking if the sum of within-cluster distances gets smaller using that point as the medoid. If so, the point is defined as a new medoid. Every point is then assigned to the cluster with the closest medoid.

The algorithm iterates the build- and swap-steps until the medoids do not change, or other termination criteria are met. The algorithm can produce better solutions than the other algorithms in some situations, but it can be prohibitively long running.

'small' — Use an algorithm similar to the k-means algorithm to find k medoids. This option employs a variant of the Lloyd's iterations based on [2]. The algorithm proceeds as follows.

1 For each point in each cluster, calculate the sum of distances from the point to every other point in the cluster. Choose the point that minimizes the sum as the new medoid.

2 Update the cluster membership for each data point to reflect the new medoid.

The algorithm repeats these steps until no further updates occur or other termination criteria are met. The algorithm has an optional PAM-like online update phase (specified by the 'OnlinePhase' name-value pair argument) that improves cluster quality. It tends to return higher quality solutions than the clara or large algorithms, but it may not be the best choice for very large data.

'clara' — Clustering LARge Applications (CLARA) [1] repeatedly performs the PAM algorithm on random subsets of the data. It aims to overcome the scaling challenges posed by the PAM algorithm through sampling. The algorithm proceeds as follows.

1 Select a subset of the data and apply the PAM algorithm to the subset.

2 Assign points of the full data set to clusters by picking the closest medoid.

The algorithm repeats these steps until the medoids do not change, or other termination criteria are met. For the best performance, it is recommended that you perform multiple replicates. By default, the program performs five replicates. Each replicate samples s rows from X (specified by the 'NumSamples' name-value pair argument) to perform clustering on. By default, 40+2*k samples are selected.

'large' — This is similar to the small-scale algorithm and repeatedly performs searches using a k-means-like update. However, the algorithm examines only a random sample of cluster members during each iteration. The user-adjustable parameter 'PercentNeighbors' controls the number of neighbors to examine. If there is no improvement after the neighbors are examined, the algorithm terminates the local search. The algorithm performs a total of r replicates (specified by the 'Replicates' name-value pair argument) and returns the best clustering result. The algorithm has an optional PAM-like online phase (specified by the 'OnlinePhase' name-value pair argument) that improves cluster quality.

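To make the algorithm choice explicit rather than relying on the row-count default, you can pass the 'Algorithm' name-value pair. A minimal sketch, using hypothetical random data:

```matlab
rng('default')           % For reproducibility
X = rand(500,3);         % 500 rows, so 'pam' would be chosen by default
% Request the sampling-based CLARA algorithm explicitly.
idx = kmedoids(X,4,'Algorithm','clara');
```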
Example: 'Algorithm','pam'

OnlinePhase — Flag to perform PAM-like online update phase
'on' (default) | 'off'



A flag to perform a PAM-like online update phase, specified as the comma-separated pair consisting of 'OnlinePhase' and 'on' or 'off'.

If it is on, then kmedoids performs a PAM-like update to the medoids after the Lloyd iterations in the small and large algorithms. During this online update phase, the algorithm chooses a small subset of data points in each cluster that are the furthest from and nearest to the medoid. For each chosen point, it reassigns the clustering of the entire data set and checks if this creates a smaller sum of distances than the best known. In other words, the swap considerations are limited to the points near the medoids and far from the medoids. The near points are considered in order to refine the clustering. The far points are considered in order to escape local minima. Turning on this feature tends to improve the quality of solutions generated by both algorithms. Total run time tends to increase as well, but the increase typically is less than one iteration of PAM.

Example: 'OnlinePhase','off'

Distance — Distance metric
'sqeuclidean' (default) | 'euclidean' | 'seuclidean' | 'cityblock' | 'minkowski' | 'chebychev' | 'mahalanobis' | 'cosine' | 'correlation' | 'spearman' | 'hamming' | 'jaccard' | custom distance function

Distance metric, in p-dimensional space, specified as the comma-separated pair consisting of 'Distance' and 'sqeuclidean', 'euclidean', 'seuclidean', 'cityblock', 'minkowski', 'chebychev', 'mahalanobis', 'cosine', 'correlation', 'spearman', 'hamming', 'jaccard', or a custom distance function. kmedoids minimizes the sum of medoid to cluster member distances. See pdist for the definition of each distance metric. kmedoids supports all distance metrics supported by pdist.

Example: 'Distance','hamming'

Options — Options to control iterative algorithm to minimize fitting criteria
[] (default) | structure array returned by statset

Options to control the iterative algorithm to minimize fitting criteria, specified as the comma-separated pair consisting of 'Options' and a structure array returned by statset. Supported fields of the structure array specify options for controlling the iterative algorithm. This table summarizes the supported fields.


Display — Level of display output. Choices are 'off' (default) and 'iter'.

MaxIter — Maximum number of iterations allowed. The default is 100.

UseParallel — If true, compute in parallel. If Parallel Computing Toolbox is not available, then computation occurs in serial mode. The default is false, meaning serial computation.

UseSubstreams — Set to true to compute in parallel in a reproducible fashion. The default is false. To compute reproducibly, you must also set Streams to a type allowing substreams: 'mlfg6331_64' or 'mrg32k3a'.

Streams — A RandStream object or cell array of such objects. For details about these options and parallel computing in Statistics and Machine Learning Toolbox, see “Speed Up Statistical Computations” or enter help parallelstats at the command line.

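As a quick sketch of passing these fields through statset (the data here is hypothetical):

```matlab
rng('default')
X = rand(200,5);                                 % hypothetical numeric data
opts = statset('Display','iter','MaxIter',200);  % show progress, raise the cap
idx = kmedoids(X,3,'Options',opts);
```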
Example: 'Options',statset('Display','off')

Replicates — Number of times to repeat clustering using new initial cluster medoid positions
positive integer

Number of times to repeat clustering using new initial cluster medoid positions, specified as a positive integer. The default value depends on the choice of algorithm. For pam and small, the default is 1. For clara, the default is 5. For large, the default is 3.

Example: 'Replicates',4

NumSamples — Number of samples to take from data when executing clara algorithm
40+2*k (default) | positive integer

Number of samples to take from the data when executing the clara algorithm, specified as a positive integer. The default number of samples is calculated as 40+2*k.

Example: 'NumSamples',160

PercentNeighbors — Percent of data set to examine using large algorithm
0.001 (default) | scalar value between 0 and 1

Percent of the data set to examine using the large algorithm, specified as a scalar value between 0 and 1. The program examines percentneighbors*size(X,1) neighbors for the medoids. If there is no improvement in the within-cluster sum of distances, then the algorithm terminates. A value closer to 1 tends to give higher quality solutions, but the algorithm takes longer to run; a value closer to 0 tends to give lower quality solutions, but finishes faster.

Example: 'PercentNeighbors',0.01

Start — Method for choosing initial cluster medoid positions
'plus' (default) | 'sample' | 'cluster' | matrix

Method for choosing initial cluster medoid positions, specified as the comma-separated pair consisting of 'Start' and 'plus', 'sample', 'cluster', or a matrix. This table summarizes the available methods.

'plus' (default) — Select k observations from X according to the k-means++ algorithm on page 32-2895 for cluster center initialization.

'sample' — Select k observations from X at random.

'cluster' — Perform a preliminary clustering phase on a random subsample (10%) of X. This preliminary phase is itself initialized using 'sample', that is, the observations are selected at random.

matrix — A custom k-by-p matrix of starting locations. In this case, you can pass in [] for the k input argument, and kmedoids infers k from the first dimension of the matrix. You can also supply a 3-D array, implying a value for 'Replicates' from the array's third dimension.

Example: 'Start','sample'

Data Types: char | string | single | double

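A brief sketch of the matrix form of 'Start', with hypothetical data; as noted above, you can pass [] for k when you supply explicit starting medoids:

```matlab
rng('default')
X = rand(100,2);
seeds = X([1 50 100],:);             % three rows of X as starting medoids
idx = kmedoids(X,[],'Start',seeds);  % k = 3 is inferred from seeds
```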
Output Arguments

idx — Medoid indices
numeric column vector

Medoid indices, returned as a numeric column vector. idx has as many rows as X, and each row indicates the medoid assignment of the corresponding observation.

C — Cluster medoid locations
numeric matrix

Cluster medoid locations, returned as a numeric matrix. C is a k-by-p matrix, where row j is the medoid of cluster j.

sumd — Within-cluster sums of point-to-medoid distances
numeric column vector

Within-cluster sums of point-to-medoid distances, returned as a numeric column vector. sumd is a k-by-1 vector, where element j is the sum of point-to-medoid distances within cluster j.

D — Distances from each point to every medoid
numeric matrix

Distances from each point to every medoid, returned as a numeric matrix. D is an n-by-k matrix, where element (j,m) is the distance from observation j to medoid m.

midx — Index to X
column vector

Index to X, returned as a column vector of indices. midx is a k-by-1 vector, and the indices satisfy C = X(midx,:).

info — Algorithm information
struct

Algorithm information, returned as a struct. info contains the options used by the function when executed, such as the k-medoid clustering algorithm (algorithm), the method used to choose initial cluster medoid positions (start), the distance metric (distance), the number of iterations taken in the best replicate (iterations), and the replicate number of the returned results (bestReplicate).

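A minimal sketch, with hypothetical random data, of capturing all outputs and checking the midx relationship described above:

```matlab
rng('default')
X = rand(50,3);
[idx,C,sumd,D,midx,info] = kmedoids(X,2);
tf = isequal(C, X(midx,:))   % medoids are actual rows of X, so tf is true
```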
More About

k-medoids Clustering

k-medoids clustering is a partitioning method commonly used in domains that require robustness to outlier data, arbitrary distance metrics, or ones for which the mean or median does not have a clear definition.

It is similar to k-means, and the goal of both methods is to divide a set of measurements or observations into k subsets or clusters so that the subsets minimize the sum of distances between a measurement and a center of the measurement's cluster. In the k-means algorithm, the center of the subset is the mean of measurements in the subset, often called a centroid. In the k-medoids algorithm, the center of the subset is a member of the subset, called a medoid.

The k-medoids algorithm returns medoids, which are actual data points in the data set. This allows you to use the algorithm in situations where the mean of the data does not exist within the data set. This is the main difference between k-medoids and k-means, where the centroids returned by k-means may not be within the data set. Hence k-medoids is useful for clustering categorical data, where a mean is impossible to define or interpret.

The function kmedoids provides several iterative algorithms that minimize the sum of distances from each object to its cluster medoid, over all clusters. One of the algorithms is called partitioning around medoids (PAM) [1], which proceeds in two steps.

1 Build-step: Each of the k clusters is associated with a potential medoid. This assignment is performed using a technique specified by the 'Start' name-value pair argument.

2 Swap-step: Within each cluster, each point is tested as a potential medoid by checking if the sum of within-cluster distances gets smaller using that point as the medoid. If so, the point is defined as a new medoid. Every point is then assigned to the cluster with the closest medoid.

The algorithm iterates the build- and swap-steps until the medoids do not change, or other termination criteria are met.

You can control the details of the minimization using several optional input parameters to kmedoids, including ones for the initial values of the cluster medoids and for the maximum number of iterations. By default, kmedoids uses the k-means++ algorithm on page 32-2895 for cluster medoid initialization and the squared Euclidean metric to determine distances.

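The medoid-update rule at the heart of these steps can be sketched in a few lines. This is an illustrative fragment only, not the toolbox implementation; the variable members and the use of pdist2 are assumptions for the sketch:

```matlab
% For one cluster: pick the member minimizing the sum of within-cluster
% distances, as in the swap/update steps described above.
rng('default')
members = rand(10,2);             % hypothetical points in one cluster
Dm = pdist2(members,members);     % pairwise distances within the cluster
[~,best] = min(sum(Dm,2));        % member with the smallest total distance
medoid = members(best,:);         % becomes the cluster's new medoid
```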
References

[1] Kaufman, L., and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. Hoboken, NJ: John Wiley & Sons, Inc., 2009.

[2] Park, H.-S., and C.-H. Jun. "A simple and fast algorithm for K-medoids clustering." Expert Systems with Applications. Vol. 36, 2009, pp. 3336–3341.

[3] Schlimmer, J. S. Concept Acquisition Through Representational Adjustment (Technical Report 87-19). Doctoral dissertation, Department of Information and Computer Science, University of California, Irvine, 1987.


[4] Iba, W., J. Wogulis, and P. Langley. "Trading off Simplicity and Coverage in Incremental Concept Learning." In Proceedings of the 5th International Conference on Machine Learning, 73–79. Ann Arbor, Michigan: Morgan Kaufmann, 1988.

[5] Duch, W., R. Adamczak, and K. Grabczewski. "Extraction of logical rules from training data using backpropagation networks." Proc. of the 1st Online Workshop on Soft Computing, 1996, pp. 25–30.

[6] Duch, W., R. Adamczak, K. Grabczewski, M. Ishikawa, and H. Ueda. "Extraction of crisp logical rules using constrained backpropagation networks - comparison of two new approaches." Proc. of the European Symposium on Artificial Neural Networks (ESANN'97), Bruges, Belgium, 1997.

[7] Bache, K., and M. Lichman. UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science, 2013.

Extended Capabilities

Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, set the 'UseParallel' option to true: set the 'UseParallel' field of the options structure to true using statset, and specify the 'Options' name-value pair argument in the call to this function. For example: 'Options',statset('UseParallel',true).

For more information, see the 'Options' name-value pair argument. For more general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).

See Also
clusterdata | evalclusters | kmeans | linkage | pdist | silhouette

Introduced in R2014b


knnsearch

knnsearch
Find k-nearest neighbors using searcher object

Syntax

Idx = knnsearch(Mdl,Y)
Idx = knnsearch(Mdl,Y,Name,Value)
[Idx,D] = knnsearch( ___ )

Description

Idx = knnsearch(Mdl,Y) searches for the nearest neighbor (i.e., the closest point, row, or observation) in Mdl.X to each point (i.e., row or observation) in the query data Y using an exhaustive search or a Kd-tree. knnsearch returns Idx, which is a column vector of the indices in Mdl.X representing the nearest neighbors.

Idx = knnsearch(Mdl,Y,Name,Value) returns the indices of the closest points in Mdl.X to Y with additional options specified by one or more Name,Value pair arguments. For example, you can specify the number of nearest neighbors to search for, or a distance metric different from the one stored in Mdl.Distance. You can also specify which action to take if the closest distances are tied.

[Idx,D] = knnsearch( ___ ) additionally returns the matrix D using any of the input arguments in the previous syntaxes. D contains the distances between each observation in Y and the corresponding closest observations in Mdl.X. By default, the function arranges the columns of D in ascending order by closeness, with respect to the distance metric.

Examples

Search for Nearest Neighbors Using Kd-tree and Exhaustive Search

knnsearch accepts ExhaustiveSearcher or KDTreeSearcher model objects to search the training data for the nearest neighbors to the query data. An ExhaustiveSearcher model invokes the exhaustive searcher algorithm, and a KDTreeSearcher model defines a Kd-tree, which knnsearch uses to search for nearest neighbors.

Load Fisher's iris data set. Randomly reserve five observations from the data for query data.

load fisheriris
rng(1); % For reproducibility
n = size(meas,1);
idx = randsample(n,5);
X = meas(~ismember(1:n,idx),:); % Training data
Y = meas(idx,:); % Query data

The variable meas contains 4 predictors. Grow a default four-dimensional Kd-tree.

MdlKDT = KDTreeSearcher(X)



MdlKDT = 
  KDTreeSearcher with properties:

       BucketSize: 50
         Distance: 'euclidean'
    DistParameter: []
                X: [145x4 double]

MdlKDT is a KDTreeSearcher model object. You can alter its writable properties using dot notation.

Prepare an exhaustive nearest neighbor searcher.

MdlES = ExhaustiveSearcher(X)

MdlES = 
  ExhaustiveSearcher with properties:

         Distance: 'euclidean'
    DistParameter: []
                X: [145x4 double]

MdlES is an ExhaustiveSearcher model object. It contains the options, such as the distance metric, to use to find nearest neighbors.

Alternatively, you can grow a Kd-tree or prepare an exhaustive nearest neighbor searcher using createns.

Search the training data for the nearest neighbor indices that correspond to each query observation. Conduct both types of searches using the default settings. By default, the number of neighbors to search for per query observation is 1.

IdxKDT = knnsearch(MdlKDT,Y);
IdxES = knnsearch(MdlES,Y);
[IdxKDT IdxES]

ans = 5×2

    17    17
     6     6
     1     1
    89    89
   124   124

In this case, the results of the search are the same.

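As mentioned above, createns can build either searcher object; a brief sketch (the 'NSMethod' values select the searcher type):

```matlab
load fisheriris
MdlKDT2 = createns(meas,'NSMethod','kdtree');      % KDTreeSearcher
MdlES2  = createns(meas,'NSMethod','exhaustive');  % ExhaustiveSearcher
idx = knnsearch(MdlKDT2,meas(1,:));                % nearest neighbor of row 1
```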
Search for Nearest Neighbors of Query Data Using Minkowski Distance

Grow a Kd-tree nearest neighbor searcher object by using the createns function. Pass the object and query data to the knnsearch function to find k-nearest neighbors.

Load Fisher's iris data set.

load fisheriris



Remove five irises randomly from the predictor data to use as a query set.

rng(1);                       % For reproducibility
n = size(meas,1);             % Sample size
qIdx = randsample(n,5);       % Indices of query data
tIdx = ~ismember(1:n,qIdx);   % Indices of training data
Q = meas(qIdx,:);
X = meas(tIdx,:);

Grow a four-dimensional Kd-tree using the training data. Specify the Minkowski distance for finding nearest neighbors.

Mdl = createns(X,'Distance','minkowski')

Mdl = 
  KDTreeSearcher with properties:

       BucketSize: 50
         Distance: 'minkowski'
    DistParameter: 2
                X: [145x4 double]

Because X has four columns and the distance metric is Minkowski, createns creates a KDTreeSearcher model object by default. The Minkowski distance exponent is 2 by default.

Find the indices of the training data (Mdl.X) that are the two nearest neighbors of each point in the query data (Q).

IdxNN = knnsearch(Mdl,Q,'K',2)

IdxNN = 5×2

    17     4
     6     2
     1    12
    89    66
   124   100

Each row of IdxNN corresponds to a query data observation, and the column order corresponds to the order of the nearest neighbors, with respect to ascending distance. For example, based on the Minkowski distance, the second nearest neighbor of Q(3,:) is X(12,:).

Include Ties in Nearest Neighbor Search

Load Fisher's iris data set.

load fisheriris

Remove five irises randomly from the predictor data to use as a query set.

rng(4);                   % For reproducibility
n = size(meas,1);         % Sample size
qIdx = randsample(n,5);   % Indices of query data



X = meas(~ismember(1:n,qIdx),:);
Y = meas(qIdx,:);

Grow a four-dimensional Kd-tree using the training data.

Mdl = KDTreeSearcher(X);

Mdl is a KDTreeSearcher model object. By default, the distance metric for finding nearest neighbors is the Euclidean metric.

Find the indices of the training data (X) that are the seven nearest neighbors of each point in the query data (Y).

[Idx,D] = knnsearch(Mdl,Y,'K',7,'IncludeTies',true);

Idx and D are five-element cell arrays of vectors, with each vector having at least seven elements.

Display the lengths of the vectors in Idx.

cellfun('length',Idx)

ans = 5×1

     8
     7
     7
     7
     7

Because cell 1 contains a vector with length greater than k = 7, query observation 1 (Y(1,:)) is equally close to at least two observations in X.

Display the indices of the nearest neighbors to Y(1,:) and their distances.

nn5 = Idx{1}

nn5 = 1×8

    91    98    67    69    71    93    88    95

nn5d = D{1}

nn5d = 1×8

    0.1414    0.2646    0.2828    0.3000    0.3464    0.3742    0.3873    0.3873

Training observations 88 and 95 are 0.3873 cm away from query observation 1.

Compare k-Nearest Neighbors Using Different Distance Metrics

Train two KDTreeSearcher models using different distance metrics, and compare the k-nearest neighbors of query data for the two models.

Load Fisher's iris data set. Consider the petal measurements as predictors.


load fisheriris
X = meas(:,3:4); % Predictors
Y = species;     % Response

Train a KDTreeSearcher model object by using the predictors. Specify the Minkowski distance with exponent 5.

KDTreeMdl = KDTreeSearcher(X,'Distance','minkowski','P',5)

KDTreeMdl = 
  KDTreeSearcher with properties:

       BucketSize: 50
         Distance: 'minkowski'
    DistParameter: 5
                X: [150x2 double]

Find the 10 nearest neighbors from X to a query point (newpoint), first using the Minkowski and then the Chebychev distance metric. The query point must have the same column dimension as the data used to train the model.

newpoint = [5 1.45];
[IdxMk,DMk] = knnsearch(KDTreeMdl,newpoint,'k',10);
[IdxCb,DCb] = knnsearch(KDTreeMdl,newpoint,'k',10,'Distance','chebychev');

IdxMk and IdxCb are 1-by-10 matrices containing the row indices of X corresponding to the nearest neighbors of newpoint using the Minkowski and Chebychev distances, respectively. Element (1,1) is the nearest, element (1,2) is the next nearest, and so on.

Plot the training data, query point, and nearest neighbors.

figure;
gscatter(X(:,1),X(:,2),Y);
title('Fisher''s Iris Data -- Nearest Neighbors');
xlabel('Petal length (cm)');
ylabel('Petal width (cm)');
hold on
plot(newpoint(1),newpoint(2),'kx','MarkerSize',10,'LineWidth',2);   % Query point
plot(X(IdxMk,1),X(IdxMk,2),'o','Color',[.5 .5 .5],'MarkerSize',10); % Minkowski nearest neighbors
plot(X(IdxCb,1),X(IdxCb,2),'p','Color',[.5 .5 .5],'MarkerSize',10); % Chebychev nearest neighbors
legend('setosa','versicolor','virginica','query point',...
    'minkowski','chebychev','Location','Best');



Zoom in on the points of interest.

h = gca; % Get current axis handle.
h.XLim = [4.5 5.5];
h.YLim = [1 2];
axis square;



Several observations are equal, which is why only eight nearest neighbors are identified in the plot.

Input Arguments

Mdl — Nearest neighbor searcher
ExhaustiveSearcher model object | KDTreeSearcher model object

Nearest neighbor searcher, specified as an ExhaustiveSearcher or KDTreeSearcher model object. If Mdl is an ExhaustiveSearcher model, then knnsearch searches for nearest neighbors using an exhaustive search. Otherwise, knnsearch uses the grown Kd-tree to search for nearest neighbors.

Y — Query data
numeric matrix

Query data, specified as a numeric matrix. Y is an m-by-K matrix. Rows of Y correspond to observations (i.e., examples), and columns correspond to predictors (i.e., variables or features). Y must have the same number of columns as the training data stored in Mdl.X.

Data Types: single | double


Name-Value Pair Arguments Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN. Example: 'K',2,'Distance','minkowski' specifies to find the two nearest neighbors of Mdl.X to each point in Y and to use the Minkowski distance metric. For Both Nearest Neighbor Searchers

Distance — Distance metric Mdl.Distance (default) | 'cityblock' | 'euclidean' | 'mahalanobis' | 'minkowski' | 'seuclidean' | function handle | ... Distance metric used to find neighbors of the training data to the query observations, specified as the comma-separated pair consisting of 'Distance' and a character vector, string scalar, or function handle. For both types of nearest neighbor searchers, knnsearch supports these distance metrics. Value

Description

'chebychev'

Chebychev distance (maximum coordinate difference).

'cityblock'

City block distance.

'euclidean'

Euclidean distance.

'minkowski'

Minkowski distance. The default exponent is 2. To specify a different exponent, use the 'P' name-value pair argument.

If Mdl is an ExhaustiveSearcher model object, then knnsearch also supports these distance metrics.

'correlation' — One minus the sample linear correlation between observations (treated as sequences of values).
'cosine' — One minus the cosine of the included angle between observations (treated as row vectors).
'hamming' — Hamming distance, which is the percentage of coordinates that differ.
'jaccard' — One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ.
'mahalanobis' — Mahalanobis distance, computed using a positive definite covariance matrix. To change the value of the covariance matrix, use the 'Cov' name-value pair argument.
'seuclidean' — Standardized Euclidean distance. Each coordinate difference between rows in Mdl.X and the query matrix is scaled by dividing by the corresponding element of the standard deviation computed from Mdl.X. To specify another scaling, use the 'Scale' name-value pair argument.
'spearman' — One minus the sample Spearman's rank correlation between observations (treated as sequences of values).

If Mdl is an ExhaustiveSearcher model object, then you can also specify a function handle for a custom distance metric by using @ (for example, @distfun). The custom distance function must: • Have the form function D2 = distfun(ZI,ZJ). • Take as arguments: • A 1-by-K vector ZI containing a single row from Mdl.X or Y, where K is the number of columns of Mdl.X. • An m-by-K matrix ZJ containing multiple rows of Mdl.X or Y, where m is a positive integer. • Return an m-by-1 vector of distances D2, where D2(j) is the distance between the observations ZI and ZJ(j,:). For more details, see “Distance Metrics” on page 18-11. Example: 'Distance','minkowski' IncludeTies — Flag to include all nearest neighbors false (0) (default) | true (1) Flag to include nearest neighbors that have the same distance from query observations, specified as the comma-separated pair consisting of 'IncludeTies' and false (0) or true (1). If IncludeTies is true, then: • knnsearch includes all nearest neighbors whose distances are equal to the kth smallest distance in the output arguments, where k is the number of searched nearest neighbors specified by the 'K' name-value pair argument. • Idx and D are m-by-1 cell arrays such that each cell contains a vector of at least k indices and distances, respectively. Each vector in D contains distances arranged in ascending order. Each row in Idx contains the indices of the nearest neighbors corresponding to the distances in D. If IncludeTies is false, then knnsearch chooses the observation with the smallest index among the observations that have the same distance from a query point. Example: 'IncludeTies',true K — Number of nearest neighbors 1 (default) | positive integer Number of nearest neighbors to search for in the training data per query observation, specified as the comma-separated pair consisting of 'K' and a positive integer.
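As a sketch of the custom distance interface described above, the following function implements a weighted Euclidean metric. The function name and the weights are illustrative only, not part of the toolbox.

```matlab
% weightedDist.m -- hypothetical custom metric for knnsearch.
% ZI is a 1-by-K row from Mdl.X or Y; ZJ is m-by-K; D2 must be m-by-1.
function D2 = weightedDist(ZI,ZJ)
w = [2 1];                            % illustrative weights for K = 2 predictors
D2 = sqrt((ZJ - ZI).^2 * w');         % weighted Euclidean distance, m-by-1
end
```

You could then pass the handle with, for example, knnsearch(Mdl,Y,'Distance',@weightedDist), provided Mdl is an ExhaustiveSearcher model whose training data has two columns.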


Example: 'K',2 Data Types: single | double P — Exponent for Minkowski distance metric 2 (default) | positive scalar Exponent for the Minkowski distance metric, specified as the comma-separated pair consisting of 'P' and a positive scalar. This argument is valid only if 'Distance' is 'minkowski'. Example: 'P',3 Data Types: single | double For Kd-Tree Nearest Neighbor Searchers

SortIndices — Flag to sort returned indices according to distance true (1) (default) | false (0) Flag to sort returned indices according to distance, specified as the comma-separated pair consisting of 'SortIndices' and either true (1) or false (0). For faster performance, you can set SortIndices to false when the following are true: • Y contains many observations that have many nearest neighbors in X. • Mdl is a KDTreeSearcher model object. • IncludeTies is false. In this case, knnsearch returns the indices of the nearest neighbors in no particular order. When SortIndices is true, the function arranges the nearest-neighbor indices in ascending order by distance. SortIndices is true by default. When Mdl is an ExhaustiveSearcher model object or IncludeTies is true, the function always sorts the indices. Example: 'SortIndices',false Data Types: logical For Exhaustive Nearest Neighbor Searchers

Cov — Covariance matrix for Mahalanobis distance metric nancov(Mdl.X) (default) | positive definite matrix Covariance matrix for the Mahalanobis distance metric, specified as the comma-separated pair consisting of 'Cov' and a positive definite matrix. Cov is a K-by-K matrix, where K is the number of columns of Mdl.X. If you specify Cov and do not specify 'Distance','mahalanobis', then knnsearch returns an error message. Example: 'Cov',eye(3) Data Types: single | double Scale — Scale parameter value for standardized Euclidean distance metric nanstd(Mdl.X) (default) | nonnegative numeric vector


Scale parameter value for the standardized Euclidean distance metric, specified as the comma-separated pair consisting of 'Scale' and a nonnegative numeric vector. Scale has length K, where K is the number of columns of Mdl.X. The software scales each difference between the training and query data using the corresponding element of Scale. If you specify Scale and do not specify 'Distance','seuclidean', then knnsearch returns an error message. Example: 'Scale',quantile(Mdl.X,0.75) - quantile(Mdl.X,0.25) Data Types: single | double Note If you specify 'Distance', 'Cov', 'P', or 'Scale', then Mdl.Distance and Mdl.DistParameter do not change value.
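A brief sketch, using made-up data, of supplying these parameters together with the distance metrics that require them:

```matlab
rng('default')                        % for reproducibility
X = randn(50,3);                      % illustrative training data
Y = randn(5,3);                       % illustrative query points
Mdl = ExhaustiveSearcher(X);

% Mahalanobis distance with an explicit covariance matrix.
Idx1 = knnsearch(Mdl,Y,'K',3,'Distance','mahalanobis','Cov',cov(X));

% Standardized Euclidean distance with a robust (interquartile) scale.
s = quantile(X,0.75) - quantile(X,0.25);
Idx2 = knnsearch(Mdl,Y,'K',3,'Distance','seuclidean','Scale',s);
```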

Output Arguments Idx — Training data indices of nearest neighbors numeric matrix | cell array of numeric vectors Training data indices of nearest neighbors, returned as a numeric matrix or cell array of numeric vectors. • If you do not specify IncludeTies (false by default), then Idx is an m-by-k numeric matrix, where m is the number of rows in Y and k is the number of searched nearest neighbors specified by the 'K' name-value pair argument. Idx(j,i) indicates that Mdl.X(Idx(j,i),:) is one of the k closest observations in Mdl.X to the query observation Y(j,:). • If you specify 'IncludeTies',true, then Idx is an m-by-1 cell array such that cell j (Idx{j}) contains a vector of at least k indices of the closest observations in Mdl.X to the query observation Y(j,:). If SortIndices is true, then knnsearch arranges the indices in ascending order by distance. D — Distances of nearest neighbors numeric matrix | cell array of numeric vectors Distances of the nearest neighbors to the query data, returned as a numeric matrix or cell array of numeric vectors. • If you do not specify IncludeTies (false by default), then D is an m-by-k numeric matrix, where m is the number of rows in Y and k is the number of searched nearest neighbors specified by the 'K' name-value pair argument. D(j,i) is the distance between Mdl.X(Idx(j,i),:) and the query observation Y(j,:) with respect to the distance metric. • If you specify 'IncludeTies',true, then D is an m-by-1 cell array such that cell j (D{j}) contains a vector of at least k distances of the closest observations in Mdl.X to the query observation Y(j,:). If SortIndices is true, then knnsearch arranges the distances in ascending order.
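The effect of 'IncludeTies' on the output types can be seen with a small constructed example in which three training points are equidistant from the query:

```matlab
X = [0 0; 1 0; 0 1; 2 2];             % rows 1-3 are equidistant from the query
Mdl = ExhaustiveSearcher(X);
Y = [0.5 0.5];

Idx = knnsearch(Mdl,Y,'K',1);         % numeric matrix; tie broken by smallest index
[IdxT,DT] = knnsearch(Mdl,Y,'K',1,'IncludeTies',true);
IdxT{1}                               % cell array listing every neighbor tied at the kth distance
```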


Tips knnsearch finds the k (a positive integer) points in Mdl.X that are nearest to each point in Y. In contrast, rangesearch finds all the points in Mdl.X that are within distance r (a positive scalar) of each point in Y.
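The contrast can be sketched as follows (the data values are arbitrary):

```matlab
rng('default')                        % for reproducibility
X = rand(100,2);
Y = [0.5 0.5];
Mdl = KDTreeSearcher(X);

Idx = knnsearch(Mdl,Y,'K',5);         % exactly 5 nearest neighbors, however far away
IdxR = rangesearch(Mdl,Y,0.1);        % every neighbor within radius 0.1, however many
```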

Alternative Functionality • knnsearch is an object function that requires an ExhaustiveSearcher or a KDTreeSearcher model object and query data. Under equivalent conditions, the knnsearch object function returns the same results as the knnsearch function when you specify the name-value pair argument 'NSMethod','exhaustive' or 'NSMethod','kdtree', respectively. • For k-nearest neighbors classification, see fitcknn and ClassificationKNN.
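Under these equivalent conditions the two interfaces are expected to agree; a quick check might look like this:

```matlab
load fisheriris
X = meas(:,3:4);
Y = [5 1.45];

Mdl = KDTreeSearcher(X);                          % object function interface
Idx1 = knnsearch(Mdl,Y,'K',3);

Idx2 = knnsearch(X,Y,'K',3,'NSMethod','kdtree');  % function interface
isequal(Idx1,Idx2)                                % should be true under these settings
```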

References [1] Friedman, J. H., J. Bentley, and R. A. Finkel. “An Algorithm for Finding Best Matches in Logarithmic Expected Time.” ACM Transactions on Mathematical Software 3, no. 3 (1977): 209–226.

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: • This table contains notes about the arguments of knnsearch. Arguments not included in this table are fully supported.


Argument — Notes and Limitations

Mdl — There are two ways to use Mdl in code generation. For an example, see “Code Generation for Nearest Neighbor Searcher” on page 31-13.
• Use saveLearnerForCoder, loadLearnerForCoder, and codegen to generate code for the knnsearch function. Save a trained model by using saveLearnerForCoder. Define an entry-point function that loads the saved model by using loadLearnerForCoder and calls the knnsearch function. Then use codegen to generate code for the entry-point function.
• Include coder.Constant(Mdl) in the -args value of codegen.
If Mdl is a KDTreeSearcher object, and the code generation build type is a MEX function, then codegen generates a MEX function using Intel Threading Building Blocks (TBB) for parallel computation. Otherwise, codegen generates code using parfor.
• MEX function for the kd-tree search algorithm — codegen generates an optimized MEX function using Intel TBB for parallel computation on multicore platforms. You can use the MEX function to accelerate MATLAB algorithms. For details on Intel TBB, see https://software.intel.com/en-us/intel-tbb. If you generate the MEX function to test the generated code of the parfor version, you can disable the usage of Intel TBB. Set the ExtrinsicCalls property of the MEX configuration object to false. For details, see coder.MexCodeConfig.
• MEX function for the exhaustive search algorithm and standalone C/C++ code for both algorithms — The generated code of knnsearch uses parfor to create loops that run in parallel on supported shared-memory multicore platforms. If your compiler does not support the Open Multiprocessing (OpenMP) application interface or you disable the OpenMP library, MATLAB Coder treats the parfor-loops as for-loops. To find supported compilers, see https://www.mathworks.com/support/compilers/current_release/. To disable the OpenMP library, set the EnableOpenMP property of the configuration object to false. For details, see coder.CodeConfig.

'Distance' — Cannot be a custom distance function. Must be a compile-time constant; its value cannot change in the generated code.

'IncludeTies' — Must be a compile-time constant; its value cannot change in the generated code.

'SortIndices' — Not supported. The output arguments are always sorted.

Name-value pair arguments — Names in name-value pair arguments must be compile-time constants. For example, to allow a user-defined exponent for the Minkowski distance in the generated code, include {coder.Constant('Distance'),coder.Constant('Minkowski'),coder.Constant('P'),0} in the -args value of codegen.

Idx —
• When you specify 'IncludeTies' as true, the sorted order of tied distances in the generated code can be different from the order in MATLAB due to numerical precision.
• Starting in R2020a, knnsearch returns integer-type (int32) indices, rather than double-precision indices, in generated standalone C/C++ code. Therefore, the function allows for strict single-precision support when you use single-precision inputs. For MEX code generation, the function still returns double-precision indices to match the MATLAB behavior.

For more information, see “Introduction to Code Generation” on page 31-2 and “Code Generation for Nearest Neighbor Searcher” on page 31-13.

See Also ClassificationKNN | ExhaustiveSearcher | KDTreeSearcher | createns | fitcknn | knnsearch | rangesearch Topics “k-Nearest Neighbor Search and Radius Search” on page 18-13 “Distance Metrics” on page 18-11 Introduced in R2010a


knnsearch Find k-nearest neighbors using input data

Syntax Idx = knnsearch(X,Y) Idx = knnsearch(X,Y,Name,Value) [Idx,D] = knnsearch( ___ )

Description Idx = knnsearch(X,Y) finds the nearest neighbor in X for each query point in Y and returns the indices of the nearest neighbors in Idx, a column vector. Idx has the same number of rows as Y. Idx = knnsearch(X,Y,Name,Value) returns Idx with additional options specified using one or more name-value pair arguments. For example, you can specify the number of nearest neighbors to search for and the distance metric used in the search. [Idx,D] = knnsearch( ___ ) additionally returns the matrix D, using any of the input arguments in the previous syntaxes. D contains the distances between each observation in Y and the corresponding closest observations in X.

Examples Find Nearest Neighbors Find the patients in the hospital data set that most closely resemble the patients in Y, according to age and weight. Load the hospital data set. load hospital; X = [hospital.Age hospital.Weight]; Y = [20 162; 30 169; 40 168; 50 170; 60 171]; % New patients

Perform a knnsearch between X and Y to find indices of nearest neighbors. Idx = knnsearch(X,Y);

Find the patients in X closest in age and weight to those in Y.

X(Idx,:)

ans = 5×2

    25   171
    25   171
    39   164
    49   170
    50   172


Find k-Nearest Neighbors Using Different Distance Metrics Find the 10 nearest neighbors in X to each point in Y, first using the Minkowski distance metric and then using the Chebychev distance metric. Load Fisher's iris data set. load fisheriris X = meas(:,3:4); % Measurements of original flowers Y = [5 1.45;6 2;2.75 .75]; % New flower data

Perform a knnsearch between X and the query points Y using Minkowski and Chebychev distance metrics. [mIdx,mD] = knnsearch(X,Y,'K',10,'Distance','minkowski','P',5); [cIdx,cD] = knnsearch(X,Y,'K',10,'Distance','chebychev');

Visualize the results of the two nearest neighbor searches. Plot the training data. Plot the query points with the marker X. Use circles to denote the Minkowski nearest neighbors. Use pentagrams to denote the Chebychev nearest neighbors. gscatter(X(:,1),X(:,2),species) line(Y(:,1),Y(:,2),'Marker','x','Color','k',... 'Markersize',10,'Linewidth',2,'Linestyle','none') line(X(mIdx,1),X(mIdx,2),'Color',[.5 .5 .5],'Marker','o',... 'Linestyle','none','Markersize',10) line(X(cIdx,1),X(cIdx,2),'Color',[.5 .5 .5],'Marker','p',... 'Linestyle','none','Markersize',10) legend('setosa','versicolor','virginica','query point',... 'minkowski','chebychev','Location','best')


Input Arguments X — Input data numeric matrix Input data, specified as a numeric matrix. Rows of X correspond to observations, and columns correspond to variables. Data Types: single | double Y — Query points numeric matrix Query points, specified as a numeric matrix. Rows of Y correspond to observations, and columns correspond to variables. Y must have the same number of columns as X. Data Types: single | double Name-Value Pair Arguments Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN. Example: knnsearch(X,Y,'K',10,'IncludeTies',true,'Distance','cityblock') searches for 10 nearest neighbors, including ties and using the city block distance.

32

Functions

K — Number of nearest neighbors 1 (default) | positive integer Number of nearest neighbors to find in X for each point in Y, specified as the comma-separated pair consisting of 'K' and a positive integer. Example: 'K',10 Data Types: single | double IncludeTies — Flag to include all nearest neighbors false (0) (default) | true (1) Flag to include all nearest neighbors that have the same distance from query points, specified as the comma-separated pair consisting of 'IncludeTies' and false (0) or true (1). If 'IncludeTies' is false, then knnsearch chooses the observation with the smallest index among the observations that have the same distance from a query point. If 'IncludeTies' is true, then: • knnsearch includes all nearest neighbors whose distances are equal to the kth smallest distance in the output arguments. To specify k, use the 'K' name-value pair argument. • Idx and D are m-by-1 cell arrays such that each cell contains a vector of at least k indices and distances, respectively. Each vector in D contains distances arranged in ascending order. Each row in Idx contains the indices of the nearest neighbors corresponding to the distances in D. Example: 'IncludeTies',true NSMethod — Nearest neighbor search method 'kdtree' | 'exhaustive' Nearest neighbor search method, specified as the comma-separated pair consisting of 'NSMethod' and one of these values. • 'kdtree' — Creates and uses a Kd-tree to find nearest neighbors. 'kdtree' is the default value when the number of columns in X is less than or equal to 10, X is not sparse, and the distance metric is 'euclidean', 'cityblock', 'chebychev', or 'minkowski'. Otherwise, the default value is 'exhaustive'. The value 'kdtree' is valid only when the distance metric is one of the four metrics noted above. • 'exhaustive' — Uses the exhaustive search algorithm by computing the distance values from all the points in X to each point in Y. 
Example: 'NSMethod','exhaustive'

Distance — Distance metric 'euclidean' (default) | 'seuclidean' | 'cityblock' | 'chebychev' | 'minkowski' | 'mahalanobis' | function handle | ... Distance metric knnsearch uses, specified as the comma-separated pair consisting of 'Distance' and one of the values in this table or a function handle.

'euclidean' — Euclidean distance.
'seuclidean' — Standardized Euclidean distance. Each coordinate difference between rows in X and the query matrix Y is scaled by dividing by the corresponding element of the standard deviation computed from X. To specify another scaling, use the 'Scale' name-value pair argument.
'cityblock' — City block distance.
'chebychev' — Chebychev distance (maximum coordinate difference).
'minkowski' — Minkowski distance. The default exponent is 2. To specify a different exponent, use the 'P' name-value pair argument.
'mahalanobis' — Mahalanobis distance, computed using a positive definite covariance matrix. To change the value of the covariance matrix, use the 'Cov' name-value pair argument.
'cosine' — One minus the cosine of the included angle between observations (treated as row vectors).
'correlation' — One minus the sample linear correlation between observations (treated as sequences of values).
'spearman' — One minus the sample Spearman's rank correlation between observations (treated as sequences of values).
'hamming' — Hamming distance, which is the percentage of coordinates that differ.
'jaccard' — One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ.
You can also specify a function handle for a custom distance metric by using @ (for example, @distfun). A custom distance function must: • Have the form function D2 = distfun(ZI,ZJ). • Take as arguments: • A 1-by-n vector ZI containing a single row from X or from the query points Y. • An m2-by-n matrix ZJ containing multiple rows of X or Y. • Return an m2-by-1 vector of distances D2, whose jth element is the distance between the observations ZI and ZJ(j,:). For more information, see “Distance Metrics” on page 18-11. Example: 'Distance','chebychev' P — Exponent for Minkowski distance metric 2 (default) | positive scalar Exponent for the Minkowski distance metric, specified as the comma-separated pair consisting of 'P' and a positive scalar. This argument is valid only if 'Distance' is 'minkowski'. Example: 'P',3 Data Types: single | double


Cov — Covariance matrix for Mahalanobis distance metric nancov(X) (default) | positive definite matrix Covariance matrix for the Mahalanobis distance metric, specified as the comma-separated pair consisting of 'Cov' and a positive definite matrix. This argument is valid only if 'Distance' is 'mahalanobis'. Example: 'Cov',eye(4) Data Types: single | double Scale — Scale parameter value for standardized Euclidean distance metric nanstd(X) (default) | nonnegative numeric vector Scale parameter value for the standardized Euclidean distance metric, specified as the comma-separated pair consisting of 'Scale' and a nonnegative numeric vector. 'Scale' has length equal to the number of columns in X. When knnsearch computes the standardized Euclidean distance, each coordinate of X is scaled by the corresponding element of 'Scale', as is each query point. This argument is valid only when 'Distance' is 'seuclidean'. Example: 'Scale',quantile(X,0.75) - quantile(X,0.25) Data Types: single | double BucketSize — Maximum number of data points in leaf node of Kd-tree 50 (default) | positive integer Maximum number of data points in the leaf node of the Kd-tree, specified as the comma-separated pair consisting of 'BucketSize' and a positive integer. This argument is valid only when NSMethod is 'kdtree'. Example: 'BucketSize',20 Data Types: single | double SortIndices — Flag to sort returned indices according to distance true (1) (default) | false (0) Flag to sort returned indices according to distance, specified as the comma-separated pair consisting of 'SortIndices' and either true (1) or false (0). For faster performance, you can set SortIndices to false when the following are true: • Y contains many observations that have many nearest neighbors in X. • NSMethod is 'kdtree'. • IncludeTies is false. In this case, knnsearch returns the indices of the nearest neighbors in no particular order.
When SortIndices is true, the function arranges the nearest-neighbor indices in ascending order by distance. SortIndices is true by default. When NSMethod is 'exhaustive' or IncludeTies is true, the function always sorts the indices. Example: 'SortIndices',false Data Types: logical


Output Arguments Idx — Input data indices of nearest neighbors numeric matrix | cell array of numeric vectors Input data indices of the nearest neighbors, returned as a numeric matrix or cell array of numeric vectors. • If you do not specify IncludeTies (false by default), then Idx is an m-by-k numeric matrix, where m is the number of rows in Y and k is the number of searched nearest neighbors. Idx(j,i) indicates that X(Idx(j,i),:) is one of the k closest observations in X to the query point Y(j,:). • If you specify 'IncludeTies',true, then Idx is an m-by-1 cell array such that cell j (Idx{j}) contains a vector of at least k indices of the closest observations in X to the query point Y(j,:). If SortIndices is true, then knnsearch arranges the indices in ascending order by distance. D — Distances of nearest neighbors numeric matrix | cell array of numeric vectors Distances of the nearest neighbors to the query points, returned as a numeric matrix or cell array of numeric vectors. • If you do not specify IncludeTies (false by default), then D is an m-by-k numeric matrix, where m is the number of rows in Y and k is the number of searched nearest neighbors. D(j,i) is the distance between X(Idx(j,i),:) and Y(j,:) with respect to the distance metric. • If you specify 'IncludeTies',true, then D is an m-by-1 cell array such that cell j (D{j}) contains a vector of at least k distances of the closest observations in X to the query point Y(j,:). If SortIndices is true, then knnsearch arranges the distances in ascending order.

Tips • For a fixed positive integer k, knnsearch finds the k points in X that are the nearest to each point in Y. To find all points in X within a fixed distance of each point in Y, use rangesearch. • knnsearch does not save a search object. To create a search object, use createns.
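A minimal sketch of the createns workflow the second tip refers to, reusing the Fisher iris data from the earlier example:

```matlab
load fisheriris
X = meas(:,3:4);

Mdl = createns(X,'NSMethod','kdtree');   % build a reusable KDTreeSearcher object
Idx = knnsearch(Mdl,[5 1.45],'K',3);     % the object function reuses the grown tree
```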

Algorithms For information on a specific search algorithm, see “k-Nearest Neighbor Search and Radius Search” on page 18-13.

Alternative Functionality If you set the knnsearch function's 'NSMethod' name-value pair argument to the appropriate value ('exhaustive' for an exhaustive search algorithm or 'kdtree' for a Kd-tree algorithm), then the search results are equivalent to the results obtained by conducting a distance search using the knnsearch object function. Unlike the knnsearch function, the knnsearch object function requires an ExhaustiveSearcher or a KDTreeSearcher model object.


References [1] Friedman, J. H., J. Bentley, and R. A. Finkel. “An Algorithm for Finding Best Matches in Logarithmic Expected Time.” ACM Transactions on Mathematical Software 3, no. 3 (1977): 209–226.

Extended Capabilities Tall Arrays Calculate with arrays that have more rows than fit in memory. Usage notes and limitations: • If X is a tall array, then Y cannot be a tall array. Similarly, if Y is a tall array, then X cannot be a tall array. For more information, see “Tall Arrays” (MATLAB). C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: • For code generation, the default value of the 'NSMethod' name-value pair argument is 'exhaustive' when the number of columns in X is greater than 7. • The value of the 'Distance' name-value pair argument must be a compile-time constant and cannot be a custom distance function. • The value of the 'IncludeTies' name-value pair argument must be a compile-time constant. • The 'SortIndices' name-value pair argument is not supported. The output arguments are always sorted. • Names in name-value pair arguments must be compile-time constants. For example, to allow a user-defined exponent for the Minkowski distance in the generated code, include {coder.Constant('Distance'),coder.Constant('Minkowski'),coder.Constant('P'),0} in the -args value of codegen. • When you specify 'IncludeTies' as true, the sorted order of tied distances in the generated code can be different from the order in MATLAB due to numerical precision. • When knnsearch uses the kd-tree search algorithm, and the code generation build type is a MEX function, codegen generates a MEX function using Intel Threading Building Blocks (TBB) for parallel computation. Otherwise, codegen generates code using parfor. • MEX function for the kd-tree search algorithm — codegen generates an optimized MEX function using Intel TBB for parallel computation on multicore platforms. You can use the MEX function to accelerate MATLAB algorithms. For details on Intel TBB, see https://software.intel.com/en-us/intel-tbb. If you generate the MEX function to test the generated code of the parfor version, you can disable the usage of Intel TBB. Set the ExtrinsicCalls property of the MEX configuration object to false. For details, see coder.MexCodeConfig. • MEX function for the exhaustive search algorithm and standalone C/C++ code for both algorithms — The generated code of knnsearch uses parfor to create loops that run in parallel on supported shared-memory multicore platforms. If your compiler does not support the Open Multiprocessing (OpenMP) application interface or you disable the OpenMP library, MATLAB Coder treats the parfor-loops as for-loops. To find supported compilers, see https://www.mathworks.com/support/compilers/current_release/. To disable the OpenMP library, set the EnableOpenMP property of the configuration object to false. For details, see coder.CodeConfig. • Starting in R2020a, knnsearch returns integer-type (int32) indices, rather than double-precision indices, in generated standalone C/C++ code. Therefore, the function allows for strict single-precision support when you use single-precision inputs. For MEX code generation, the function still returns double-precision indices to match the MATLAB behavior. For more information on code generation, see “Introduction to Code Generation” on page 31-2 and “General Code Generation Workflow” on page 31-5. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. Usage notes and limitations: • The 'IncludeTies', 'NSMethod', and 'SortIndices' name-value pair arguments are not supported. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also ExhaustiveSearcher | KDTreeSearcher | createns | knnsearch Topics “k-Nearest Neighbor Search and Radius Search” on page 18-13 “Distance Metrics” on page 18-11 Introduced in R2010a


kruskalwallis Kruskal-Wallis test

Syntax p = kruskalwallis(x) p = kruskalwallis(x,group) p = kruskalwallis(x,group,displayopt) [p,tbl,stats] = kruskalwallis( ___ )

Description p = kruskalwallis(x) returns the p-value for the null hypothesis that the data in each column of the matrix x comes from the same distribution, using a Kruskal-Wallis test on page 32-2941. The alternative hypothesis is that not all samples come from the same distribution. kruskalwallis also returns an ANOVA table and a box plot. p = kruskalwallis(x,group) returns the p-value for a test of the null hypothesis that the data in each categorical group, as specified by the grouping variable group, comes from the same distribution. The alternative hypothesis is that not all groups come from the same distribution. p = kruskalwallis(x,group,displayopt) returns the p-value of the test and lets you display or suppress the ANOVA table and box plot. [p,tbl,stats] = kruskalwallis( ___ ) also returns the ANOVA table as the cell array tbl and the structure stats containing information about the test statistics.

Examples Test Data Samples for the Same Distribution Create two different normal probability distribution objects. The first distribution has mu = 0 and sigma = 1, and the second distribution has mu = 2 and sigma = 1. pd1 = makedist('Normal'); pd2 = makedist('Normal','mu',2,'sigma',1);

Create a matrix of sample data by generating random numbers from these two distributions. rng('default'); % for reproducibility x = [random(pd1,20,2),random(pd2,20,1)];

The first two columns of x contain data generated from the first distribution, while the third column contains data generated from the second distribution. Test the null hypothesis that the sample data from each column in x comes from the same distribution. p = kruskalwallis(x)


p = 3.6896e-06

The returned value of p indicates that kruskalwallis rejects the null hypothesis that all three data samples come from the same distribution at a 1% significance level. The ANOVA table provides additional test results, and the box plot visually presents the summary statistics for each column in x.

Conduct Followup Tests for Unequal Medians Create two different normal probability distribution objects. The first distribution has mu = 0 and sigma = 1. The second distribution has mu = 2 and sigma = 1. pd1 = makedist('Normal'); pd2 = makedist('Normal','mu',2,'sigma',1);

Create a matrix of sample data by generating random numbers from these two distributions.


rng('default'); % for reproducibility x = [random(pd1,20,2),random(pd2,20,1)];

The first two columns of x contain data generated from the first distribution, while the third column contains data generated from the second distribution. Test the null hypothesis that the sample data from each column in x comes from the same distribution. Suppress the output displays, and generate the structure stats to use in further testing.

[p,tbl,stats] = kruskalwallis(x,[],'off')

p = 3.6896e-06

tbl = 4×6 cell array
    {'Source' }    {'SS'          }    {'df'}    {'MS'          }    {'Chi-sq'  }    {'Prob>Chi-sq'}
    {'Columns'}    {[7.6311e+03]  }    {[ 2]}    {[3.8155e+03]  }    {[ 25.0200]}    {[ 3.6896e-06]}
    {'Error'  }    {[1.0364e+04]  }    {[57]}    {[  181.8228]  }    {0x0 double}    {0x0 double   }
    {'Total'  }    {[      17995]}    {[59]}    {0x0 double    }    {0x0 double}    {0x0 double   }

stats = struct with fields:
       gnames: [3x1 char]
            n: [20 20 20]
       source: 'kruskalwallis'
    meanranks: [26.7500 18.9500 45.8000]
         sumt: 0

The returned value of p indicates that the test rejects the null hypothesis at the 1% significance level. You can use the structure stats to perform additional followup testing. The cell array tbl contains the same data as the graphical ANOVA table, including column and row labels. Conduct a followup test to identify which data sample comes from a different distribution. c = multcompare(stats) Note: Intervals can be used for testing but are not simultaneous confidence intervals.


c = 3×6

    1.0000    2.0000   -5.1435    7.8000   20.7435    0.3345
    1.0000    3.0000  -31.9935  -19.0500   -6.1065    0.0016
    2.0000    3.0000  -39.7935  -26.8500  -13.9065    0.0000

The results indicate that there is a significant difference between groups 1 and 3, so the test rejects the null hypothesis that the data in these two groups comes from the same distribution. The same is true for groups 2 and 3. However, there is not a significant difference between groups 1 and 2, so the test does not reject the null hypothesis that these two groups come from the same distribution. Therefore, these results suggest that the data in groups 1 and 2 come from the same distribution, and the data in group 3 comes from a different distribution.

Test for the Same Distribution Across Groups

Create a vector, strength, containing measurements of the strength of metal beams. Create a second vector, alloy, indicating the type of metal alloy from which the corresponding beam is made.

strength = [82 86 79 83 84 85 86 87 74 82 ...
            78 75 76 77 79 79 77 78 82 79];
alloy = {'st','st','st','st','st','st','st','st',...
         'al1','al1','al1','al1','al1','al1',...
         'al2','al2','al2','al2','al2','al2'};


Test the null hypothesis that the beam strength measurements have the same distribution across all three alloys.

p = kruskalwallis(strength,alloy,'off')

p = 0.0018

The returned value of p indicates that the test rejects the null hypothesis at the 1% significance level.

Input Arguments

x — Sample data
vector | matrix

Sample data for the hypothesis test, specified as a vector or an m-by-n matrix. If x is an m-by-n matrix, each of the n columns represents an independent sample containing m mutually independent observations.

Data Types: single | double

group — Grouping variable
numeric vector | logical vector | character array | string array | cell array of character vectors

Grouping variable, specified as a numeric or logical vector, a character or string array, or a cell array of character vectors.

• If x is a vector, then each element in group identifies the group to which the corresponding element in x belongs, and group must be a vector of the same length as x. If a row of group contains an empty value, that row and the corresponding observation in x are disregarded. NaN values in either x or group are similarly ignored.

• If x is a matrix, then each column in x represents a different group, and you can use group to specify labels for these columns. The number of elements in group and the number of columns in x must be equal. The labels contained in group also annotate the box plot.

Example: {'red','blue','green','blue','red','blue','green','green','red'}
Data Types: single | double | logical | char | string | cell

displayopt — Display option
'on' (default) | 'off'

Display option, specified as 'on' or 'off'. If displayopt is 'on', kruskalwallis displays the following figures:

• An ANOVA table containing the sums of squares, degrees of freedom, and other quantities calculated based on the ranks of the data in x.

• A box plot of the data in each column of the data matrix x. The box plots are based on the actual data values, rather than on the ranks.

If displayopt is 'off', kruskalwallis does not display these figures. If you specify a value for displayopt, you must also specify a value for group. If you do not have a grouping variable, specify group as [].


Example: 'off'

Output Arguments

p — p-value
scalar value in the range [0,1]

p-value of the test, returned as a scalar value in the range [0,1]. p is the probability of observing a test statistic as extreme as, or more extreme than, the observed value under the null hypothesis. Small values of p cast doubt on the validity of the null hypothesis.

tbl — ANOVA table
cell array

ANOVA table of test results, returned as a cell array. tbl includes the sums of squares, degrees of freedom, and other quantities calculated based on the ranks of the data in x, as well as column and row labels.

stats — Test data
structure

Test data, returned as a structure. You can perform followup multiple comparison tests on pairs of sample medians by using multcompare, with stats as the input value.

More About

Kruskal-Wallis Test

The Kruskal-Wallis test is a nonparametric version of classical one-way ANOVA, and an extension of the Wilcoxon rank sum test to more than two groups. It compares the medians of the groups of data in x to determine if the samples come from the same population (or, equivalently, from different populations with the same distribution).

The Kruskal-Wallis test uses ranks of the data, rather than numeric values, to compute the test statistics. It finds ranks by ordering the data from smallest to largest across all groups, and taking the numeric index of this ordering. The rank for a tied observation is equal to the average rank of all observations tied with it. The F-statistic used in classical one-way ANOVA is replaced by a chi-square statistic, and the p-value measures the significance of the chi-square statistic.

The Kruskal-Wallis test assumes that all samples come from populations having the same continuous distribution, apart from possibly different locations due to group effects, and that all observations are mutually independent. By contrast, classical one-way ANOVA replaces the first assumption with the stronger assumption that the populations have normal distributions.
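As a rough illustration of the rank-based computation described above, the chi-square statistic can be sketched by hand using tiedrank. This is a minimal sketch, not the kruskalwallis implementation itself: it omits the tie correction that kruskalwallis applies, and the variable names are illustrative.

```matlab
% Sketch: Kruskal-Wallis chi-square statistic from ranks
% (ignoring the tie correction applied by kruskalwallis).
x = [82 86 79 83 84 85 74 82 78 75 76 77]';   % sample values
g = [1 1 1 1 1 1 2 2 2 3 3 3]';               % group labels
r = tiedrank(x);                              % ranks, ties averaged
N = numel(x);
H = 0;
for k = unique(g)'
    rk = r(g==k);                             % ranks in group k
    H = H + numel(rk)*(mean(rk) - (N+1)/2)^2; % deviation of group mean rank
end
H = 12/(N*(N+1)) * H;                         % chi-square statistic
p = chi2cdf(H, numel(unique(g))-1, 'upper');  % approximate p-value
```

For untied data, H computed this way matches the 'Chi-sq' entry of the ANOVA table that kruskalwallis returns.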

See Also
anova1 | boxplot | friedman | multcompare | ranksum

Topics
“Grouping Variables” on page 2-45

Introduced before R2006a


ksdensity
Kernel smoothing function estimate for univariate and bivariate data

Syntax [f,xi] = ksdensity(x) [f,xi] = ksdensity(x,pts) [f,xi] = ksdensity( ___ ,Name,Value) [f,xi,bw] = ksdensity( ___ ) ksdensity( ___ ) ksdensity(ax, ___ )

Description

[f,xi] = ksdensity(x) returns a probability density estimate, f, for the sample data in the vector or two-column matrix x. The estimate is based on a normal kernel function, and is evaluated at equally spaced points, xi, that cover the range of the data in x. ksdensity estimates the density at 100 points for univariate data, or 900 points for bivariate data. ksdensity works best with continuously distributed samples.

[f,xi] = ksdensity(x,pts) specifies points (pts) at which to evaluate f. Here, xi and pts contain identical values.

[f,xi] = ksdensity( ___ ,Name,Value) uses additional options specified by one or more name-value pair arguments in addition to any of the input arguments in the previous syntaxes. For example, you can define the function type ksdensity evaluates, such as probability density, cumulative probability, survivor function, and so on. Or you can specify the bandwidth of the smoothing window.

[f,xi,bw] = ksdensity( ___ ) also returns the bandwidth of the kernel smoothing window, bw. The default bandwidth is optimal for normal densities.

ksdensity( ___ ) plots the kernel smoothing function estimate.

ksdensity(ax, ___ ) plots the results using axes with the handle, ax, instead of the current axes returned by gca.

Examples

Estimate Density

Generate a sample data set from a mixture of two normal distributions.

rng('default') % For reproducibility
x = [randn(30,1); 5+randn(30,1)];

Plot the estimated density.


[f,xi] = ksdensity(x); figure plot(xi,f);

The density estimate shows the bimodality of the sample.

Estimate Density with Boundary Correction

Generate a nonnegative sample data set from the half-normal distribution.

rng('default') % For reproducibility
pd = makedist('HalfNormal','mu',0,'sigma',1);
x = random(pd,100,1);

Estimate pdfs with two different boundary correction methods, log transformation and reflection, by using the 'BoundaryCorrection' name-value pair argument. pts = linspace(0,5,1000); % points to evaluate the estimator [f1,xi1] = ksdensity(x,pts,'Support','positive'); [f2,xi2] = ksdensity(x,pts,'Support','positive','BoundaryCorrection','reflection');

Plot the two estimated pdfs. plot(xi1,f1,xi2,f2) lgd = legend('log','reflection'); title(lgd, 'Boundary Correction Method')


xl = xlim; xlim([xl(1)-0.25 xl(2)])

ksdensity uses a boundary correction method when you specify either positive or bounded support. The default boundary correction method is log transformation. When ksdensity transforms the support back, it introduces the 1/x term in the kernel density estimator. Therefore, the estimate has a peak near x = 0. On the other hand, the reflection method does not cause undesirable peaks near the boundary.

Estimate Cumulative Distribution Function at Specified Values

Load the sample data.

load hospital

Compute and plot the estimated cdf evaluated at a specified set of values. pts = (min(hospital.Weight):2:max(hospital.Weight)); figure() ecdf(hospital.Weight) hold on [f,xi,bw] = ksdensity(hospital.Weight,pts,'Support','positive',... 'Function','cdf'); plot(xi,f,'-g','LineWidth',2) legend('empirical cdf','kernel-bw:default','Location','northwest')


xlabel('Patient weights') ylabel('Estimated cdf')

ksdensity seems to smooth the cumulative distribution function estimate too much. An estimate with a smaller bandwidth might produce a closer estimate to the empirical cumulative distribution function.

Return the bandwidth of the smoothing window.

bw

bw = 0.1070

Plot the cumulative distribution function estimate using a smaller bandwidth. [f,xi] = ksdensity(hospital.Weight,pts,'Support','positive',... 'Function','cdf','Bandwidth',0.05); plot(xi,f,'--r','LineWidth',2) legend('empirical cdf','kernel-bw:default','kernel-bw:0.05',... 'Location','northwest') hold off


The ksdensity estimate with a smaller bandwidth matches the empirical cumulative distribution function better.

Plot Estimated Cumulative Density Function for Given Number of Points

Load the sample data.

load hospital

Plot the estimated cdf evaluated at 50 equally spaced points. figure() ksdensity(hospital.Weight,'Support','positive','Function','cdf',... 'NumPoints',50) xlabel('Patient weights') ylabel('Estimated cdf')


Estimate Survivor and Cumulative Hazard for Censored Failure Data

Generate sample data from an exponential distribution with mean 3.

rng('default') % For reproducibility
x = random('exp',3,100,1);

Create a logical vector that indicates censoring. Here, observations with lifetimes longer than 10 are censored. T = 10; cens = (x>T);

Compute and plot the estimated density function. figure ksdensity(x,'Support','positive','Censoring',cens);


Compute and plot the survivor function. figure ksdensity(x,'Support','positive','Censoring',cens,... 'Function','survivor');


Compute and plot the cumulative hazard function. figure ksdensity(x,'Support','positive','Censoring',cens,... 'Function','cumhazard');


Estimate Inverse Cumulative Distribution Function for Specified Probability Values

Generate a mixture of two normal distributions, and plot the estimated inverse cumulative distribution function at a specified set of probability values.

rng('default') % For reproducibility
x = [randn(30,1); 5+randn(30,1)];
pi = linspace(.01,.99,99);
figure
ksdensity(x,pi,'Function','icdf');


Return Bandwidth of Smoothing Window

Generate a mixture of two normal distributions.

rng('default') % For reproducibility
x = [randn(30,1); 5+randn(30,1)];

Return the bandwidth of the smoothing window for the probability density estimate.

[f,xi,bw] = ksdensity(x);
bw

bw = 1.5141

The default bandwidth is optimal for normal densities. Plot the estimated density. figure plot(xi,f); xlabel('xi') ylabel('f') hold on


Plot the density using an increased bandwidth value. [f,xi] = ksdensity(x,'Bandwidth',1.8); plot(xi,f,'--r','LineWidth',1.5)


A higher bandwidth further smooths the density estimate, which might mask some characteristics of the distribution. Now, plot the density using a decreased bandwidth value. [f,xi] = ksdensity(x,'Bandwidth',0.8); plot(xi,f,'-.k','LineWidth',1.5) legend('bw = default','bw = 1.8','bw = 0.8') hold off


A smaller bandwidth smooths the density estimate less, which exaggerates some characteristics of the sample.

Plot Kernel Density Estimate of Bivariate Data

Create a two-column vector of points at which to evaluate the density.

gridx1 = -0.25:.05:1.25;
gridx2 = 0:.1:15;
[x1,x2] = meshgrid(gridx1, gridx2);
x1 = x1(:);
x2 = x2(:);
xi = [x1 x2];

Generate a 30-by-2 matrix containing random numbers from a mixture of bivariate normal distributions. rng('default') % For reproducibility x = [0+.5*rand(20,1) 5+2.5*rand(20,1); .75+.25*rand(10,1) 8.75+1.25*rand(10,1)];

Plot the estimated density of the sample data. figure ksdensity(x,xi);


Input Arguments

x — Sample data
column vector | two-column matrix

Sample data for which ksdensity returns f values, specified as a column vector or two-column matrix. Use a column vector for univariate data, and a two-column matrix for bivariate data.

Example: [f,xi] = ksdensity(x)
Data Types: single | double

pts — Points at which to evaluate f
vector | two-column matrix

Points at which to evaluate f, specified as a vector or two-column matrix. For univariate data, pts can be a row or column vector. The length of the returned output f is equal to the number of points in pts.

Example: pts = (0:1:25); ksdensity(x,pts);
Data Types: single | double

ax — Axes handle
handle

Axes handle for the figure ksdensity plots to, specified as a handle.


For example, if h is a handle for a figure, then ksdensity can plot to that figure as follows.

Example: ksdensity(h,x)

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Censoring',cens,'Kernel','triangle','NumPoints',20,'Function','cdf' specifies that ksdensity estimates the cdf by evaluating it at 20 equally spaced points that cover the range of the data, using the triangle kernel smoothing function and accounting for the censored data information in vector cens.

Bandwidth — Bandwidth of the kernel smoothing window
optimal value for normal densities (default) | scalar value | two-element vector

Bandwidth of the kernel smoothing window, which is a function of the number of points in x, specified as the comma-separated pair consisting of 'Bandwidth' and a scalar value. If the sample data is bivariate, Bandwidth can also be a two-element vector. The default is optimal for estimating normal densities [1], but you might want to choose a larger or smaller value to smooth more or less.

If you specify 'BoundaryCorrection' as 'log' (default) and 'Support' as either 'positive' or a vector [L U], ksdensity converts bounded data to be unbounded by using the log transformation. The value of 'Bandwidth' is on the scale of the transformed values.

Example: 'Bandwidth',0.8
Data Types: single | double

BoundaryCorrection — Boundary correction method
'log' (default) | 'reflection'

Boundary correction method, specified as the comma-separated pair consisting of 'BoundaryCorrection' and one of the following.

• 'log' — ksdensity converts bounded data x to be unbounded by one of the following transformations, then transforms back to the original bounded scale after density estimation.
  • For univariate data, if you specify 'Support','positive', then ksdensity applies log(x).
  • For univariate data, if you specify 'Support',[L U], where L and U are numeric scalars and L < U, then ksdensity applies log((x-L)/(U-x)).
  • For bivariate data, ksdensity transforms each column of x in the same way as univariate data.
  The value of 'Bandwidth' and the bw output are on the scale of the transformed values.

• 'reflection' — ksdensity augments bounded data by adding reflected data near the boundaries, then it returns estimates corresponding to the original support. For details, see “Reflection Method” on page 32-2960.

ksdensity applies boundary correction only when you specify 'Support' as a value other than 'unbounded'.

Example: 'BoundaryCorrection','reflection'

Censoring — Logical vector
vector of 0s (default) | vector of 0s and 1s

Logical vector indicating which entries are censored, specified as the comma-separated pair consisting of 'Censoring' and a vector of binary values. A value of 0 indicates no censoring; a value of 1 indicates that the observation is censored. The default is no censoring. This name-value pair is valid only for univariate data.

Example: 'Censoring',censdata
Data Types: logical

Function — Function to estimate
'pdf' (default) | 'cdf' | 'icdf' | 'survivor' | 'cumhazard'

Function to estimate, specified as the comma-separated pair consisting of 'Function' and one of the following.

• 'pdf' — Probability density function.
• 'cdf' — Cumulative distribution function.
• 'icdf' — Inverse cumulative distribution function. ksdensity computes the estimated inverse cdf of the values in x, and evaluates it at the probability values specified in pi. This value is valid only for univariate data.
• 'survivor' — Survivor function.
• 'cumhazard' — Cumulative hazard function. This value is valid only for univariate data.

Example: 'Function','icdf'

Kernel — Type of kernel smoother
'normal' (default) | 'box' | 'triangle' | 'epanechnikov' | function handle | character vector | string scalar

Type of kernel smoother, specified as the comma-separated pair consisting of 'Kernel' and one of the following.

• 'normal' (default)
• 'box'
• 'triangle'
• 'epanechnikov'
• A kernel function that is a custom or built-in function. Specify the function as a function handle (for example, @myfunction or @normpdf) or as a character vector or string scalar (for example, 'myfunction' or 'normpdf'). The software calls the specified function with one argument that is an array of distances between data values and locations where the density is evaluated. The function must return an array of the same size containing corresponding values of the kernel function.

When 'Function' is 'pdf', the kernel function returns density values. Otherwise, it returns cumulative probability values. Specifying a custom kernel when 'Function' is 'icdf' returns an error.

For bivariate data, ksdensity applies the same kernel to each dimension.

Example: 'Kernel','box'

NumPoints — Number of equally spaced points
100 (default) | scalar value

Number of equally spaced points in xi, specified as the comma-separated pair consisting of 'NumPoints' and a scalar value. This name-value pair is valid only for univariate data.

For example, for a kernel smooth estimate of a specified function at 80 equally spaced points within the range of the sample data, input:

Example: 'NumPoints',80
Data Types: single | double

Support — Support for the density
'unbounded' (default) | 'positive' | two-element vector, [L U] | two-by-two matrix, [L1 L2; U1 U2]

Support for the density, specified as the comma-separated pair consisting of 'Support' and one of the following.

• 'unbounded' (default) — Allow the density to extend over the whole real line.
• 'positive' — Restrict the density to positive values.
• Two-element vector, [L U] — Give the finite lower and upper bounds for the support of the density. This option is valid only for univariate sample data.
• Two-by-two matrix, [L1 L2; U1 U2] — Give the finite lower and upper bounds for the support of the density. The first row contains the lower limits and the second row contains the upper limits. This option is valid only for bivariate sample data.

For bivariate data, 'Support' can be a combination of positive, unbounded, or bounded variables specified as [0 -Inf; Inf Inf] or [0 L; Inf U].

Example: 'Support','positive'
Example: 'Support',[0 10]
Data Types: single | double | char | string

PlotFcn — Function used to create kernel density plot
'surf' (default) | 'contour' | 'plot3' | 'surfc'

Function used to create kernel density plot, specified as the comma-separated pair consisting of 'PlotFcn' and one of the following.

• 'surf' — 3-D shaded surface plot, created using surf
• 'contour' — Contour plot, created using contour
• 'plot3' — 3-D line plot, created using plot3
• 'surfc' — Contour plot under a 3-D shaded surface plot, created using surfc

This name-value pair is valid only for bivariate sample data.

Example: 'PlotFcn','contour'

Weights — Weights for sample data
vector

Weights for sample data, specified as the comma-separated pair consisting of 'Weights' and a vector of length size(x,1), where x is the sample data.

Example: 'Weights',xw
Data Types: single | double

Output Arguments

f — Estimated function values
vector

Estimated function values, returned as a vector whose length is equal to the number of points in xi or pts.

xi — Evaluation points
100 equally spaced points | 900 equally spaced points | vector | two-column matrix

Evaluation points at which ksdensity calculates f, returned as a vector or a two-column matrix. For univariate data, the default is 100 equally spaced points that cover the range of the data in x. For bivariate data, the default is 900 equally spaced points created using meshgrid from 30 equally spaced points in each dimension.

bw — Bandwidth of smoothing window
scalar value

Bandwidth of smoothing window, returned as a scalar value.

If you specify 'BoundaryCorrection' as 'log' (default) and 'Support' as either 'positive' or a vector [L U], ksdensity converts bounded data to be unbounded by using the log transformation. The value of bw is on the scale of the transformed values.

More About

Kernel Distribution

A kernel distribution is a nonparametric representation of the probability density function (pdf) of a random variable. You can use a kernel distribution when a parametric distribution cannot properly describe the data, or when you want to avoid making assumptions about the distribution of the data.


A kernel distribution is defined by a smoothing function and a bandwidth value, which control the smoothness of the resulting density curve.

The kernel density estimator is the estimated pdf of a random variable. For any real values of x, the kernel density estimator's formula is given by

$$\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),$$

where $x_1, x_2, \ldots, x_n$ are random samples from an unknown distribution, n is the sample size, $K(\cdot)$ is the kernel smoothing function, and h is the bandwidth.

The kernel estimator for the cumulative distribution function (cdf), for any real values of x, is given by

$$\hat{F}_h(x) = \int_{-\infty}^{x} \hat{f}_h(t)\,dt = \frac{1}{n}\sum_{i=1}^{n} G\!\left(\frac{x - x_i}{h}\right),$$

where $G(x) = \int_{-\infty}^{x} K(t)\,dt$.
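As a sanity check on the density formula above, the estimator can be evaluated directly and compared with the output of ksdensity. This is a minimal sketch, assuming the default settings (normal kernel, unbounded support, no weights) and reusing the bandwidth that ksdensity selects; the variable names are illustrative.

```matlab
rng('default')
x = randn(50,1);                       % sample data
[f,xi,h] = ksdensity(x);               % built-in estimate and bandwidth

% Evaluate (1/(n*h)) * sum_i K((xi - x_i)/h) by hand, K = normal pdf.
n = numel(x);
fManual = zeros(size(xi));
for j = 1:numel(xi)
    fManual(j) = sum(normpdf((xi(j) - x)/h)) / (n*h);
end

max(abs(f - fManual))                  % should be near zero
```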

For more details, see “Kernel Distribution” on page B-78.

Reflection Method

The reflection method is a boundary correction method that accurately finds kernel density estimators when a random variable has bounded support. If you specify 'BoundaryCorrection','reflection', ksdensity uses the reflection method. This method augments bounded data by adding reflected data near the boundaries, and estimates the pdf. Then, ksdensity returns the estimated pdf corresponding to the original support with proper normalization, so that the estimated pdf's integral over the original support is equal to one.

If you additionally specify 'Support',[L U], then ksdensity finds the kernel estimator as follows.

• If 'Function' is 'pdf', then the kernel density estimator is

$$\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n}\left[K\!\left(\frac{x - x_i^{-}}{h}\right) + K\!\left(\frac{x - x_i}{h}\right) + K\!\left(\frac{x - x_i^{+}}{h}\right)\right] \quad \text{for } L \le x \le U,$$

where $x_i^{-} = 2L - x_i$, $x_i^{+} = 2U - x_i$, and $x_i$ is the ith sample data point.

• If 'Function' is 'cdf', then the kernel estimator for the cdf is

$$\hat{F}_h(x) = \frac{1}{n}\sum_{i=1}^{n}\left[G\!\left(\frac{x - x_i^{-}}{h}\right) + G\!\left(\frac{x - x_i}{h}\right) + G\!\left(\frac{x - x_i^{+}}{h}\right)\right] - \frac{1}{n}\sum_{i=1}^{n}\left[G\!\left(\frac{L - x_i^{-}}{h}\right) + G\!\left(\frac{L - x_i}{h}\right) + G\!\left(\frac{L - x_i^{+}}{h}\right)\right]$$

for $L \le x \le U$.

• To obtain a kernel estimator for an inverse cdf, a survivor function, or a cumulative hazard function (when 'Function' is 'icdf', 'survivor', or 'cumhazard'), ksdensity uses both $\hat{f}_h(x)$ and $\hat{F}_h(x)$.

If you additionally specify 'Support' as 'positive' or [0 inf], then ksdensity finds the kernel estimator by replacing [L U] with [0 inf] in the above equations.
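The pdf case of the reflection method can be sketched directly from the formula above. This minimal example assumes positive support (L = 0 and U = ∞, so only the reflection about 0 contributes) and a normal kernel, and compares the hand-computed estimate against ksdensity; the variable names are illustrative.

```matlab
rng('default')
pd = makedist('HalfNormal','mu',0,'sigma',1);
x = random(pd,100,1);                  % nonnegative sample

pts = linspace(0.01,4,200);
[f,~,h] = ksdensity(x,pts,'Support','positive', ...
                    'BoundaryCorrection','reflection');

% Reflection about L = 0: add kernels centered at -x_i as well as x_i.
n = numel(x);
fManual = zeros(size(pts));
for j = 1:numel(pts)
    fManual(j) = sum(normpdf((pts(j) - x)/h) + ...
                     normpdf((pts(j) + x)/h)) / (n*h);
end

max(abs(f - fManual))                  % expected to be small
```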


References

[1] Bowman, A. W., and A. Azzalini. Applied Smoothing Techniques for Data Analysis. New York: Oxford University Press Inc., 1997.

[2] Hill, P. D. “Kernel estimation of a distribution function.” Communications in Statistics - Theory and Methods. Vol. 14, Issue 3, 1985, pp. 605–620.

[3] Jones, M. C. “Simple boundary correction for kernel density estimation.” Statistics and Computing. Vol. 3, Issue 3, 1993, pp. 135–146.

[4] Silverman, B. W. Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC, 1986.

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

This function supports tall arrays for out-of-memory data with some limitations.

• Some options that require extra passes or sorting of the input data are not supported:
  • 'BoundaryCorrection'
  • 'Censoring'
  • 'Support' (support is always unbounded)
• Uses the standard deviation (instead of the median absolute deviation) to compute the bandwidth.

For more information, see “Tall Arrays for Out-of-Memory Data” (MATLAB).

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

• Plotting is not supported.
• Names in name-value pair arguments must be compile-time constants.
• Values in the following name-value pair arguments must also be compile-time constants: 'BoundaryCorrection', 'Function', and 'Kernel'. For example, to use the 'Function','cdf' name-value pair argument in the generated code, include {coder.Constant('Function'),coder.Constant('cdf')} in the -args value of codegen.
• The value of the 'Kernel' name-value pair argument cannot be a custom function handle. To specify a custom kernel function, use a character vector or string scalar.
• For the value of the 'Support' name-value pair argument, the compile-time data type must match the runtime data type.

For more information on code generation, see “Introduction to Code Generation” on page 31-2 and “General Code Generation Workflow” on page 31-5.

See Also
histogram | mvksdensity


Topics
“Fit Kernel Distribution Using ksdensity” on page 5-38
“Fit Distributions to Grouped Data Using ksdensity” on page 5-40
“Working with Probability Distributions” on page 5-2
“Nonparametric and Empirical Probability Distributions” on page 5-29
“Supported Distributions” on page 5-13

Introduced before R2006a


kstest
One-sample Kolmogorov-Smirnov test

Syntax h = kstest(x) h = kstest(x,Name,Value) [h,p] = kstest( ___ ) [h,p,ksstat,cv] = kstest( ___ )

Description

h = kstest(x) returns a test decision for the null hypothesis that the data in vector x comes from a standard normal distribution, against the alternative that it does not come from such a distribution, using the one-sample Kolmogorov-Smirnov test on page 32-2969. The result h is 1 if the test rejects the null hypothesis at the 5% significance level, or 0 otherwise.

h = kstest(x,Name,Value) returns a test decision for the one-sample Kolmogorov-Smirnov test with additional options specified by one or more name-value pair arguments. For example, you can test for a distribution other than standard normal, change the significance level, or conduct a one-sided test.

[h,p] = kstest( ___ ) also returns the p-value p of the hypothesis test, using any of the input arguments from the previous syntaxes.

[h,p,ksstat,cv] = kstest( ___ ) also returns the value of the test statistic ksstat and the approximate critical value cv of the test.

Examples

Test for Standard Normal Distribution

Perform the one-sample Kolmogorov-Smirnov test by using kstest. Confirm the test decision by visually comparing the empirical cumulative distribution function (cdf) to the standard normal cdf.

Load the examgrades data set. Create a vector containing the first column of the exam grade data.

load examgrades
test1 = grades(:,1);

Test the null hypothesis that the data comes from a normal distribution with a mean of 75 and a standard deviation of 10. Use these parameters to center and scale each element of the data vector, because kstest tests for a standard normal distribution by default. x = (test1-75)/10; h = kstest(x) h = logical 0


The returned value of h = 0 indicates that kstest fails to reject the null hypothesis at the default 5% significance level. Plot the empirical cdf and the standard normal cdf for a visual comparison. cdfplot(x) hold on x_values = linspace(min(x),max(x)); plot(x_values,normcdf(x_values,0,1),'r-') legend('Empirical CDF','Standard Normal CDF','Location','best')

The figure shows the similarity between the empirical cdf of the centered and scaled data vector and the cdf of the standard normal distribution.

Specify the Hypothesized Distribution Using a Two-Column Matrix

Load the sample data. Create a vector containing the first column of the students’ exam grades data.

load examgrades;
x = grades(:,1);

Specify the hypothesized distribution as a two-column matrix. Column 1 contains the data vector x. Column 2 contains cdf values evaluated at each value in x for a hypothesized Student’s t distribution with a location parameter of 75, a scale parameter of 10, and one degree of freedom.


test_cdf = [x,cdf('tlocationscale',x,75,10,1)];

Test if the data are from the hypothesized distribution. h = kstest(x,'CDF',test_cdf) h = logical 1

The returned value of h = 1 indicates that kstest rejects the null hypothesis at the default 5% significance level.

Specify the Hypothesized Distribution Using a Probability Distribution Object

Load the sample data. Create a vector containing the first column of the students’ exam grades data.

load examgrades;
x = grades(:,1);

Create a probability distribution object to test if the data comes from a Student’s t distribution with a location parameter of 75, a scale parameter of 10, and one degree of freedom. test_cdf = makedist('tlocationscale','mu',75,'sigma',10,'nu',1);

Test the null hypothesis that the data comes from the hypothesized distribution. h = kstest(x,'CDF',test_cdf) h = logical 1

The returned value of h = 1 indicates that kstest rejects the null hypothesis at the default 5% significance level.

Test the Hypothesis at Different Significance Levels

Load the sample data. Create a vector containing the first column of the students’ exam grades.

load examgrades;
x = grades(:,1);

Create a probability distribution object to test if the data comes from a Student’s t distribution with a location parameter of 75, a scale parameter of 10, and one degree of freedom. test_cdf = makedist('tlocationscale','mu',75,'sigma',10,'nu',1);

Test the null hypothesis that data comes from the hypothesized distribution at the 1% significance level. [h,p] = kstest(x,'CDF',test_cdf,'Alpha',0.01)


h = logical 1 p = 0.0021

The returned value of h = 1 indicates that kstest rejects the null hypothesis at the 1% significance level.

Conduct a One-Sided Hypothesis Test

Load the sample data. Create a vector containing the third column of the stock return data matrix.

load stockreturns;
x = stocks(:,3);

Test the null hypothesis that the data comes from a standard normal distribution, against the alternative hypothesis that the population cdf of the data is larger than the standard normal cdf. [h,p,k,c] = kstest(x,'Tail','larger') h = logical 1 p = 5.0854e-05 k = 0.2197 c = 0.1207

The returned value of h = 1 indicates that kstest rejects the null hypothesis in favor of the alternative hypothesis at the default 5% significance level. Plot the empirical cdf and the standard normal cdf for a visual comparison. [f,x_values] = ecdf(x); J = plot(x_values,f); hold on; K = plot(x_values,normcdf(x_values),'r--'); set(J,'LineWidth',2); set(K,'LineWidth',2); legend([J K],'Empirical CDF','Standard Normal CDF','Location','SE');


The plot shows the difference between the empirical cdf of the data vector x and the cdf of the standard normal distribution.

Input Arguments

x — Sample data
vector

Sample data, specified as a vector.
Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'Tail','larger','Alpha',0.01 specifies a test using the alternative hypothesis that the cdf of the population from which the sample data is drawn is greater than the cdf of the hypothesized distribution, conducted at the 1% significance level.

Alpha — Significance level
0.05 (default) | scalar value in the range (0,1)


Significance level of the hypothesis test, specified as the comma-separated pair consisting of 'Alpha' and a scalar value in the range (0,1).
Example: 'Alpha',0.01
Data Types: single | double

CDF — cdf of hypothesized continuous distribution
matrix | probability distribution object

cdf of the hypothesized continuous distribution, specified as the comma-separated pair consisting of 'CDF' and either a two-column matrix or a continuous probability distribution object. When CDF is a matrix, column 1 contains a set of possible x values, and column 2 contains the corresponding hypothesized cumulative distribution function values G(x). The calculation is most efficient if you specify CDF such that column 1 contains the values in the data vector x. If x contains values not found in column 1 of CDF, kstest approximates G(x) by interpolation. All values in x must lie in the interval between the smallest and largest values in the first column of CDF. By default, kstest tests for a standard normal distribution.

The one-sample Kolmogorov-Smirnov test on page 32-2969 is valid only for continuous cumulative distribution functions, and requires CDF to be predetermined. The result is not accurate if CDF is estimated from the data. To test x against the normal, lognormal, extreme value, Weibull, or exponential distribution without specifying distribution parameters, use lillietest instead.
Data Types: single | double

Tail — Type of alternative hypothesis
'unequal' (default) | 'larger' | 'smaller'

Type of alternative hypothesis to evaluate, specified as the comma-separated pair consisting of 'Tail' and one of the following.

'unequal'

Test the alternative hypothesis that the cdf of the population from which x is drawn is not equal to the cdf of the hypothesized distribution.

'larger'

Test the alternative hypothesis that the cdf of the population from which x is drawn is greater than the cdf of the hypothesized distribution.

'smaller'

Test the alternative hypothesis that the cdf of the population from which x is drawn is less than the cdf of the hypothesized distribution.

If the values in the data vector x tend to be larger than expected from the hypothesized distribution, the empirical distribution function of x tends to be smaller, and vice versa. Example: 'Tail','larger'
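The two-column matrix form of the 'CDF' argument can be sketched as follows. This example is not part of the original text; the variable name cdfmat is illustrative, and it assumes a standard normal hypothesis so that the matrix form reproduces the default test.

```matlab
% Sketch: passing 'CDF' as a two-column matrix rather than a
% probability distribution object. Column 1 holds x values and
% column 2 the hypothesized cdf G(x) evaluated at those values.
x = randn(100,1);
xs = sort(x);
cdfmat = [xs, normcdf(xs)];   % most efficient: column 1 = the data values
h1 = kstest(x,'CDF',cdfmat);  % standard normal hypothesis via matrix
h2 = kstest(x);               % default standard normal test
% h1 and h2 should agree, since both test against the same cdf.
```

Because column 1 here contains the data values themselves, no interpolation of G(x) is needed.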

Output Arguments

h — Hypothesis test result
1 | 0

Hypothesis test result, returned as a logical value.
• If h = 1, this indicates the rejection of the null hypothesis at the Alpha significance level.
• If h = 0, this indicates a failure to reject the null hypothesis at the Alpha significance level.


p — p-value
scalar value in the range [0,1]

p-value of the test, returned as a scalar value in the range [0,1]. p is the probability of observing a test statistic as extreme as, or more extreme than, the observed value under the null hypothesis. Small values of p cast doubt on the validity of the null hypothesis.

ksstat — Test statistic
nonnegative scalar value

Test statistic of the hypothesis test, returned as a nonnegative scalar value.

cv — Critical value
nonnegative scalar value

Critical value, returned as a nonnegative scalar value.

More About

One-Sample Kolmogorov-Smirnov Test

The one-sample Kolmogorov-Smirnov test is a nonparametric test of the null hypothesis that the population cdf of the data is equal to the hypothesized cdf.

The two-sided test for “unequal” cdf functions tests the null hypothesis against the alternative that the population cdf of the data is not equal to the hypothesized cdf. The test statistic is the maximum absolute difference between the empirical cdf calculated from x and the hypothesized cdf:

D* = max_x ( |F(x) − G(x)| ),

where F(x) is the empirical cdf and G(x) is the cdf of the hypothesized distribution.

The one-sided test for a “larger” cdf function tests the null hypothesis against the alternative that the population cdf of the data is greater than the hypothesized cdf. The test statistic is the maximum amount by which the empirical cdf calculated from x exceeds the hypothesized cdf:

D* = max_x ( F(x) − G(x) ).

The one-sided test for a “smaller” cdf function tests the null hypothesis against the alternative that the population cdf of the data is less than the hypothesized cdf. The test statistic is the maximum amount by which the hypothesized cdf exceeds the empirical cdf calculated from x:

D* = max_x ( G(x) − F(x) ).
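These statistics can be evaluated directly from a sample for comparison with the ksstat output. The following sketch is not part of the original text; the variable names (Fplus, Fminus, and so on) are illustrative, and a standard normal hypothesized cdf is assumed. The empirical cdf is evaluated just above and just below each sorted data point, because the largest gap between a step function and a continuous cdf can occur on either side of a jump.

```matlab
% Sketch: computing D* for the three alternatives by hand,
% assuming a standard normal hypothesized cdf G.
x  = randn(100,1);
xs = sort(x);
n  = numel(xs);
Fplus  = (1:n)'/n;        % empirical cdf F(x) at each sorted point
Fminus = (0:n-1)'/n;      % empirical cdf just below each point
G  = normcdf(xs);         % hypothesized cdf
Dunequal = max(max(Fplus - G), max(G - Fminus));  % 'Tail','unequal'
Dlarger  = max(Fplus - G);                        % 'Tail','larger'
Dsmaller = max(G - Fminus);                       % 'Tail','smaller'
[~,~,ksstat] = kstest(x);  % ksstat should match Dunequal
```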

kstest computes the critical value cv using an approximate formula or by interpolation in a table. The formula and table cover the range 0.01 ≤ alpha ≤ 0.2 for two-sided tests and 0.005 ≤ alpha ≤ 0.1 for one-sided tests. cv is returned as NaN if alpha is outside this range.

Algorithms

kstest decides to reject the null hypothesis by comparing the p-value p with the significance level Alpha, not by comparing the test statistic ksstat with the critical value cv. Because cv is approximate, comparing ksstat with cv occasionally leads to a different conclusion than comparing p with Alpha.

References

[1] Massey, F. J. “The Kolmogorov-Smirnov Test for Goodness of Fit.” Journal of the American Statistical Association. Vol. 46, No. 253, 1951, pp. 68–78.

[2] Miller, L. H. “Table of Percentage Points of Kolmogorov Statistics.” Journal of the American Statistical Association. Vol. 51, No. 273, 1956, pp. 111–121.

[3] Marsaglia, G., W. Tsang, and J. Wang. “Evaluating Kolmogorov’s Distribution.” Journal of Statistical Software. Vol. 8, Issue 18, 2003.

See Also adtest | kstest2 | lillietest Introduced before R2006a


kstest2 Two-sample Kolmogorov-Smirnov test

Syntax h = kstest2(x1,x2) h = kstest2(x1,x2,Name,Value) [h,p] = kstest2( ___ ) [h,p,ks2stat] = kstest2( ___ )

Description h = kstest2(x1,x2) returns a test decision for the null hypothesis that the data in vectors x1 and x2 are from the same continuous distribution, using the two-sample Kolmogorov-Smirnov test on page 32-2974. The alternative hypothesis is that x1 and x2 are from different continuous distributions. The result h is 1 if the test rejects the null hypothesis at the 5% significance level, and 0 otherwise. h = kstest2(x1,x2,Name,Value) returns a test decision for a two-sample Kolmogorov-Smirnov test with additional options specified by one or more name-value pair arguments. For example, you can change the significance level or conduct a one-sided test. [h,p] = kstest2( ___ ) also returns the asymptotic p-value p, using any of the input arguments from the previous syntaxes. [h,p,ks2stat] = kstest2( ___ ) also returns the test statistic ks2stat.

Examples Test Two Samples for the Same Distribution Generate sample data from two different Weibull distributions. rng(1); % For reproducibility x1 = wblrnd(1,1,1,50); x2 = wblrnd(1.2,2,1,50);

Test the null hypothesis that data in vectors x1 and x2 comes from populations with the same distribution. h = kstest2(x1,x2) h = logical 1

The returned value of h = 1 indicates that kstest2 rejects the null hypothesis at the default 5% significance level.


Test the Hypothesis at Different Significance Levels Generate sample data from two different Weibull distributions. rng(1); % For reproducibility x1 = wblrnd(1,1,1,50); x2 = wblrnd(1.2,2,1,50);

Test the null hypothesis that data vectors x1 and x2 are from populations with the same distribution at the 1% significance level. [h,p] = kstest2(x1,x2,'Alpha',0.01) h = logical 0 p = 0.0317

The returned value of h = 0 indicates that kstest2 does not reject the null hypothesis at the 1% significance level.

One-Sided Hypothesis Test Generate sample data from two different Weibull distributions. rng(1); % For reproducibility x1 = wblrnd(1,1,1,50); x2 = wblrnd(1.2,2,1,50);

Test the null hypothesis that data in vectors x1 and x2 comes from populations with the same distribution, against the alternative hypothesis that the cdf of the distribution of x1 is larger than the cdf of the distribution of x2. [h,p,k] = kstest2(x1,x2,'Tail','larger') h = logical 1 p = 0.0158 k = 0.2800

The returned value of h = 1 indicates that kstest2 rejects the null hypothesis, in favor of the alternative hypothesis that the cdf of the distribution of x1 is larger than the cdf of the distribution of x2, at the default 5% significance level. The returned value of k is the test statistic for the two-sample Kolmogorov-Smirnov test.

Input Arguments

x1 — Sample data
vector


Sample data from the first sample, specified as a vector. Data vectors x1 and x2 do not need to be the same size.
Data Types: single | double

x2 — Sample data
vector

Sample data from the second sample, specified as a vector. Data vectors x1 and x2 do not need to be the same size.
Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'Tail','larger','Alpha',0.01 specifies a test using the alternative hypothesis that the empirical cdf of x1 is larger than the empirical cdf of x2, conducted at the 1% significance level.

Alpha — Significance level
0.05 (default) | scalar value in the range (0,1)

Significance level of the hypothesis test, specified as the comma-separated pair consisting of 'Alpha' and a scalar value in the range (0,1).
Example: 'Alpha',0.01
Data Types: single | double

Tail — Type of alternative hypothesis
'unequal' (default) | 'larger' | 'smaller'

Type of alternative hypothesis to evaluate, specified as the comma-separated pair consisting of 'Tail' and one of the following.

'unequal'

Test the alternative hypothesis that the empirical cdf of x1 is unequal to the empirical cdf of x2.

'larger'

Test the alternative hypothesis that the empirical cdf of x1 is larger than the empirical cdf of x2.

'smaller'

Test the alternative hypothesis that the empirical cdf of x1 is smaller than the empirical cdf of x2.

If the data values in x1 tend to be larger than those in x2, the empirical distribution function of x1 tends to be smaller than that of x2, and vice versa. Example: 'Tail','larger'

Output Arguments

h — Hypothesis test result
1 | 0

Hypothesis test result, returned as a logical value.


• If h = 1, this indicates the rejection of the null hypothesis at the Alpha significance level.
• If h = 0, this indicates a failure to reject the null hypothesis at the Alpha significance level.

p — Asymptotic p-value
scalar value in the range (0,1)

Asymptotic p-value of the test, returned as a scalar value in the range (0,1). p is the probability of observing a test statistic as extreme as, or more extreme than, the observed value under the null hypothesis. The asymptotic p-value becomes very accurate for large sample sizes, and is believed to be reasonably accurate for sample sizes n1 and n2 such that (n1*n2)/(n1 + n2) ≥ 4.

ks2stat — Test statistic
nonnegative scalar value

Test statistic, returned as a nonnegative scalar value.

More About

Two-Sample Kolmogorov-Smirnov Test

The two-sample Kolmogorov-Smirnov test is a nonparametric hypothesis test that evaluates the difference between the cdfs of the distributions of the two sample data vectors over the range of x in each data set.

The two-sided test uses the maximum absolute difference between the cdfs of the distributions of the two data vectors. The test statistic is

D* = max_x ( |F1(x) − F2(x)| ),

where F1(x) is the proportion of x1 values less than or equal to x and F2(x) is the proportion of x2 values less than or equal to x.

The one-sided test uses the actual value of the difference between the cdfs of the distributions of the two data vectors rather than the absolute value. The test statistic is

D* = max_x ( F1(x) − F2(x) ).

Algorithms In kstest2, the decision to reject the null hypothesis is based on comparing the p-value p with the significance level Alpha, not by comparing the test statistic ks2stat with a critical value.

References

[1] Massey, F. J. “The Kolmogorov-Smirnov Test for Goodness of Fit.” Journal of the American Statistical Association. Vol. 46, No. 253, 1951, pp. 68–78.

[2] Miller, L. H. “Table of Percentage Points of Kolmogorov Statistics.” Journal of the American Statistical Association. Vol. 51, No. 273, 1956, pp. 111–121.

[3] Marsaglia, G., W. Tsang, and J. Wang. “Evaluating Kolmogorov’s Distribution.” Journal of Statistical Software. Vol. 8, Issue 18, 2003.

See Also adtest | kstest | lillietest Introduced before R2006a


kurtosis Kurtosis

Syntax

k = kurtosis(X)
k = kurtosis(X,flag)
k = kurtosis(X,flag,'all')
k = kurtosis(X,flag,dim)
k = kurtosis(X,flag,vecdim)

Description

k = kurtosis(X) returns the sample kurtosis of X.
• If X is a vector, then kurtosis(X) returns a scalar value that is the kurtosis of the elements in X.
• If X is a matrix, then kurtosis(X) returns a row vector that contains the sample kurtosis of each column in X.
• If X is a multidimensional array, then kurtosis(X) operates along the first nonsingleton dimension of X.

k = kurtosis(X,flag) specifies whether to correct for bias (flag is 0) or not (flag is 1, the default). When X represents a sample from a population, the kurtosis of X is biased, meaning it tends to differ from the population kurtosis by a systematic amount based on the sample size. You can set flag to 0 to correct for this systematic bias.

k = kurtosis(X,flag,'all') returns the kurtosis of all elements of X.

k = kurtosis(X,flag,dim) returns the kurtosis along the operating dimension dim of X.

k = kurtosis(X,flag,vecdim) returns the kurtosis over the dimensions specified in the vector vecdim. For example, if X is a 2-by-3-by-4 array, then kurtosis(X,1,[1 2]) returns a 1-by-1-by-4 array. Each element of the output array is the biased kurtosis of the elements on the corresponding page of X.

Examples Find Kurtosis of Matrix Set the random seed for reproducibility of the results. rng('default')

Generate a matrix with 5 rows and 4 columns.

X = randn(5,4)

X = 5×4

    0.5377   -1.3077   -1.3499   -0.2050
    1.8339   -0.4336    3.0349   -0.1241
   -2.2588    0.3426    0.7254    1.4897
    0.8622    3.5784   -0.0631    1.4090
    0.3188    2.7694    0.7147    1.4172

Find the sample kurtosis of X.

k = kurtosis(X)

k = 1×4

    2.7067    1.4069    2.3783    1.1759

k is a row vector containing the sample kurtosis of each column in X.

Correct for Bias in Sample Kurtosis For an input vector, correct for bias in the calculation of kurtosis by specifying the flag input argument. Set the random seed for reproducibility of the results. rng('default')

Generate a vector of length 10.

x = randn(10,1)

x = 10×1

    0.5377
    1.8339
   -2.2588
    0.8622
    0.3188
   -1.3077
   -0.4336
    0.3426
    3.5784
    2.7694

Find the biased kurtosis of x. By default, kurtosis sets the value of flag to 1 for computing the biased kurtosis. k1 = kurtosis(x) % flag is 1 by default k1 = 2.3121

Find the bias-corrected kurtosis of x by setting the value of flag to 0. k2 = kurtosis(x,0) k2 = 2.7483


Find Kurtosis Along Given Dimension Find the kurtosis along different dimensions for a multidimensional array. Set the random seed for reproducibility of the results. rng('default')

Create a 4-by-3-by-2 array of random numbers.

X = randn([4,3,2])

X =
X(:,:,1) =

    0.5377    0.3188    3.5784
    1.8339   -1.3077    2.7694
   -2.2588   -0.4336   -1.3499
    0.8622    0.3426    3.0349


X(:,:,2) =

    0.7254   -0.1241    0.6715
   -0.0631    1.4897   -1.2075
    0.7147    1.4090    0.7172
   -0.2050    1.4172    1.6302

Find the kurtosis of X along the default dimension.

k1 = kurtosis(X)

k1 =
k1(:,:,1) =

    2.1350    1.7060    2.2789


k1(:,:,2) =

    1.0542    2.3278    2.0996

By default, kurtosis operates along the first dimension of X whose size does not equal 1. In this case, this dimension is the first dimension of X. Therefore, k1 is a 1-by-3-by-2 array.

Find the biased kurtosis of X along the second dimension.

k2 = kurtosis(X,1,2)

k2 =
k2(:,:,1) =

    1.5000
    1.5000
    1.5000
    1.5000


k2(:,:,2) =

    1.5000
    1.5000
    1.5000
    1.5000

k2 is a 4-by-1-by-2 array.

Find the biased kurtosis of X along the third dimension.

k3 = kurtosis(X,1,3)

k3 = 4×3

    1.0000    1.0000    1.0000
    1.0000    1.0000    1.0000
    1.0000    1.0000    1.0000
    1.0000    1.0000    1.0000

k3 is a 4-by-3 matrix.

Find Kurtosis Along Vector of Dimensions Find the kurtosis over multiple dimensions by using the 'all' and vecdim input arguments. Set the random seed for reproducibility of the results. rng('default')

Create a 4-by-3-by-2 array of random numbers.

X = randn([4 3 2])

X =
X(:,:,1) =

    0.5377    0.3188    3.5784
    1.8339   -1.3077    2.7694
   -2.2588   -0.4336   -1.3499
    0.8622    0.3426    3.0349


X(:,:,2) =

    0.7254   -0.1241    0.6715
   -0.0631    1.4897   -1.2075
    0.7147    1.4090    0.7172
   -0.2050    1.4172    1.6302


Find the biased kurtosis of X. kall = kurtosis(X,1,'all') kall = 2.8029

kall is the biased kurtosis of the entire input data set X.

Find the biased kurtosis of each page of X by specifying the first and second dimensions.

kpage = kurtosis(X,1,[1 2])

kpage =
kpage(:,:,1) =

    1.9345


kpage(:,:,2) =

    2.5877

For example, kpage(1,1,2) is the biased kurtosis of the elements in X(:,:,2).

Find the biased kurtosis of the elements in each X(i,:,:) slice by specifying the second and third dimensions.

krow = kurtosis(X,1,[2 3])

krow = 4×1

    3.8457
    1.4306
    1.7094
    2.3378

For example, krow(3) is the biased kurtosis of the elements in X(3,:,:).

Input Arguments

X — Input data
vector | matrix | multidimensional array

Input data that represents a sample from a population, specified as a vector, matrix, or multidimensional array.
• If X is a vector, then kurtosis(X) returns a scalar value that is the kurtosis of the elements in X.
• If X is a matrix, then kurtosis(X) returns a row vector that contains the sample kurtosis of each column in X.
• If X is a multidimensional array, then kurtosis(X) operates along the first nonsingleton dimension of X.
To specify the operating dimension when X is a matrix or an array, use the dim input argument.


kurtosis treats NaN values in X as missing values and removes them.
Data Types: single | double

flag — Indicator for bias
1 (default) | 0

Indicator for the bias, specified as 0 or 1.
• If flag is 1 (default), then the kurtosis of X is biased, meaning it tends to differ from the population kurtosis by a systematic amount based on the sample size.
• If flag is 0, then kurtosis corrects for the systematic bias.
Data Types: single | double | logical

dim — Dimension
positive integer

Dimension along which to operate, specified as a positive integer. If you do not specify a value for dim, then the default is the first dimension of X whose size does not equal 1.
Consider the kurtosis of a matrix X:
• If dim is equal to 1, then kurtosis returns a row vector that contains the sample kurtosis of each column in X.
• If dim is equal to 2, then kurtosis returns a column vector that contains the sample kurtosis of each row in X.
If dim is greater than ndims(X) or if size(X,dim) is 1, then kurtosis returns an array of NaNs the same size as X.
Data Types: single | double

vecdim — Vector of dimensions
positive integer vector

Vector of dimensions, specified as a positive integer vector. Each element of vecdim represents a dimension of the input array X. The output k has length 1 in the specified operating dimensions. The other dimension lengths are the same for X and k. For example, if X is a 2-by-3-by-3 array, then kurtosis(X,1,[1 2]) returns a 1-by-1-by-3 array. Each element of the output is the biased kurtosis of the elements on the corresponding page of X.

Data Types: single | double
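As a small aside (not from the original text), the NaN handling noted above can be demonstrated with arbitrary values:

```matlab
% Sketch: kurtosis treats NaN values as missing and removes them,
% so the NaN element does not affect the result.
x = [1 2 3 4 NaN]';
k_withNaN = kurtosis(x);
k_clean   = kurtosis([1 2 3 4]');
% k_withNaN and k_clean should be equal.
```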


Output Arguments

k — Kurtosis
scalar | vector | matrix | multidimensional array

Kurtosis, returned as a scalar, vector, matrix, or multidimensional array.

Algorithms

Kurtosis is a measure of how outlier-prone a distribution is. The kurtosis of the normal distribution on page B-119 is 3. Distributions that are more outlier-prone than the normal distribution have kurtosis greater than 3; distributions that are less outlier-prone have kurtosis less than 3. Some definitions of kurtosis subtract 3 from the computed value, so that the normal distribution has kurtosis of 0. The kurtosis function does not use this convention.

The kurtosis of a distribution is defined as

k = E(x − μ)^4 / σ^4,

where μ is the mean of x, σ is the standard deviation of x, and E(t) represents the expected value of the quantity t. The kurtosis function computes a sample version of this population value.

When you set flag to 1, the kurtosis is biased, and the following equation applies:

k1 = [ (1/n) ∑_{i=1}^{n} (x_i − x̄)^4 ] / [ (1/n) ∑_{i=1}^{n} (x_i − x̄)^2 ]^2.

When you set flag to 0, kurtosis corrects for the systematic bias, and the following equation applies:

k0 = ( (n−1) / ((n−2)(n−3)) ) [ (n+1) k1 − 3(n−1) ] + 3.

This bias-corrected equation requires that X contain at least four elements.
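The two equations above can be checked numerically. This sketch is not part of the original text; it assumes a small random sample and recomputes both forms directly.

```matlab
% Sketch: evaluating the biased (k1) and bias-corrected (k0)
% sample formulas directly and comparing with the kurtosis function.
rng('default')
x  = randn(20,1);
n  = numel(x);
xc = x - mean(x);                                % centered data
k1 = mean(xc.^4) / mean(xc.^2)^2;                % biased sample kurtosis
k0 = (n-1)/((n-2)*(n-3)) * ((n+1)*k1 - 3*(n-1)) + 3;  % bias-corrected
% k1 should match kurtosis(x), and k0 should match kurtosis(x,0).
```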

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.
This function fully supports tall arrays. For more information, see “Tall Arrays” (MATLAB).

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
Usage notes and limitations:
• The 'all' and vecdim input arguments are not supported.


• The dim input argument must be a compile-time constant. • If you do not specify the dim input argument, the working (or operating) dimension can be different in the generated code. As a result, run-time errors can occur. For more details, see “Automatic dimension restriction” (MATLAB Coder). For more information on code generation, see “Introduction to Code Generation” on page 31-2 and “General Code Generation Workflow” on page 31-5. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. Usage notes and limitations: • The 'all' and vecdim input arguments are not supported. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also mean | moment | skewness | std | var Topics “Normal Distribution” on page B-119 Introduced before R2006a


lasso Lasso or elastic net regularization for linear models

Syntax B = lasso(X,y) B = lasso(X,y,Name,Value) [B,FitInfo] = lasso( ___ )

Description B = lasso(X,y) returns fitted least-squares regression coefficients for linear models of the predictor data X and the response y. Each column of B corresponds to a particular regularization coefficient in Lambda. By default, lasso performs lasso regularization using a geometric sequence of Lambda values. B = lasso(X,y,Name,Value) fits regularized regressions with additional options specified by one or more name-value pair arguments. For example, 'Alpha',0.5 sets elastic net as the regularization method, with the parameter Alpha equal to 0.5. [B,FitInfo] = lasso( ___ ) also returns the structure FitInfo, which contains information about the fit of the models, using any of the input arguments in the previous syntaxes.

Examples Remove Redundant Predictors Using Lasso Regularization Construct a data set with redundant predictors and identify those predictors by using lasso. Create a matrix X of 100 five-dimensional normal variables. Create a response vector y from just two components of X, and add a small amount of noise. rng default % For reproducibility X = randn(100,5); weights = [0;2;0;-3;0]; % Only two nonzero coefficients y = X*weights + randn(100,1)*0.1; % Small added noise

Construct the default lasso fit. B = lasso(X,y);

Find the coefficient vector for the 25th Lambda value in B.

B(:,25)

ans = 5×1

         0
    1.6093
         0
   -2.5865
         0

lasso identifies and removes the redundant predictors.

Create Linear Model Without Intercept Term Using Lasso Regularization Create sample data X and y, with X a predictor variable and y the response variable y = 0 + 2X + ε. rng('default') % For reproducibility X = rand(100,1); y = 2*X + randn(100,1)/10;

Specify a regularization value, and find the coefficient of the regression model without an intercept term. lambda = 1e-03; B = lasso(X,y,'Lambda',lambda,'Intercept',false) Warning: When the 'Intercept' value is false, the 'Standardize' value is set to false. B = 1.9825

Plot the real values (points) against the predicted values (line). scatter(X,y) hold on x = 0:0.1:1; plot(x,x*B) hold off


Remove Redundant Predictors by Using Cross-Validated Fits Construct a data set with redundant predictors and identify those predictors by using cross-validated lasso. Create a matrix X of 100 five-dimensional normal variables. Create a response vector y from two components of X, and add a small amount of noise. rng default % For reproducibility X = randn(100,5); weights = [0;2;0;-3;0]; % Only two nonzero coefficients y = X*weights + randn(100,1)*0.1; % Small added noise

Construct the lasso fit by using 10-fold cross-validation with labeled predictor variables. [B,FitInfo] = lasso(X,y,'CV',10,'PredictorNames',{'x1','x2','x3','x4','x5'});

Display the variables in the model that corresponds to the minimum cross-validated mean squared error (MSE). idxLambdaMinMSE = FitInfo.IndexMinMSE; minMSEModelPredictors = FitInfo.PredictorNames(B(:,idxLambdaMinMSE)~=0)


minMSEModelPredictors = 1x2 cell {'x2'} {'x4'}

Display the variables in the sparsest model within one standard error of the minimum MSE. idxLambda1SE = FitInfo.Index1SE; sparseModelPredictors = FitInfo.PredictorNames(B(:,idxLambda1SE)~=0) sparseModelPredictors = 1x2 cell {'x2'} {'x4'}

In this example, lasso identifies the same predictors for the two models and removes the redundant predictors.

Lasso Plot with Cross-Validated Fits Visually examine the cross-validated error of various levels of regularization. Load the sample data. load acetylene

Create a design matrix with interactions and no constant term. X = [x1 x2 x3]; D = x2fx(X,'interaction'); D(:,1) = []; % No constant term

Construct the lasso fit using 10-fold cross-validation. Include the FitInfo output so you can plot the result. rng default % For reproducibility [B,FitInfo] = lasso(D,y,'CV',10);

Plot the cross-validated fits. lassoPlot(B,FitInfo,'PlotType','CV'); legend('show') % Show legend


The green circle and dotted line locate the Lambda with minimum cross-validation error. The blue circle and dotted line locate the point with minimum cross-validation error plus one standard deviation.

Predict Values Using Elastic Net Regularization Predict students' exam scores using lasso and the elastic net method. Load the examgrades data set. load examgrades X = grades(:,1:4); y = grades(:,5);

Split the data into training and test sets. n = length(y); c = cvpartition(n,'HoldOut',0.3); idxTrain = training(c,1); idxTest = ~idxTrain; XTrain = X(idxTrain,:); yTrain = y(idxTrain); XTest = X(idxTest,:); yTest = y(idxTest);


Find the coefficients of a regularized linear regression model using 10-fold cross-validation and the elastic net method with Alpha = 0.75. Use the largest Lambda value such that the mean squared error (MSE) is within one standard error of the minimum MSE. [B,FitInfo] = lasso(XTrain,yTrain,'Alpha',0.75,'CV',10); idxLambda1SE = FitInfo.Index1SE; coef = B(:,idxLambda1SE); coef0 = FitInfo.Intercept(idxLambda1SE);

Predict exam scores for the test data. Compare the predicted values to the actual exam grades using a reference line. yhat = XTest*coef + coef0; hold on scatter(yTest,yhat) plot(yTest,yTest) xlabel('Actual Exam Grades') ylabel('Predicted Exam Grades') hold off

Input Arguments

X — Predictor data
numeric matrix


Predictor data, specified as a numeric matrix. Each row represents one observation, and each column represents one predictor variable.
Data Types: single | double

y — Response data
numeric vector

Response data, specified as a numeric vector. y has length n, where n is the number of rows of X. The response y(i) corresponds to the ith row of X.
Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: lasso(X,y,'Alpha',0.75,'CV',10) performs elastic net regularization with 10-fold cross-validation. The 'Alpha',0.75 name-value pair argument sets the parameter used in the elastic net optimization.

AbsTol — Absolute error tolerance
1e–4 (default) | positive scalar

Absolute error tolerance used to determine the convergence of the “ADMM Algorithm” on page 32-2996, specified as the comma-separated pair consisting of 'AbsTol' and a positive scalar. The algorithm converges when successive estimates of the coefficient vector differ by an amount less than AbsTol.
Note This option applies only when you use lasso on tall arrays. See “Extended Capabilities” on page 32-0 for more information.
Example: 'AbsTol',1e–3
Data Types: single | double

Alpha — Weight of lasso versus ridge optimization
1 (default) | positive scalar

Weight of lasso (L1) versus ridge (L2) optimization, specified as the comma-separated pair consisting of 'Alpha' and a positive scalar value in the interval (0,1]. The value Alpha = 1 represents lasso regression, Alpha close to 0 approaches ridge regression on page 11-106, and other values represent elastic net optimization. See “Elastic Net” on page 32-2995.
Example: 'Alpha',0.5
Data Types: single | double

B0 — Initial values for x-coefficients in ADMM Algorithm
vector of zeros (default) | numeric vector

Initial values for x-coefficients in “ADMM Algorithm” on page 32-2996, specified as the comma-separated pair consisting of 'B0' and a numeric vector.


Note This option applies only when you use lasso on tall arrays. See “Extended Capabilities” on page 32-0 for more information.
Data Types: single | double

CV — Cross-validation specification for estimating mean squared error
'resubstitution' (default) | positive integer scalar | cvpartition object

Cross-validation specification for estimating the mean squared error (MSE), specified as the comma-separated pair consisting of 'CV' and one of the following:
• 'resubstitution' — lasso uses X and y to fit the model and to estimate the MSE without cross-validation.
• Positive scalar integer K — lasso uses K-fold cross-validation.
• cvpartition object cvp — lasso uses the cross-validation method expressed in cvp. You cannot use a 'leaveout' partition with lasso.
Example: 'CV',3

DFmax — Maximum number of nonzero coefficients
Inf (default) | positive integer scalar

Maximum number of nonzero coefficients in the model, specified as the comma-separated pair consisting of 'DFmax' and a positive integer scalar. lasso returns results only for Lambda values that satisfy this criterion.
Example: 'DFmax',5
Data Types: single | double

Intercept — Flag for fitting the model with intercept term
true (default) | false

Flag for fitting the model with the intercept term, specified as the comma-separated pair consisting of 'Intercept' and either true or false. The default value is true, which indicates to include the intercept term in the model. If Intercept is false, then the returned intercept value is 0.
Example: 'Intercept',false
Data Types: logical

Lambda — Regularization coefficients
nonnegative vector

Regularization coefficients, specified as the comma-separated pair consisting of 'Lambda' and a vector of nonnegative values. See “Lasso” on page 32-2995.
• If you do not supply Lambda, then lasso calculates the largest value of Lambda that gives a nonnull model.
In this case, LambdaRatio gives the ratio of the smallest to the largest value of the sequence, and NumLambda gives the length of the vector. • If you supply Lambda, then lasso ignores LambdaRatio and NumLambda. • If Standardize is true, then Lambda is the set of values used to fit the models with the X data standardized to have zero mean and a variance of one. 32-2991


The default is a geometric sequence of NumLambda values, with only the largest value able to produce B = 0.
Example: 'Lambda',linspace(0,1)
Data Types: single | double

LambdaRatio — Ratio of smallest to largest Lambda values
1e-4 (default) | positive scalar
Ratio of the smallest to the largest Lambda values when you do not supply Lambda, specified as the comma-separated pair consisting of 'LambdaRatio' and a positive scalar. If you set LambdaRatio = 0, then lasso generates a default sequence of Lambda values and replaces the smallest one with 0.
Example: 'LambdaRatio',1e-2
Data Types: single | double

MaxIter — Maximum number of iterations allowed
positive integer scalar
Maximum number of iterations allowed, specified as the comma-separated pair consisting of 'MaxIter' and a positive integer scalar. If the algorithm executes MaxIter iterations before reaching the convergence tolerance RelTol, then the function stops iterating and returns a warning message. The function can return more than one warning when NumLambda is greater than 1. Default values are 1e5 for standard data and 1e4 for tall arrays.
Example: 'MaxIter',1e3
Data Types: single | double

MCReps — Number of Monte Carlo repetitions for cross-validation
1 (default) | positive integer scalar
Number of Monte Carlo repetitions for cross-validation, specified as the comma-separated pair consisting of 'MCReps' and a positive integer scalar.
• If CV is 'resubstitution' or a cvpartition of type 'resubstitution', then MCReps must be 1.
• If CV is a cvpartition of type 'holdout', then MCReps must be greater than 1.
Example: 'MCReps',5
Data Types: single | double

NumLambda — Number of Lambda values
100 (default) | positive integer scalar
Number of Lambda values lasso uses when you do not supply Lambda, specified as the comma-separated pair consisting of 'NumLambda' and a positive integer scalar.
lasso can return fewer than NumLambda fits if the residual error of the fits drops below a threshold fraction of the variance of y.
Example: 'NumLambda',50


Data Types: single | double

Options — Option to cross-validate in parallel and specify random streams
structure
Option to cross-validate in parallel and specify the random streams, specified as the comma-separated pair consisting of 'Options' and a structure. This option requires Parallel Computing Toolbox.
Create the Options structure with statset. The option fields are:
• UseParallel — Set to true to compute in parallel. The default is false.
• UseSubstreams — Set to true to compute in parallel in a reproducible fashion. For reproducibility, set Streams to a type allowing substreams: 'mlfg6331_64' or 'mrg32k3a'. The default is false.
• Streams — A RandStream object or cell array consisting of one such object. If you do not specify Streams, then lasso uses the default stream.
Example: 'Options',statset('UseParallel',true)
Data Types: struct

PredictorNames — Names of predictor variables
{} (default) | string array | cell array of character vectors
Names of the predictor variables, in the order in which they appear in X, specified as the comma-separated pair consisting of 'PredictorNames' and a string array or cell array of character vectors.
Example: 'PredictorNames',{'x1','x2','x3','x4'}
Data Types: string | cell

RelTol — Convergence threshold for coordinate descent algorithm
1e-4 (default) | positive scalar
Convergence threshold for the coordinate descent algorithm [3], specified as the comma-separated pair consisting of 'RelTol' and a positive scalar. The algorithm terminates when successive estimates of the coefficient vector differ in the L2 norm by a relative amount less than RelTol.
Example: 'RelTol',5e-3
Data Types: single | double

Rho — Augmented Lagrangian parameter
positive scalar
Augmented Lagrangian parameter ρ for the “ADMM Algorithm” on page 32-2996, specified as the comma-separated pair consisting of 'Rho' and a positive scalar. The default is automatic selection.
Note This option applies only when you use lasso on tall arrays.
See “Extended Capabilities” on page 32-0 for more information.
Example: 'Rho',2


Data Types: single | double

Standardize — Flag for standardizing predictor data before fitting models
true (default) | false
Flag for standardizing the predictor data X before fitting the models, specified as the comma-separated pair consisting of 'Standardize' and either true or false. If Standardize is true, then the X data is scaled to have zero mean and a variance of one. Standardize affects whether the regularization is applied to the coefficients on the standardized scale or the original scale. The results are always presented on the original data scale.
If Intercept is false, then the software sets Standardize to false, regardless of the Standardize value you specify. X and y are always centered when Intercept is true.
Example: 'Standardize',false
Data Types: logical

U0 — Initial value of scaled dual variable
vector of zeros (default) | numeric vector
Initial value of the scaled dual variable u in the “ADMM Algorithm” on page 32-2996, specified as the comma-separated pair consisting of 'U0' and a numeric vector.
Note This option applies only when you use lasso on tall arrays. See “Extended Capabilities” on page 32-0 for more information.
Data Types: single | double

Weights — Observation weights
1/n*ones(n,1) (default) | nonnegative vector
Observation weights, specified as the comma-separated pair consisting of 'Weights' and a nonnegative vector. Weights has length n, where n is the number of rows of X. The lasso function scales Weights to sum to 1.
Data Types: single | double
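The Lambda, LambdaRatio, and NumLambda arguments described above interact: the last two shape the default sequence, and a user-supplied Lambda overrides them. A minimal sketch on synthetic data (the data and values here are illustrative assumptions, not toolbox defaults):

```matlab
% Sketch: default Lambda sequence versus a user-supplied grid.
rng(0)
X = randn(50,6);
y = X(:,1) - 2*X(:,3) + 0.05*randn(50,1);

% Default path: a geometric sequence of NumLambda values spanning
% LambdaRatio*lambdaMax up to lambdaMax (the smallest nonnull-model value).
B1 = lasso(X,y,'NumLambda',25,'LambdaRatio',1e-3);

% Explicit grid: LambdaRatio and NumLambda are ignored when Lambda is given.
B2 = lasso(X,y,'Lambda',logspace(-3,0,10));

size(B1)   % one column per Lambda value actually used (possibly fewer than 25)
size(B2)   % 6-by-10, one column per supplied Lambda value
```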

Output Arguments

B — Fitted coefficients
numeric matrix
Fitted coefficients, returned as a numeric matrix. B is a p-by-L matrix, where p is the number of predictors (columns) in X, and L is the number of Lambda values. You can specify the number of Lambda values using the NumLambda name-value pair argument. The coefficient corresponding to the intercept term is a field in FitInfo.
Data Types: single | double

FitInfo — Fit information of models
structure


Fit information of the linear models, returned as a structure with these fields.

Intercept — Intercept term β0 for each linear model, a 1-by-L vector
Lambda — Lambda parameters in ascending order, a 1-by-L vector
Alpha — Value of the Alpha parameter, a scalar
DF — Number of nonzero coefficients in B for each value of Lambda, a 1-by-L vector
MSE — Mean squared error (MSE), a 1-by-L vector
PredictorNames — Value of the PredictorNames parameter, stored as a cell array of character vectors

If you set the CV name-value pair argument to cross-validate, the FitInfo structure contains these additional fields.

SE — Standard error of MSE for each Lambda, as calculated during cross-validation, a 1-by-L vector
LambdaMinMSE — Lambda value with the minimum MSE, a scalar
Lambda1SE — Largest Lambda value such that MSE is within one standard error of the minimum MSE, a scalar
IndexMinMSE — Index of Lambda with the value LambdaMinMSE, a scalar
Index1SE — Index of Lambda with the value Lambda1SE, a scalar
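The cross-validation fields are typically used to select a final model. A minimal sketch on synthetic data (the 1-SE rule shown here is a common convention, not the only choice):

```matlab
% Sketch: choose coefficients at the "one standard error" Lambda.
rng(0)
X = randn(80,10);
y = X(:,2) + 0.5*X(:,5) + 0.1*randn(80,1);

[B,FitInfo] = lasso(X,y,'CV',5);    % 5-fold cross-validation
idx  = FitInfo.Index1SE;            % sparsest model within 1 SE of minimum MSE
coef = B(:,idx);                    % slope coefficients at that Lambda
b0   = FitInfo.Intercept(idx);      % matching intercept
yhat = X*coef + b0;                 % predictions on the original data scale
```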

More About

Lasso
For a given value of λ, a nonnegative parameter, lasso solves the problem

$$\min_{\beta_0,\,\beta}\left(\frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i-\beta_0-x_i^{T}\beta\bigr)^{2}+\lambda\sum_{j=1}^{p}\lvert\beta_j\rvert\right).$$

• N is the number of observations.
• y_i is the response at observation i.
• x_i is data, a vector of length p at observation i.
• λ is a nonnegative regularization parameter corresponding to one value of Lambda.
• The parameters β_0 and β are a scalar and a vector of length p, respectively.

As λ increases, the number of nonzero components of β decreases. The lasso problem involves the L1 norm of β, as contrasted with the elastic net algorithm.

Elastic Net
For α strictly between 0 and 1, and nonnegative λ, elastic net solves the problem

$$\min_{\beta_0,\,\beta}\left(\frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i-\beta_0-x_i^{T}\beta\bigr)^{2}+\lambda P_{\alpha}(\beta)\right),$$

where

$$P_{\alpha}(\beta)=\frac{1-\alpha}{2}\lVert\beta\rVert_{2}^{2}+\alpha\lVert\beta\rVert_{1}=\sum_{j=1}^{p}\left(\frac{1-\alpha}{2}\beta_j^{2}+\alpha\lvert\beta_j\rvert\right).$$
Elastic net is the same as lasso when α = 1. For other values of α, the penalty term Pα(β) interpolates between the L1 norm of β and the squared L2 norm of β. As α shrinks toward 0, elastic net approaches ridge regression.
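The interpolation between penalties can be seen by refitting the same data at different Alpha values. A sketch on synthetic data with a near-duplicate predictor (the data and Alpha values are illustrative assumptions):

```matlab
% Sketch: pure lasso penalty (Alpha = 1) versus an elastic net penalty.
rng(0)
X = randn(60,2);
X = [X, X(:,1) + 0.01*randn(60,1)];   % column 3 nearly duplicates column 1
y = X(:,1) + X(:,2) + 0.1*randn(60,1);

Blasso = lasso(X,y,'Alpha',1);     % L1 only: tends to keep one of the duplicates
Benet  = lasso(X,y,'Alpha',0.5);   % mixed penalty: tends to share their weight
```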

Algorithms

ADMM Algorithm
When operating on tall arrays, lasso uses an algorithm based on the Alternating Direction Method of Multipliers (ADMM) [5]. The notation used here is the same as in the reference paper. This method solves problems of the form

$$\text{Minimize } l(x)+g(z) \quad \text{subject to } Ax+Bz=c.$$

Using this notation, the lasso regression problem is

$$\text{Minimize } l(x)+g(z)=\tfrac{1}{2}\lVert Ax-b\rVert_{2}^{2}+\lambda\lVert z\rVert_{1} \quad \text{subject to } x-z=0.$$

Because the loss function $l(x)=\tfrac{1}{2}\lVert Ax-b\rVert_{2}^{2}$ is quadratic, the iterative updates performed by the algorithm amount to solving a linear system of equations with a single coefficient matrix but several right-hand sides. The updates performed by the algorithm during each iteration are

$$x^{k+1}=\bigl(A^{T}A+\rho I\bigr)^{-1}\bigl(A^{T}b+\rho\,(z^{k}-u^{k})\bigr),$$
$$z^{k+1}=S_{\lambda/\rho}\bigl(x^{k+1}+u^{k}\bigr),$$
$$u^{k+1}=u^{k}+x^{k+1}-z^{k+1}.$$

A is the dataset (a tall array), x contains the coefficients, ρ is the penalty parameter (augmented Lagrangian parameter), b is the response (a tall array), and S is the soft thresholding operator

$$S_{\kappa}(a)=\begin{cases}a-\kappa, & a>\kappa,\\ 0, & \lvert a\rvert\le\kappa,\\ a+\kappa, & a<-\kappa.\end{cases}$$

lasso solves the linear system using Cholesky factorization because the coefficient matrix $A^{T}A+\rho I$ is symmetric and positive definite. Because ρ does not change between iterations, the Cholesky factorization is cached between iterations.


Even though A and b are tall arrays, they appear only in the terms A^T A and A^T b. The results of these two matrix multiplications are small enough to fit in memory, so they are precomputed, and the iterative updates are performed entirely in memory.
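The updates above can be sketched as a small in-memory ADMM loop. This is a minimal illustration of the iteration scheme, not the toolbox's internal implementation; the data, the fixed lambda and rho values, and the iteration count are assumptions:

```matlab
% Minimal ADMM sketch for lasso: min 0.5*||A*x - b||^2 + lambda*||x||_1
rng(1)
A = randn(100,5);
b = A*[1;0;2;0;0] + 0.1*randn(100,1);
lambda = 0.5;  rho = 1;

AtA = A'*A;  Atb = A'*b;                 % small p-by-p and p-by-1 precomputations
R = chol(AtA + rho*eye(size(A,2)));      % Cholesky factor, cached across iterations
softThresh = @(a,k) sign(a).*max(abs(a)-k,0);   % soft thresholding S_kappa

x = zeros(5,1);  z = x;  u = x;
for k = 1:200
    x = R \ (R' \ (Atb + rho*(z - u)));  % x-update: linear solve via cached factor
    z = softThresh(x + u, lambda/rho);   % z-update: shrinkage
    u = u + x - z;                       % scaled dual update
end
disp(z.')                                % sparse coefficient estimate
```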

References

[1] Tibshirani, R. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society. Series B, Vol. 58, No. 1, 1996, pp. 267–288.

[2] Zou, H., and T. Hastie. “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society. Series B, Vol. 67, No. 2, 2005, pp. 301–320.

[3] Friedman, J., R. Tibshirani, and T. Hastie. “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software. Vol. 33, No. 1, 2010. https://www.jstatsoft.org/v33/i01

[4] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. 2nd edition. New York: Springer, 2008.

[5] Boyd, S. “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers.” Foundations and Trends in Machine Learning. Vol. 3, No. 1, 2010, pp. 1–122.

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.
This function supports tall arrays for out-of-memory data with some limitations.
• With tall arrays, lasso uses an algorithm based on ADMM (Alternating Direction Method of Multipliers).
• No elastic net support. The 'Alpha' parameter is always 1.
• No cross-validation ('CV' parameter) support, which includes the related parameter 'MCReps'.
• The output FitInfo does not contain the additional fields 'SE', 'LambdaMinMSE', 'Lambda1SE', 'IndexMinMSE', and 'Index1SE'.
• The 'Options' parameter is not supported because it does not contain options that apply to the ADMM algorithm. You can tune the ADMM algorithm using name-value pair arguments.
• Supported name-value pair arguments are:
  • 'Lambda'
  • 'LambdaRatio'
  • 'NumLambda'
  • 'Standardize'
  • 'PredictorNames'
  • 'RelTol'
  • 'Weights'
• Additional name-value pair arguments to control the ADMM algorithm are:


  • 'Rho' — Augmented Lagrangian parameter, ρ. The default value is automatic selection.
  • 'AbsTol' — Absolute tolerance used to determine convergence. The default value is 1e-4.
  • 'MaxIter' — Maximum number of iterations. The default value is 1e4.
  • 'B0' — Initial values for the coefficients x. The default value is a vector of zeros.
  • 'U0' — Initial values of the scaled dual variable u. The default value is a vector of zeros.
For more information, see “Tall Arrays” (MATLAB).

Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.
To run in parallel, set the 'UseParallel' option to true: set the 'UseParallel' field of the options structure to true using statset, and specify the 'Options' name-value pair argument in the call to this function. For example: 'Options',statset('UseParallel',true). For more information, see the 'Options' name-value pair argument.
For more general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).
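For in-memory data, the parallel cross-validation option can be sketched as follows (requires Parallel Computing Toolbox; the data are synthetic and the stream choice is one of the two documented substream-capable types):

```matlab
% Sketch: reproducible parallel cross-validation for lasso.
rng(0)
X = randn(100,8);
y = X(:,1) - X(:,4) + 0.1*randn(100,1);

opts = statset('UseParallel',true, ...
               'UseSubstreams',true, ...
               'Streams',RandStream('mlfg6331_64'));   % substream-capable generator
[B,FitInfo] = lasso(X,y,'CV',10,'Options',opts);
```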

See Also
fitlm | fitrlinear | lassoPlot | lassoglm | ridge

Topics
“Lasso Regularization” on page 11-117
“Lasso and Elastic Net with Cross Validation” on page 11-120
“Wide Data via Lasso and Parallel Computing” on page 11-112
“Lasso and Elastic Net” on page 11-109
“Introduction to Feature Selection” on page 15-49

Introduced in R2011b



lassoglm
Lasso or elastic net regularization for generalized linear models

Syntax
B = lassoglm(X,y)
B = lassoglm(X,y,distr)
B = lassoglm(X,y,distr,Name,Value)
[B,FitInfo] = lassoglm( ___ )

Description

B = lassoglm(X,y) returns penalized, maximum-likelihood fitted coefficients for generalized linear models of the predictor data X and the response y, where the values in y are assumed to have a normal probability distribution. Each column of B corresponds to a particular regularization coefficient in Lambda. By default, lassoglm performs lasso regularization using a geometric sequence of Lambda values.

B = lassoglm(X,y,distr) performs lasso regularization to fit the models using the probability distribution distr for y.

B = lassoglm(X,y,distr,Name,Value) fits regularized generalized linear regressions with additional options specified by one or more name-value pair arguments. For example, 'Alpha',0.5 sets elastic net as the regularization method, with the parameter Alpha equal to 0.5.

[B,FitInfo] = lassoglm( ___ ) also returns the structure FitInfo, which contains information about the fit of the models, using any of the input arguments in the previous syntaxes.

Examples

Remove Redundant Predictors Using Lasso Regularization
Construct a data set with redundant predictors and identify those predictors by using lassoglm.
Create a random matrix X with 100 observations and 10 predictors. Create the normally distributed response y using only four of the predictors and a small amount of noise.

rng default
X = randn(100,10);
weights = [0.6;0.5;0.7;0.4];
y = X(:,[2 4 5 7])*weights + randn(100,1)*0.1; % Small added noise

Perform lasso regularization.

B = lassoglm(X,y);

Find the coefficient vector for the 75th Lambda value in B.

B(:,75)



ans = 10×1

         0
    0.5431
         0
    0.3944
    0.6173
         0
    0.3473
         0
         0
         0

lassoglm identifies and removes the redundant predictors.

Cross-Validated Lasso Regularization of Generalized Linear Model
Construct data from a Poisson model, and identify the important predictors by using lassoglm.
Create data with 20 predictors. Create a Poisson response variable using only three of the predictors plus a constant.

rng default % For reproducibility
X = randn(100,20);
weights = [.4;.2;.3];
mu = exp(X(:,[5 10 15])*weights + 1);
y = poissrnd(mu);

Construct a cross-validated lasso regularization of a Poisson regression model of the data.

[B,FitInfo] = lassoglm(X,y,'poisson','CV',10);

Examine the cross-validation plot to see the effect of the Lambda regularization parameter.

lassoPlot(B,FitInfo,'plottype','CV');
legend('show') % Show legend



The green circle and dotted line locate the Lambda with minimum cross-validation error. The blue circle and dotted line locate the point with minimum cross-validation error plus one standard deviation.

Find the nonzero model coefficients corresponding to the two identified points.

idxLambdaMinDeviance = FitInfo.IndexMinDeviance;
mincoefs = find(B(:,idxLambdaMinDeviance))

mincoefs = 7×1

     3
     5
     6
    10
    11
    15
    16

idxLambda1SE = FitInfo.Index1SE;
min1coefs = find(B(:,idxLambda1SE))

min1coefs = 3×1

     5
    10
    15

The coefficients from the minimum-plus-one standard error point are exactly those coefficients used to create the data.

Predict Values Using Lasso Regularization
Predict whether students got a B or above on their last exam by using lassoglm.
Load the examgrades data set. Convert the last exam grades to a logical vector, where 1 represents a grade of 80 or above and 0 represents a grade below 80.

load examgrades
X = grades(:,1:4);
y = grades(:,5);
yBinom = (y>=80);

Partition the data into training and test sets.

rng default % Set the seed for reproducibility
c = cvpartition(yBinom,'HoldOut',0.3);
idxTrain = training(c,1);
idxTest = ~idxTrain;
XTrain = X(idxTrain,:);
yTrain = yBinom(idxTrain);
XTest = X(idxTest,:);
yTest = yBinom(idxTest);

Perform lasso regularization for generalized linear model regression with 3-fold cross-validation on the training data. Assume the values in y are binomially distributed. Choose model coefficients corresponding to the Lambda with minimum expected deviance.

[B,FitInfo] = lassoglm(XTrain,yTrain,'binomial','CV',3);
idxLambdaMinDeviance = FitInfo.IndexMinDeviance;
B0 = FitInfo.Intercept(idxLambdaMinDeviance);
coef = [B0; B(:,idxLambdaMinDeviance)]

coef = 5×1

  -21.1911
    0.0235
    0.0670
    0.0693
    0.0949

Predict exam grades for the test data using the model coefficients found in the previous step. Specify the link function for a binomial response using 'logit'. Convert the prediction values to a logical vector.

yhat = glmval(coef,XTest,'logit');
yhatBinom = (yhat>=0.5);

Determine the accuracy of the predictions using a confusion matrix.


c = confusionchart(yTest,yhatBinom);

The function correctly predicts 31 exam grades. However, it incorrectly predicts a B or above for 1 student and a grade below a B for 4 students.

Input Arguments

X — Predictor data
numeric matrix
Predictor data, specified as a numeric matrix. Each row represents one observation, and each column represents one predictor variable.
Data Types: single | double

y — Response data
numeric vector | logical vector | categorical array | numeric matrix
Response data, specified as a numeric vector, logical vector, categorical array, or two-column numeric matrix.
• When distr is not 'binomial', y is a numeric vector or categorical array of length n, where n is the number of rows in X. The response y(i) corresponds to row i in X.
• When distr is 'binomial', y is one of the following:


  • Numeric vector of length n, where each entry represents success (1) or failure (0)
  • Logical vector of length n, where each entry represents success or failure
  • Categorical array of length n, where each entry represents success or failure
  • Two-column numeric matrix, where the first column contains the number of successes for each observation and the second column contains the total number of trials
Data Types: single | double | logical | categorical

distr — Distribution of response data
'normal' | 'binomial' | 'poisson' | 'gamma' | 'inverse gaussian'
Distribution of response data, specified as one of the following:
• 'normal'
• 'binomial'
• 'poisson'
• 'gamma'
• 'inverse gaussian'
lassoglm uses the default link function on page 32-3009 corresponding to distr. Specify another link function using the Link name-value pair argument.

Name-Value Pair Arguments
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: lassoglm(X,y,'poisson','Alpha',0.5) performs elastic net regularization assuming that the response values are Poisson distributed. The 'Alpha',0.5 name-value pair argument sets the parameter used in the elastic net optimization.

Alpha — Weight of lasso versus ridge optimization
1 (default) | positive scalar
Weight of lasso (L1) versus ridge (L2) optimization, specified as the comma-separated pair consisting of 'Alpha' and a positive scalar value in the interval (0,1]. The value Alpha = 1 represents lasso regression, Alpha close to 0 approaches ridge regression, and other values represent elastic net optimization. See “Elastic Net” on page 32-3010.
Example: 'Alpha',0.75
Data Types: single | double

CV — Cross-validation specification for estimating deviance
'resubstitution' (default) | positive integer scalar | cvpartition object
Cross-validation specification for estimating the deviance, specified as the comma-separated pair consisting of 'CV' and one of the following:
• 'resubstitution' — lassoglm uses X and y to fit the model and to estimate the deviance without cross-validation.
• Positive scalar integer K — lassoglm uses K-fold cross-validation.


• cvpartition object cvp — lassoglm uses the cross-validation method expressed in cvp. You cannot use a 'leaveout' partition with lassoglm.
Example: 'CV',10

DFmax — Maximum number of nonzero coefficients
Inf (default) | positive integer scalar
Maximum number of nonzero coefficients in the model, specified as the comma-separated pair consisting of 'DFmax' and a positive integer scalar. lassoglm returns results only for Lambda values that satisfy this criterion.
Example: 'DFmax',25
Data Types: single | double

Lambda — Regularization coefficients
nonnegative vector
Regularization coefficients, specified as the comma-separated pair consisting of 'Lambda' and a vector of nonnegative values. See “Lasso” on page 32-3010.
• If you do not supply Lambda, then lassoglm estimates the largest value of Lambda that gives a nonnull model. In this case, LambdaRatio gives the ratio of the smallest to the largest value of the sequence, and NumLambda gives the length of the vector.
• If you supply Lambda, then lassoglm ignores LambdaRatio and NumLambda.
• If Standardize is true, then Lambda is the set of values used to fit the models with the X data standardized to have zero mean and a variance of one.
The default is a geometric sequence of NumLambda values, with only the largest value able to produce B = 0.
Data Types: single | double

LambdaRatio — Ratio of smallest to largest Lambda values
1e-4 (default) | positive scalar
Ratio of the smallest to the largest Lambda values when you do not supply Lambda, specified as the comma-separated pair consisting of 'LambdaRatio' and a positive scalar. If you set LambdaRatio = 0, then lassoglm generates a default sequence of Lambda values and replaces the smallest one with 0.
Example: 'LambdaRatio',1e-2
Data Types: single | double

Link — Mapping between mean of response and linear predictor
'comploglog' | 'identity' | 'log' | 'logit' | 'loglog' | ...
Mapping between the mean µ of the response and the linear predictor Xb, specified as the comma-separated pair consisting of 'Link' and one of the following values.
• 'comploglog' — log(–log((1 – µ))) = Xb
• 'identity' (default for the distribution 'normal') — µ = Xb
• 'log' (default for the distribution 'poisson') — log(µ) = Xb
• 'logit' (default for the distribution 'binomial') — log(µ/(1 – µ)) = Xb
• 'loglog' — log(–log(µ)) = Xb
• 'probit' — Φ⁻¹(µ) = Xb, where Φ is the normal (Gaussian) cumulative distribution function
• 'reciprocal' (default for the distribution 'gamma') — µ⁻¹ = Xb
• p (a number; default for the distribution 'inverse gaussian', with p = –2) — µᵖ = Xb
• A user-specified link function (see “Custom Link Function” on page 12-12): a cell array of the form {FL FD FI}, containing three function handles created using @, which define the link (FL), the derivative of the link (FD), and the inverse link (FI); or a structure of function handles with the field Link containing FL, the field Derivative containing FD, and the field Inverse containing FI.
Example: 'Link','probit'
Data Types: char | string | single | double | cell

MaxIter — Maximum number of iterations allowed
1e4 (default) | positive integer scalar
Maximum number of iterations allowed, specified as the comma-separated pair consisting of 'MaxIter' and a positive integer scalar. If the algorithm executes MaxIter iterations before reaching the convergence tolerance RelTol, then the function stops iterating and returns a warning message. The function can return more than one warning when NumLambda is greater than 1.
Example: 'MaxIter',1e3
Data Types: single | double

MCReps — Number of Monte Carlo repetitions for cross-validation
1 (default) | positive integer scalar
Number of Monte Carlo repetitions for cross-validation, specified as the comma-separated pair consisting of 'MCReps' and a positive integer scalar.
• If CV is 'resubstitution' or a cvpartition of type 'resubstitution', then MCReps must be 1.
• If CV is a cvpartition of type 'holdout', then MCReps must be greater than 1.


Example: 'MCReps',2
Data Types: single | double

NumLambda — Number of Lambda values
100 (default) | positive integer scalar
Number of Lambda values lassoglm uses when you do not supply Lambda, specified as the comma-separated pair consisting of 'NumLambda' and a positive integer scalar. lassoglm can return fewer than NumLambda fits if the deviance of the fits drops below a threshold fraction of the null deviance (deviance of the fit without any predictors X).
Example: 'NumLambda',150
Data Types: single | double

Offset — Additional predictor variable
numeric vector
Additional predictor variable, specified as the comma-separated pair consisting of 'Offset' and a numeric vector with the same number of rows as X. The lassoglm function keeps the coefficient value of Offset fixed at 1.0.
Data Types: single | double

Options — Option to cross-validate in parallel and specify random streams
structure
Option to cross-validate in parallel and specify the random streams, specified as the comma-separated pair consisting of 'Options' and a structure. This option requires Parallel Computing Toolbox.
Create the Options structure with statset. The option fields are:
• UseParallel — Set to true to compute in parallel. The default is false.
• UseSubstreams — Set to true to compute in parallel in a reproducible fashion. For reproducibility, set Streams to a type allowing substreams: 'mlfg6331_64' or 'mrg32k3a'. The default is false.
• Streams — A RandStream object or cell array consisting of one such object. If you do not specify Streams, then lassoglm uses the default stream.
Example: 'Options',statset('UseParallel',true)
Data Types: struct

PredictorNames — Names of predictor variables
{} (default) | string array | cell array of character vectors
Names of the predictor variables, in the order in which they appear in X, specified as the comma-separated pair consisting of 'PredictorNames' and a string array or cell array of character vectors.
Example: 'PredictorNames',{'Height','Weight','Age'}
Data Types: string | cell

RelTol — Convergence threshold for coordinate descent algorithm
1e-4 (default) | positive scalar


Convergence threshold for the coordinate descent algorithm [3], specified as the comma-separated pair consisting of 'RelTol' and a positive scalar. The algorithm terminates when successive estimates of the coefficient vector differ in the L2 norm by a relative amount less than RelTol.
Example: 'RelTol',2e-3
Data Types: single | double

Standardize — Flag for standardizing predictor data before fitting models
true (default) | false
Flag for standardizing the predictor data X before fitting the models, specified as the comma-separated pair consisting of 'Standardize' and either true or false. If Standardize is true, then the X data is scaled to have zero mean and a variance of one. Standardize affects whether the regularization is applied to the coefficients on the standardized scale or the original scale. The results are always presented on the original data scale.
Example: 'Standardize',false
Data Types: logical

Weights — Observation weights
1/n*ones(n,1) (default) | nonnegative vector
Observation weights, specified as the comma-separated pair consisting of 'Weights' and a nonnegative vector. Weights has length n, where n is the number of rows of X. At least two values must be positive.
Data Types: single | double
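The cell-array form of the 'Link' argument described earlier can be sketched by hand-building the logit link (purely illustrative; with a binomial response this is equivalent to the built-in 'logit' link):

```matlab
% Sketch: custom link supplied as {FL FD FI} function handles.
FL = @(mu)  log(mu./(1-mu));    % link function
FD = @(mu)  1./(mu.*(1-mu));    % derivative of the link
FI = @(eta) 1./(1+exp(-eta));   % inverse link

rng(0)
X = randn(100,3);
y = binornd(1, FI(X(:,1)));     % Bernoulli response generated through the link

B = lassoglm(X,y,'binomial','Link',{FL FD FI});
```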

Output Arguments

B — Fitted coefficients
numeric matrix
Fitted coefficients, returned as a numeric matrix. B is a p-by-L matrix, where p is the number of predictors (columns) in X, and L is the number of Lambda values. You can specify the number of Lambda values using the NumLambda name-value pair argument. The coefficient corresponding to the intercept term is a field in FitInfo.
Data Types: single | double

FitInfo — Fit information of models
structure
Fit information of the generalized linear models, returned as a structure with these fields.


Intercept — Intercept term β0 for each linear model, a 1-by-L vector
Lambda — Lambda parameters in ascending order, a 1-by-L vector
Alpha — Value of the Alpha parameter, a scalar
DF — Number of nonzero coefficients in B for each value of Lambda, a 1-by-L vector
Deviance — Deviance of the fitted model for each value of Lambda, a 1-by-L vector. If the model is cross-validated, then the values for Deviance represent the estimated expected deviance of the model applied to new data, as calculated by cross-validation. Otherwise, Deviance is the deviance of the fitted model applied to the data used to perform the fit.
PredictorNames — Value of the PredictorNames parameter, stored as a cell array of character vectors

If you set the CV name-value pair argument to cross-validate, the FitInfo structure contains these additional fields.

SE — Standard error of Deviance for each Lambda, as calculated during cross-validation, a 1-by-L vector
LambdaMinDeviance — Lambda value with minimum expected deviance, as calculated by cross-validation, a scalar
Lambda1SE — Largest Lambda value such that Deviance is within one standard error of the minimum, a scalar
IndexMinDeviance — Index of Lambda with the value LambdaMinDeviance, a scalar
Index1SE — Index of Lambda with the value Lambda1SE, a scalar

More About

Link Function
A link function f(μ) maps a distribution with mean μ to a linear model with data X and coefficient vector b using the formula

f(μ) = Xb.

You can find the formulas for the link functions in the Link name-value pair argument description. This table lists the link functions that are typically used for each distribution.

'normal' — default link 'identity'
'binomial' — default link 'logit'; other typical link functions: 'comploglog', 'loglog', 'probit'
'poisson' — default link 'log'
'gamma' — default link 'reciprocal'
'inverse gaussian' — default link –2 (the power link µᵖ = Xb with p = –2)


Lasso
For a nonnegative value of λ, lassoglm solves the problem

$$\min_{\beta_0,\,\beta}\left(\frac{1}{N}\,\mathrm{Deviance}(\beta_0,\beta)+\lambda\sum_{j=1}^{p}\lvert\beta_j\rvert\right).$$

• The function Deviance in this equation is the deviance of the model fit to the responses using the intercept β_0 and the predictor coefficients β. The formula for Deviance depends on the distr parameter you supply to lassoglm. Minimizing the λ-penalized deviance is equivalent to maximizing the λ-penalized loglikelihood.
• N is the number of observations.
• λ is a nonnegative regularization parameter corresponding to one value of Lambda.
• The parameters β_0 and β are a scalar and a vector of length p, respectively.

As λ increases, the number of nonzero components of β decreases. The lasso problem involves the L1 norm of β, as contrasted with the elastic net algorithm.

Elastic Net
For α strictly between 0 and 1, and nonnegative λ, elastic net solves the problem

$$\min_{\beta_0,\,\beta}\left(\frac{1}{N}\,\mathrm{Deviance}(\beta_0,\beta)+\lambda P_{\alpha}(\beta)\right),$$

where

$$P_{\alpha}(\beta)=\frac{1-\alpha}{2}\lVert\beta\rVert_{2}^{2}+\alpha\lVert\beta\rVert_{1}=\sum_{j=1}^{p}\left(\frac{1-\alpha}{2}\beta_j^{2}+\alpha\lvert\beta_j\rvert\right).$$

Elastic net is the same as lasso when α = 1. For other values of α, the penalty term Pα(β) interpolates between the L1 norm of β and the squared L2 norm of β. As α shrinks toward 0, elastic net approaches ridge regression.
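The limiting cases in the paragraph above can be verified directly from the formula for Pα(β). A pure-Python sketch (the function name is illustrative, not a Toolbox API):

```python
# Elastic net penalty P_alpha(beta) from the formula above: at alpha = 1 it is
# the lasso (L1) penalty, and at alpha = 0 it is the ridge penalty (1/2)||beta||_2^2.
def elastic_net_penalty(beta, alpha):
    return sum((1 - alpha) / 2 * b ** 2 + alpha * abs(b) for b in beta)

beta = [0.5, -1.5, 0.0, 2.0]
l1 = sum(abs(b) for b in beta)           # ||beta||_1 = 4.0
ridge = 0.5 * sum(b ** 2 for b in beta)  # (1/2)||beta||_2^2 = 3.25

assert elastic_net_penalty(beta, 1.0) == l1
assert elastic_net_penalty(beta, 0.0) == ridge
# Strictly between 0 and 1, the penalty interpolates between the two:
mid = elastic_net_penalty(beta, 0.5)
assert min(l1, ridge) < mid < max(l1, ridge)
```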

References

[1] Tibshirani, R. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society. Series B, Vol. 58, No. 1, 1996, pp. 267–288.

[2] Zou, H., and T. Hastie. “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society. Series B, Vol. 67, No. 2, 2005, pp. 301–320.

[3] Friedman, J., R. Tibshirani, and T. Hastie. “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software. Vol. 33, No. 1, 2010. https://www.jstatsoft.org/v33/i01

[4] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. 2nd edition. New York: Springer, 2008.

[5] Dobson, A. J. An Introduction to Generalized Linear Models. 2nd edition. New York: Chapman & Hall/CRC Press, 2002.

[6] McCullagh, P., and J. A. Nelder. Generalized Linear Models. 2nd edition. New York: Chapman & Hall/CRC Press, 1989.

[7] Collett, D. Modelling Binary Data. 2nd edition. New York: Chapman & Hall/CRC Press, 2003.

Extended Capabilities

Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, set the 'UseParallel' option to true: set the 'UseParallel' field of the options structure to true using statset, and specify the 'Options' name-value pair argument in the call to this function. For example:

'Options',statset('UseParallel',true)

For more information, see the 'Options' name-value pair argument. For more general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).

See Also
fitglm | glmval | lasso | lassoPlot | ridge

Topics
“Lasso Regularization of Generalized Linear Models” on page 12-32
“Regularize Poisson Regression” on page 12-34
“Regularize Logistic Regression” on page 12-36
“Regularize Wide Data in Parallel” on page 12-43
“Introduction to Feature Selection” on page 15-49

Introduced in R2012a


lassoPlot
Trace plot of lasso fit

Syntax

lassoPlot(B)
lassoPlot(B,FitInfo)
lassoPlot(B,FitInfo,Name,Value)
[ax,figh] = lassoPlot( ___ )

Description

lassoPlot(B) creates a trace plot of the values in B against the L1 norm of B.

lassoPlot(B,FitInfo) creates a plot with type depending on the data type of FitInfo and the value, if any, of the PlotType name-value pair.

lassoPlot(B,FitInfo,Name,Value) creates a plot with additional options specified by one or more Name,Value pair arguments.

[ax,figh] = lassoPlot( ___ ), for any previous input syntax, returns a handle ax to the plot axis, and a handle figh to the figure window.

Input Arguments

B
Coefficients of a sequence of regression fits, as returned from the lasso or lassoglm functions. B is a p-by-NLambda matrix, where p is the number of predictors, and each column of B is a set of coefficients lasso calculates using one Lambda penalty value.

FitInfo
Information controlling the plot:
• FitInfo is a structure, especially as returned from lasso or lassoglm — lassoPlot creates a plot based on the PlotType name-value pair.
• FitInfo is a vector — lassoPlot forms the x-axis of the plot from the values in FitInfo. The length of FitInfo must equal the number of columns of B.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Parent
Axis in which to draw the plot.


Default: New plot

PlotType
Plot type when you specify a FitInfo vector or structure:

'L1' — lassoPlot creates the x-axis from the L1 norm of the coefficients in B. The x-axis at the top of the plot contains the degrees of freedom (df), meaning the number of nonzero coefficients of B.

'Lambda' — lassoPlot creates the x-axis from the Lambda field of FitInfo. The x-axis at the top of the plot contains the degrees of freedom (df), meaning the number of nonzero coefficients of B. When you choose this value, FitInfo must be a structure.

'CV' — For each Lambda, lassoPlot plots an estimate of the mean squared prediction error on new data for the model fitted by lasso with that value of Lambda. lassoPlot plots error bars for the estimates. When you choose this value, FitInfo must be a cross-validated structure.

If you include a cross-validated FitInfo structure, lassoPlot also indicates two specific Lambda values with green and blue dashed lines.

• A green, dashed line indicates the value of Lambda with a minimum cross-validated mean squared error (MSE).
• A blue, dashed line indicates the greatest Lambda that is within one standard error of the minimum MSE. This Lambda value makes the sparsest model with relatively low MSE.

To display the label for each plot in the legend of the figure, type legend('show') in the Command Window.

Default: 'L1'

PredictorNames
String array or cell array of character vectors to label each coefficient of B. If the length of PredictorNames is less than the number of rows of B, the remaining labels are padded with default values.

lassoPlot uses PredictorNames in FitInfo only if:

• You created FitInfo with a call to lasso that included a PredictorNames name-value pair.
• You call lassoPlot without a PredictorNames name-value pair.
• You include FitInfo in your lassoPlot call.

For an example, see “Lasso Plot with Default Plot Type” on page 32-3014.

Default: {'B1','B2',...}

XScale
• 'linear' for linear x-axis
• 'log' for logarithmic scaled x-axis

Default: 'linear', except 'log' for the 'CV' plot type

Output Arguments

ax
Handle to the axis of the plot (see “Axes Appearance” (MATLAB)).

figh
Handle to the figure window (see “Special Object Identifiers” (MATLAB)).

Examples

Lasso Plot with Default Plot Type

Load the sample data.

load acetylene

Prepare the design matrix for a lasso fit with interactions.

X = [x1 x2 x3];
D = x2fx(X,'interaction');
D(:,1) = []; % No constant term

The x2fx function returns the model terms in the order of a constant term, linear terms, and interaction terms: the constant term, x1, x2, x3, x1.*x2, x1.*x3, and x2.*x3.

Fit a regularized model of the data using lasso.

B = lasso(D,y);

Plot the lasso fits with labeled coefficients by using the PredictorNames name-value pair.

lassoPlot(B,'PredictorNames',{'x1','x2','x3','x1.*x2','x1.*x3','x2.*x3'});
legend('show','Location','NorthWest') % Show legend

Each line represents a trace of the values in B for a single predictor variable: x1, x2, x3, x1.*x2, x1.*x3, and x2.*x3.

Display a data tip for the trace plot. A data tip appears when you hover over a trace.


A data tip displays these lines of information: the name of the selected coefficient with a fitted value, the L1 norm of a set of coefficients including the selected coefficient, and the index of the corresponding Lambda.

Lasso Plot with Lambda Plot Type

Load the sample data.

load acetylene

Prepare the data for a lasso fit with interactions.

X = [x1 x2 x3];
D = x2fx(X,'interaction');
D(:,1) = []; % No constant term

Fit a regularized model of the data with lasso.

[B,FitInfo] = lasso(D,y);

Plot the fits with the Lambda plot type and logarithmic scaling.

lassoPlot(B,FitInfo,'PlotType','Lambda','XScale','log');


Lasso Plot with Cross-Validated Fits

Visually examine the cross-validated error of various levels of regularization.

Load the sample data.

load acetylene

Create a design matrix with interactions and no constant term.

X = [x1 x2 x3];
D = x2fx(X,'interaction');
D(:,1) = []; % No constant term

Construct the lasso fit using 10-fold cross-validation. Include the FitInfo output so you can plot the result.

rng default % For reproducibility
[B,FitInfo] = lasso(D,y,'CV',10);

Plot the cross-validated fits.

lassoPlot(B,FitInfo,'PlotType','CV');
legend('show') % Show legend


The green circle and dotted line locate the Lambda with minimum cross-validation error. The blue circle and dotted line locate the point with minimum cross-validation error plus one standard deviation.

See Also
lasso | lassoglm

Topics
“Lasso and Elastic Net” on page 11-109

Introduced in R2011b


le
Class: qrandstream

Less than or equal relation for handles

Syntax

h1 <= h2

selidx = nca.FeatureWeights > tol*max(1,max(nca.FeatureWeights));
% Fit a non-ARD GPR model using selected features.
gpr = fitrgp(X(:,selidx),y,'Standardize',1,...
    'KernelFunction','squaredexponential','Verbose',0);
lossvals(i,k) = loss(gpr,Xvalid(:,selidx),yvalid);
    end
end

Compute the average loss obtained from the folds for each λ value. Plot the mean loss versus the λ values.

meanloss = mean(lossvals,2);
figure;
plot(lambdavals,meanloss,'ro-');
xlabel('Lambda');
ylabel('Loss (MSE)');
grid on;


Find the λ value that produces the minimum loss value.

[~,idx] = min(meanloss);
bestlambda = lambdavals(idx)

bestlambda = 0.0251

Perform feature selection for regression using the best λ value. Standardize the predictor values.

nca2 = fsrnca(Xtrain,ytrain,'Standardize',1,'Lambda',bestlambda,...
    'LossFunction','mad');

Plot the feature weights.

figure()
plot(nca2.FeatureWeights,'ro')


Compute the loss using the new nca model on the test data, which was not used to select the features.

L2 = loss(nca2,Xtest,ytest,'LossFunction','mad')

L2 = 2.0560

Tuning the regularization parameter helps identify the relevant features and reduces the loss.

Plot the predicted versus the actual response values in the test set.

ypred = predict(nca2,Xtest);
figure;
plot(ypred,ytest,'bo');


The predicted response values appear to be closer to the actual values as well.

References

[1] Harrison, D., and D. L. Rubinfeld. “Hedonic prices and the demand for clean air.” J. Environ. Economics & Management. Vol. 5, 1978, pp. 81–102.

[2] Lichman, M. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science, 2013. https://archive.ics.uci.edu/ml.

See Also
FeatureSelectionNCARegression | fsrnca | predict | refit

Introduced in R2016b


loss
Class: RegressionLinear

Regression loss for linear regression models

Syntax

L = loss(Mdl,X,Y)
L = loss( ___ ,Name,Value)

Description

L = loss(Mdl,X,Y) returns the mean squared error (MSE) for the linear regression model Mdl using predictor data in X and corresponding responses in Y. L contains an MSE for each regularization strength in Mdl.

L = loss( ___ ,Name,Value) uses any of the previous syntaxes and additional options specified by one or more Name,Value pair arguments. For example, specify that columns in the predictor data correspond to observations, or specify the regression loss function.

Input Arguments

Mdl — Linear regression model
RegressionLinear model object

Linear regression model, specified as a RegressionLinear model object. You can create a RegressionLinear model object using fitrlinear.

X — Predictor data
full matrix | sparse matrix

Predictor data, specified as an n-by-p full or sparse matrix. This orientation of X indicates that rows correspond to individual observations, and columns correspond to individual predictor variables.

Note If you orient your predictor matrix so that observations correspond to columns and specify 'ObservationsIn','columns', then you might experience a significant reduction in computation time.

The length of Y and the number of observations in X must be equal.

Data Types: single | double

Y — Response data
numeric vector

Response data, specified as an n-dimensional numeric vector. The length of Y and the number of observations in X must be equal.


Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

LossFun — Loss function
'mse' (default) | 'epsiloninsensitive' | function handle

Loss function, specified as the comma-separated pair consisting of 'LossFun' and a built-in loss function name or function handle.

• The following table lists the available loss functions. Specify one using its corresponding value. Also, in the table, f(x) = xβ + b, where:
  • β is a vector of p coefficients.
  • x is an observation from p predictor variables.
  • b is the scalar bias.

  'epsiloninsensitive' — Epsilon-insensitive loss: ℓ(y, f(x)) = max(0, |y - f(x)| - ε). 'epsiloninsensitive' is appropriate for SVM learners only.

  'mse' — MSE: ℓ(y, f(x)) = (y - f(x))².

• Specify your own function using function handle notation. Let n be the number of observations in X. Your function must have this signature

  lossvalue = lossfun(Y,Yhat,W)

  where:
  • The output argument lossvalue is a scalar.
  • You choose the function name (lossfun).
  • Y is an n-dimensional vector of observed responses. loss passes the input argument Y in for Y.
  • Yhat is an n-dimensional vector of predicted responses, which is similar to the output of predict.
  • W is an n-by-1 numeric vector of observation weights.

  Specify your function using 'LossFun',@lossfun.

Data Types: char | string | function_handle

ObservationsIn — Predictor data observation dimension
'rows' (default) | 'columns'

Predictor data observation dimension, specified as the comma-separated pair consisting of 'ObservationsIn' and 'columns' or 'rows'.
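Per observation, the two built-in losses in the table above reduce to one-line expressions. A pure-Python sketch (the function names and the ε value are illustrative):

```python
# 'mse' loss for one observation: (y - f(x))^2.
def mse_loss(y, f):
    return (y - f) ** 2

# 'epsiloninsensitive' loss: max(0, |y - f(x)| - epsilon); residuals within
# epsilon of the prediction incur no loss (appropriate for SVM learners).
def eps_insensitive_loss(y, f, eps=0.1):
    return max(0.0, abs(y - f) - eps)

assert mse_loss(2.0, 1.5) == 0.25
assert eps_insensitive_loss(2.0, 1.95, eps=0.1) == 0.0   # inside the epsilon tube
assert abs(eps_insensitive_loss(2.0, 1.5, eps=0.1) - 0.4) < 1e-12
```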


Note If you orient your predictor matrix so that observations correspond to columns and specify 'ObservationsIn','columns', then you might experience a significant reduction in optimization execution time.

Weights — Observation weights
numeric vector of positive values

Observation weights, specified as the comma-separated pair consisting of 'Weights' and a numeric vector of positive values. If you supply weights, loss computes the weighted regression loss. Let n be the number of observations in X.

• numel(Weights) must be n.
• By default, Weights is ones(n,1).

Data Types: double | single

Output Arguments

L — Regression losses
numeric scalar | numeric row vector

Regression losses, returned as a numeric scalar or row vector. The interpretation of L depends on Weights and LossFun. L is the same size as Mdl.Lambda. L(j) is the regression loss of the linear regression model trained using the regularization strength Mdl.Lambda(j).

Note If Mdl.FittedLoss is 'mse', then the loss term in the objective function is half of the MSE. loss returns the MSE by default. Therefore, if you use loss to check the resubstitution (training) error, then there is a discrepancy between the MSE and the optimization results that fitrlinear returns.

Examples

Estimate Test-Sample Mean Squared Error

Simulate 10000 observations from this model:

    y = x_100 + 2x_200 + e.

• X = [x_1, ..., x_1000] is a 10000-by-1000 sparse matrix with 10% nonzero standard normal elements.
• e is random normal error with mean 0 and standard deviation 0.3.

rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);


Train a linear regression model. Reserve 30% of the observations as a holdout sample.

CVMdl = fitrlinear(X,Y,'Holdout',0.3);
Mdl = CVMdl.Trained{1}

Mdl =
  RegressionLinear
         ResponseName: 'Y'
    ResponseTransform: 'none'
                 Beta: [1000x1 double]
                 Bias: -0.0066
               Lambda: 1.4286e-04
              Learner: 'svm'

  Properties, Methods

CVMdl is a RegressionPartitionedLinear model. It contains the property Trained, which is a 1-by-1 cell array holding a RegressionLinear model that the software trained using the training set.

Extract the training and test data from the partition definition.

trainIdx = training(CVMdl.Partition);
testIdx = test(CVMdl.Partition);

Estimate the training- and test-sample MSE.

mseTrain = loss(Mdl,X(trainIdx,:),Y(trainIdx))
mseTrain = 0.1496

mseTest = loss(Mdl,X(testIdx,:),Y(testIdx))
mseTest = 0.1798

Because there is one regularization strength in Mdl, mseTrain and mseTest are numeric scalars.

Specify Custom Regression Loss

Simulate 10000 observations from this model:

    y = x_100 + 2x_200 + e.

• X = [x_1, ..., x_1000] is a 10000-by-1000 sparse matrix with 10% nonzero standard normal elements.
• e is random normal error with mean 0 and standard deviation 0.3.

rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
X = X'; % Put observations in columns for faster training

Train a linear regression model. Reserve 30% of the observations as a holdout sample.

CVMdl = fitrlinear(X,Y,'Holdout',0.3,'ObservationsIn','columns');
Mdl = CVMdl.Trained{1}

Mdl =
  RegressionLinear
         ResponseName: 'Y'
    ResponseTransform: 'none'
                 Beta: [1000x1 double]
                 Bias: -0.0066
               Lambda: 1.4286e-04
              Learner: 'svm'

  Properties, Methods

CVMdl is a RegressionPartitionedLinear model. It contains the property Trained, which is a 1-by-1 cell array holding a RegressionLinear model that the software trained using the training set.

Extract the training and test data from the partition definition.

trainIdx = training(CVMdl.Partition);
testIdx = test(CVMdl.Partition);

Create an anonymous function that measures Huber loss (δ = 1), that is,

    L = (1/Σ_j w_j) Σ_{j=1}^n w_j ℓ_j,

where

    ℓ_j = 0.5 e_j²       if |e_j| ≤ 1
          |e_j| - 0.5    if |e_j| > 1.

e_j is the residual for observation j. Custom loss functions must be written in a particular form. For rules on writing a custom loss function, see the 'LossFun' name-value pair argument.

huberloss = @(Y,Yhat,W)sum(W.*((0.5*(abs(Y-Yhat)<=1).*(Y-Yhat).^2)+ ...
    ((abs(Y-Yhat)>1).*abs(Y-Yhat)-0.5)))/sum(W);

Estimate the training set and test set regression loss using the Huber loss function.

eTrain = loss(Mdl,X(:,trainIdx),Y(trainIdx),'LossFun',huberloss,...
    'ObservationsIn','columns')
eTrain = -0.4186

eTest = loss(Mdl,X(:,testIdx),Y(testIdx),'LossFun',huberloss,...
    'ObservationsIn','columns')
eTest = -0.4010
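As a cross-check, the piecewise Huber definition above can be evaluated directly in pure Python (helper names are illustrative). Note that the anonymous function in this example appears to subtract 0.5 from every term rather than only when |e_j| > 1, which would explain why eTrain and eTest come out negative; the sketch below follows the piecewise equation as displayed:

```python
# Huber loss (delta = 1) per residual e: 0.5*e^2 if |e| <= 1, |e| - 0.5 otherwise.
def huber(e):
    return 0.5 * e * e if abs(e) <= 1 else abs(e) - 0.5

# Weighted average over observations, matching L = sum(w_j*l_j)/sum(w_j).
def weighted_huber_loss(y, yhat, w):
    total = sum(wi * huber(yi - yhi) for yi, yhi, wi in zip(y, yhat, w))
    return total / sum(w)

assert huber(0.5) == 0.125                 # quadratic branch
assert huber(2.0) == 1.5                   # linear branch
assert abs(huber(1.000001) - 0.5) < 1e-5   # the two branches meet at |e| = 1
assert abs(weighted_huber_loss([0.0, 3.0], [0.5, 0.0], [1.0, 1.0]) - 1.3125) < 1e-12
```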


Find Good Lasso Penalty Using Regression Loss

Simulate 10000 observations from this model:

    y = x_100 + 2x_200 + e.

• X = [x_1, ..., x_1000] is a 10000-by-1000 sparse matrix with 10% nonzero standard normal elements.
• e is random normal error with mean 0 and standard deviation 0.3.

rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);

Create a set of 15 logarithmically spaced regularization strengths from 10^-4 through 10^-1.

Lambda = logspace(-4,-1,15);

Hold out 30% of the data for testing. Identify the test-sample indices.

cvp = cvpartition(numel(Y),'Holdout',0.30);
idxTest = test(cvp);

Train a linear regression model using lasso penalties with the strengths in Lambda. Specify the regularization strengths and the data partition, and optimize the objective function using SpaRSA. To increase execution speed, transpose the predictor data and specify that the observations are in columns.

X = X';
CVMdl = fitrlinear(X,Y,'ObservationsIn','columns','Lambda',Lambda,...
    'Solver','sparsa','Regularization','lasso','CVPartition',cvp);
Mdl1 = CVMdl.Trained{1};
numel(Mdl1.Lambda)

ans = 15

Mdl1 is a RegressionLinear model. Because Lambda is a 15-dimensional vector of regularization strengths, you can think of Mdl1 as 15 trained models, one for each regularization strength.

Estimate the test-sample mean squared error for each regularized model.

mse = loss(Mdl1,X(:,idxTest),Y(idxTest),'ObservationsIn','columns');

Higher values of Lambda lead to predictor variable sparsity, which is a good quality of a regression model. Retrain the model using the entire data set and all options used previously, except the data-partition specification. Determine the number of nonzero coefficients per model.

Mdl = fitrlinear(X,Y,'ObservationsIn','columns','Lambda',Lambda,...
    'Solver','sparsa','Regularization','lasso');
numNZCoeff = sum(Mdl.Beta~=0);

In the same figure, plot the MSE and frequency of nonzero coefficients for each regularization strength. Plot all variables on the log scale.


figure;
[h,hL1,hL2] = plotyy(log10(Lambda),log10(mse),...
    log10(Lambda),log10(numNZCoeff));
hL1.Marker = 'o';
hL2.Marker = 'o';
ylabel(h(1),'log_{10} MSE')
ylabel(h(2),'log_{10} nonzero-coefficient frequency')
xlabel('log_{10} Lambda')
hold off

Select the index or indices of Lambda that balance minimal mean squared error and predictor-variable sparsity (for example, Lambda(11)).

idx = 11;
MdlFinal = selectModels(Mdl,idx);

MdlFinal is a trained RegressionLinear model object that uses Lambda(11) as a regularization strength.

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

This function fully supports tall arrays. You can use models trained on either in-memory or tall data with this function.


For more information, see “Tall Arrays” (MATLAB).

See Also
RegressionLinear | fitrlinear | predict

Introduced in R2016a

lowerparams
Lower Pareto tail parameters

Syntax

params = lowerparams(pd)

Description

params = lowerparams(pd) returns the two-element vector params, which includes the shape and scale parameters of the generalized Pareto distribution (GPD) in the lower tail of pd.

lowerparams does not return the location parameter of the GPD. The location parameter is the quantile value corresponding to the lower tail cumulative probability. Use the boundary function to return the location parameter.

Examples

Parameters of Lower Pareto Tail

Generate a sample data set and fit a piecewise distribution with Pareto tails to the data by using paretotails. Find the distribution parameters of the lower Pareto tail by using the object function lowerparams.

Generate a sample data set containing 20% outliers.

rng('default'); % For reproducibility
left_tail = -exprnd(1,100,1);
right_tail = exprnd(5,100,1);
center = randn(800,1);
x = [left_tail;center;right_tail];

Create a paretotails object by fitting a piecewise distribution to x. Specify the boundaries of the tails using the lower and upper tail cumulative probabilities so that a fitted object consists of the empirical distribution for the middle 80% of the data set and GPDs for the lower and upper 10% of the data set.

pd = paretotails(x,0.1,0.9)

pd =
Piecewise distribution with 3 segments
      -Inf < x < -1.33251    (0 < p < 0.1): lower tail, GPD(-0.0063504,0.567017)
  -1.33251 < x < 1.80149   (0.1 < p < 0.9): interpolated empirical cdf
   1.80149 < x < Inf         (0.9 < p < 1): upper tail, GPD(0.24874,3.00974)

Return the shape and scale parameters of the fitted GPD of the lower tail by using the lowerparams function.

params = lowerparams(pd)


params = 1×2

   -0.0064    0.5670

You can also get the lower Pareto tail parameters by using the LowerParameters property. Access the LowerParameters property by using dot notation.

pd.LowerParameters

ans = 1×2

   -0.0064    0.5670

The location parameter of the GPD is equal to the quantile value of the lower tail cumulative probability. Return the location parameter by using the boundary function.

[p,q] = boundary(pd)

p = 2×1

    0.1000
    0.9000

q = 2×1

   -1.3325
    1.8015

The values in p are the cumulative probabilities at the boundaries, and the values in q are the corresponding quantiles. q(1) is the location parameter of the GPD of the lower tail.

Use the upperparams function or the UpperParameters property to get the upper Pareto tail parameters.
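The location parameter discussed above is just the data quantile at the lower-tail cumulative probability. A pure-Python sketch using a simple order-statistic quantile (paretotails interpolates the empirical CDF, so its boundary can differ slightly; the helper name is illustrative):

```python
# Quantile of the sample at probability p, taken as an order statistic; for
# p = 0.1 this is the boundary below which roughly 10% of the data lie.
def empirical_quantile(x, p):
    xs = sorted(x)
    k = max(0, min(len(xs) - 1, int(p * len(xs)) - 1))
    return xs[k]

data = list(range(1, 101))                  # 1, 2, ..., 100
assert empirical_quantile(data, 0.1) == 10  # lower-tail boundary of the bottom 10%
assert empirical_quantile(data, 0.9) == 90  # upper-tail boundary
```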

Input Arguments

pd — Piecewise distribution with Pareto tails
paretotails object

Piecewise distribution with Pareto tails, specified as a paretotails object.

See Also
boundary | gpfit | nsegments | paretotails | segment | upperparams

Topics
“Fit a Nonparametric Distribution with Pareto Tails” on page 5-42
“Nonparametric and Empirical Probability Distributions” on page 5-29
“Nonparametric Estimates of Cumulative Distribution Functions and Their Inverses”
“Generalized Pareto Distribution” on page B-59

Introduced in R2007a


lt
Class: qrandstream

Less than relation for handles

Syntax

h1 < h2
tf = lt(h1,h2)

Description

h1 < h2 performs element-wise comparisons between handle arrays h1 and h2. h1 and h2 must have the same dimensions unless one is a scalar. The result is a logical array of the same dimensions, where each element is an element-wise < result. If one of h1 or h2 is scalar, scalar expansion is performed and the result matches the dimensions of the array that is not scalar.

tf = lt(h1,h2) stores the result in a logical array of the same dimensions.
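Scalar expansion in the comparison above behaves like comparing every element of the array against the scalar. A pure-Python analogue of the element-wise result:

```python
# Comparing an array with a scalar: the scalar expands to every element, and
# the result has the same dimensions as the non-scalar operand.
h1 = [3, 1, 4, 1, 5]
tf = [a < 4 for a in h1]
assert tf == [True, True, False, True, False]
assert len(tf) == len(h1)  # result matches the array's dimensions
```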

See Also eq | ge | gt | le | ne | qrandstream


lsline
Add least-squares line to scatter plot

Syntax

lsline
lsline(ax)
h = lsline( ___ )

Description

lsline superimposes a least-squares line on each scatter plot in the current axes. lsline ignores data points that are connected with solid, dashed, or dash-dot lines ('-', '--', or '-.') because it does not consider them to be scatter plots. To produce scatter plots, use the MATLAB scatter and plot functions.

lsline(ax) superimposes a least-squares line on the scatter plot in the axes specified by ax instead of the current axes (gca).

h = lsline( ___ ) returns a column vector of least-squares line objects h using any of the previous syntaxes. Use h to modify the properties of a specific least-squares line after you create it. For a list of properties, see Line Properties.

Examples

Plot Least-Squares Line

Generate three sets of sample data and plot each set on the same figure.

x = 1:10;
rng default % For reproducibility
figure

y1 = x + randn(1,10);
scatter(x,y1,25,'b','*')
hold on
y2 = 2*x + randn(1,10);
plot(x,y2,'mo')
y3 = 3*x + randn(1,10);
plot(x,y3,'rx:')


Add a least-squares line for each set of sample data.

lsline


Specify Axes for Least-Squares and Reference Lines

Define the x-variable and two different y-variables to use for the plots.

rng default % For reproducibility
x = 1:10;
y1 = x + randn(1,10);
y2 = 2*x + randn(1,10);

Define ax1 as the top half of the figure, and ax2 as the bottom half of the figure. Create the first scatter plot on the top axis using y1, and the second scatter plot on the bottom axis using y2.

figure
ax1 = subplot(2,1,1);
ax2 = subplot(2,1,2);
scatter(ax1,x,y1)
scatter(ax2,x,y2)


Superimpose a least-squares line on the top plot, and a reference line at the mean of the y2 values in the bottom plot.

lsline(ax1) % This is equivalent to refline(ax1)
mu = mean(y2);
refline(ax2,[0 mu])


Use Least-Squares Line Object to Modify Line Properties

Define the x-variable and two different y-variables to use for the plots.

rng default % For reproducibility
x = 1:10;
y1 = x + randn(1,10);
y2 = 2*x + randn(1,10);

Define ax1 as the top half of the figure, and ax2 as the bottom half of the figure. Create the first scatter plot on the top axis using y1, and the second scatter plot on the bottom axis using y2.

figure
ax1 = subplot(2,1,1);
ax2 = subplot(2,1,2);
scatter(ax1,x,y1)
scatter(ax2,x,y2)

Superimpose a least-squares line on the top plot. Then, use the least-squares line object h1 to change the line color to red.

h1 = lsline(ax1);
h1.Color = 'r';


Superimpose a least-squares line on the bottom plot. Then, use the least-squares line object h2 to increase the line width to 5.

h2 = lsline(ax2);
h2.LineWidth = 5;

Input Arguments

ax — Target axes
gca (default) | axes object

Target axes, specified as an axes object. If you do not specify the axes and if the current axes are Cartesian axes, then the lsline function uses the current axes.

Output Arguments

h — One or more least-squares line objects
scalar | vector

One or more least-squares line objects, returned as a scalar or a vector. These objects are unique identifiers, which you can use to query and modify properties of a specific least-squares line. For a list of properties, see Chart Line.


See Also
gca | gline | plot | refcurve | refline | scatter

Introduced before R2006a

mad
Mean or median absolute deviation

Syntax

y = mad(X)
y = mad(X,flag)
y = mad(X,flag,'all')
y = mad(X,flag,dim)
y = mad(X,flag,vecdim)

Description

y = mad(X) returns the mean absolute deviation of the values in X.

• If X is a vector, then mad returns the mean or median absolute deviation of the values in X.
• If X is a matrix, then mad returns a row vector containing the mean or median absolute deviation of each column of X.
• If X is a multidimensional array, then mad operates along the first nonsingleton dimension of X.

y = mad(X,flag) specifies whether to compute the mean absolute deviation (flag = 0, the default) or the median absolute deviation (flag = 1).

y = mad(X,flag,'all') returns the mean or median absolute deviation of all elements of X.

y = mad(X,flag,dim) returns the mean or median absolute deviation along the operating dimension dim of X.

y = mad(X,flag,vecdim) returns the mean or median absolute deviation over the dimensions specified in the vector vecdim. For example, if X is a 2-by-3-by-4 array, then mad(X,0,[1 2]) returns a 1-by-1-by-4 array. Each element of the output array is the mean absolute deviation of the elements on the corresponding page of X.
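The two deviations can be written directly from their definitions, mean(abs(X - mean(X))) and median(abs(X - median(X))). A pure-Python sketch (helper names are illustrative):

```python
import statistics

# flag = 0: mean absolute deviation, mean(|x - mean(x)|).
def mean_abs_dev(x):
    m = statistics.fmean(x)
    return statistics.fmean(abs(v - m) for v in x)

# flag = 1: median absolute deviation, median(|x - median(x)|).
def median_abs_dev(x):
    m = statistics.median(x)
    return statistics.median(abs(v - m) for v in x)

x = [1, 2, 3, 4, 100]
assert mean_abs_dev(x) == 31.2   # mean is 22; deviations 21, 20, 19, 18, 78
assert median_abs_dev(x) == 1    # median is 3; deviations 2, 1, 0, 1, 97
```

The median form is the one that resists the outlier 100 here, which is the point of the first example below.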

Examples

Compare Robustness of Scale Estimates

Compare the robustness of the standard deviation, mean absolute deviation, and median absolute deviation in the presence of outliers.

Create a data set x of normally distributed data. Create another data set xo that contains the elements of x and an additional outlier.

rng('default') % For reproducibility
x = normrnd(0,1,1,50);
xo = [x 10];

Compute the ratio of the standard deviations of the two data sets.

r1 = std(xo)/std(x)
r1 = 1.4633

Compute the ratio of the mean absolute deviations of the two data sets.

r2 = mad(xo)/mad(x)
r2 = 1.1833

Compute the ratio of the median absolute deviations of the two data sets.

r3 = mad(xo,1)/mad(x,1)
r3 = 1.0336

In this case, the median absolute deviation is less influenced by the outlier compared to the other two scale estimates.

Mean and Median Absolute Deviations of All Values

Find the mean and median absolute deviations of all the values in an array.

Create a 3-by-5-by-2 array X and add an outlier.

X = reshape(1:30,[3 5 2]);
X(6) = 100

X =

X(:,:,1) =

     1     4     7    10    13
     2     5     8    11    14
     3   100     9    12    15

X(:,:,2) =

    16    19    22    25    28
    17    20    23    26    29
    18    21    24    27    30

Find the mean and median absolute deviations of the elements in X.

meandev = mad(X,0,'all')
meandev = 10.1178

mediandev = mad(X,1,'all')
mediandev = 7.5000

meandev is the mean absolute deviation of all the elements in X, and mediandev is the median absolute deviation of all the elements in X.


Find Median Absolute Deviation Along Given Dimension

Find the median absolute deviation along different dimensions for a multidimensional array.

Set the random seed for reproducibility of the results.

rng('default')

Create a 1-by-3-by-2 array of random numbers.

X = randn([1,3,2])

X =

X(:,:,1) =

    0.5377    1.8339   -2.2588

X(:,:,2) =

    0.8622    0.3188   -1.3077

Find the median absolute deviation of X along the default dimension.

Y2 = mad(X,1) % Flag is set to 1 for the median absolute deviation

Y2 =

Y2(:,:,1) =

    1.2962

Y2(:,:,2) =

    0.5434

By default, mad operates along the first dimension of X whose size does not equal 1. In this case, this dimension is the second dimension of X. Therefore, Y2 is a 1-by-1-by-2 array.

Find the median absolute deviation of X along the third dimension.

Y3 = mad(X,1,3)

Y3 = 1×3

    0.1623    0.7576    0.4756

Y3 is a 1-by-3 matrix.

Find Mean Absolute Deviation Along Vector of Dimensions
Find the mean absolute deviation over multiple dimensions by using the vecdim input argument.
Set the random seed for reproducibility of the results.
rng('default')
Create a 4-by-3-by-2 array of random numbers.
X = randn([4 3 2])
X =
X(:,:,1) =
    0.5377    0.3188    3.5784
    1.8339   -1.3077    2.7694
   -2.2588   -0.4336   -1.3499
    0.8622    0.3426    3.0349
X(:,:,2) =
    0.7254   -0.1241    0.6715
   -0.0631    1.4897   -1.2075
    0.7147    1.4090    0.7172
   -0.2050    1.4172    1.6302

Find the mean absolute deviation of each page of X by specifying the first and second dimensions.
ypage = mad(X,0,[1 2])
ypage =
ypage(:,:,1) = 1.4626
ypage(:,:,2) = 0.6652
For example, ypage(:,:,2) is the mean absolute deviation of all the elements in X(:,:,2), and is equivalent to specifying mad(X(:,:,2),0,'all').
Find the mean absolute deviation of the elements in each X(:,i,:) slice by specifying the first and third dimensions.
ycol = mad(X,0,[1 3])
ycol = 1×3
    0.8330    0.7872    1.5227
For example, ycol(3) is the mean absolute deviation of all the elements in X(:,3,:), and is equivalent to specifying mad(X(:,3,:),0,'all').


Input Arguments
X — Input data
vector | matrix | multidimensional array
Input data that represents a sample from a population, specified as a vector, matrix, or multidimensional array.
• If X is a vector, then mad returns the mean or median absolute deviation of the values in X.
• If X is a matrix, then mad returns a row vector containing the mean or median absolute deviation of each column of X.
• If X is a multidimensional array, then mad operates along the first nonsingleton dimension of X.
To specify the operating dimension when X is a matrix or an array, use the dim input argument. mad treats NaNs as missing values and removes them.
Data Types: single | double
flag — Indicator for the type of deviation
0 (default) | 1
Indicator for the type of deviation, specified as 0 or 1.
• If flag is 0 (default), then mad computes the mean absolute deviation, mean(abs(X – mean(X))).
• If flag is 1, then mad computes the median absolute deviation, median(abs(X – median(X))).
Data Types: single | double | logical
dim — Dimension
positive integer
Dimension along which to operate, specified as a positive integer. If you do not specify a value for dim, then the default is the first dimension of X whose size does not equal 1.
Consider the mean absolute deviation of a matrix X:
• If dim is equal to 1, then mad(X) returns a row vector that contains the mean absolute deviation of each column in X.
• If dim is equal to 2, then mad(X) returns a column vector that contains the mean absolute deviation of each row in X.
Data Types: single | double
vecdim — Vector of dimensions
positive integer vector
Vector of dimensions, specified as a positive integer vector. Each element of vecdim represents a dimension of the input array X. The output y has length 1 in the specified operating dimensions. The other dimension lengths are the same for X and y.


For example, if X is a 2-by-3-by-3 array, then mad(X,0,[1 2]) returns a 1-by-1-by-3 array. Each element of the output array is the mean absolute deviation of the elements on the corresponding page of X.

Data Types: single | double

Output Arguments y — Mean or median absolute deviation scalar | vector | matrix | multidimensional array Mean or median absolute deviation, returned as a scalar, vector, matrix, or multidimensional array. If flag is 0 (default), then y is the mean absolute deviation of the values in X, mean(abs(X – mean(X))). If flag is 1, then y is the median absolute deviation of the values in X, median(abs(X – median(X))).
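The two definitions can be checked directly against the elementwise formulas (a minimal sketch; any small data vector works, and the tolerance guards floating-point rounding):

```matlab
% Sketch: confirm mad's two definitions on a small vector.
x = [1 3 5 7 100];
assert(abs(mad(x,0) - mean(abs(x - mean(x)))) < 1e-12)      % flag = 0
assert(abs(mad(x,1) - median(abs(x - median(x)))) < 1e-12)  % flag = 1
```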

Tips • For normally distributed data, multiply mad by one of the following factors to obtain an estimate of the normal scale parameter σ: • sigma = 1.253 * mad(X,0) — For mean absolute deviation • sigma = 1.4826 * mad(X,1) — For median absolute deviation
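For example, applying the second factor to a large normal sample recovers the scale parameter (a sketch; the sample size and seed are arbitrary):

```matlab
% Sketch: estimate sigma of a normal sample from the median absolute deviation.
rng('default')
x = normrnd(0,2,1,1e5);        % true sigma = 2
sigma_hat = 1.4826*mad(x,1);   % robust estimate, close to 2
```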

References [1] Mosteller, F., and J. Tukey. Data Analysis and Regression. Upper Saddle River, NJ: Addison-Wesley, 1977. [2] Sachs, L. Applied Statistics: A Handbook of Techniques. New York: Springer-Verlag, 1984, p. 253.

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: • The 'all' and vecdim input arguments are not supported.


• Empty dim input is not supported.
• The dim input argument must be a compile-time constant.
• If you do not specify the dim input argument, the working (or operating) dimension can be different in the generated code. As a result, run-time errors can occur. For more details, see “Automatic dimension restriction” (MATLAB Coder).
For more information on code generation, see “Introduction to Code Generation” on page 31-2 and “General Code Generation Workflow” on page 31-5.
GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. Usage notes and limitations:
• The 'all' and vecdim input arguments are not supported.
For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also iqr | range | std Introduced before R2006a


mahal Mahalanobis distance

Syntax d2 = mahal(Y,X)

Description d2 = mahal(Y,X) returns the squared Mahalanobis distance on page 32-3289 of each observation in Y to the reference samples in X.

Examples Compare Mahalanobis and Squared Euclidean Distances Generate a correlated bivariate sample data set. rng('default') % For reproducibility X = mvnrnd([0;0],[1 .9;.9 1],1000);

Specify four observations that are equidistant from the mean of X in Euclidean distance. Y = [1 1;1 -1;-1 1;-1 -1];

Compute the Mahalanobis distance of each observation in Y to the reference samples in X. d2_mahal = mahal(Y,X) d2_mahal = 4×1 1.1095 20.3632 19.5939 1.0137

Compute the squared Euclidean distance of each observation in Y from the mean of X . d2_Euclidean = sum((Y-mean(X)).^2,2) d2_Euclidean = 4×1 2.0931 2.0399 1.9625 1.9094

Plot X and Y by using scatter and use marker color to visualize the Mahalanobis distance of Y to the reference samples in X.


scatter(X(:,1),X(:,2),10,'.') % Scatter plot with points of size 10 hold on scatter(Y(:,1),Y(:,2),100,d2_mahal,'o','filled') hb = colorbar; ylabel(hb,'Mahalanobis Distance') legend('X','Y','Location','best')

All observations in Y ([1,1], [-1,-1,], [1,-1], and [-1,1]) are equidistant from the mean of X in Euclidean distance. However, [1,1] and [-1,-1] are much closer to X than [1,-1] and [-1,1] in Mahalanobis distance. Because Mahalanobis distance considers the covariance of the data and the scales of the different variables, it is useful for detecting outliers.

Input Arguments Y — Data n-by-m numeric matrix Data, specified as an n-by-m numeric matrix, where n is the number of observations and m is the number of variables in each observation. X and Y must have the same number of columns, but can have different numbers of rows. Data Types: single | double X — Reference samples p-by-m numeric matrix


Reference samples, specified as a p-by-m numeric matrix, where p is the number of samples and m is the number of variables in each sample. X and Y must have the same number of columns, but can have different numbers of rows. X must have more rows than columns. Data Types: single | double

Output Arguments d2 — Squared Mahalanobis distance n-by-1 numeric vector Squared Mahalanobis distance on page 32-3289 of each observation in Y to the reference samples in X, returned as an n-by-1 numeric vector, where n is the number of observations in Y.

More About
Mahalanobis Distance
The Mahalanobis distance is a measure between a sample point and a distribution. The Mahalanobis distance from a vector y to a distribution with mean μ and covariance Σ is
d = sqrt((y − μ) Σ⁻¹ (y − μ)′).
This distance represents how far y is from the mean in number of standard deviations. mahal returns the squared Mahalanobis distance d2 from an observation in Y to the reference samples in X. In the mahal function, μ and Σ are the sample mean and covariance of the reference samples, respectively.
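The definition can be reproduced from the sample moments of the reference data. A sketch, regenerating the X and Y from the earlier example:

```matlab
% Sketch: recompute the squared Mahalanobis distance from mean(X) and cov(X).
rng('default')
X = mvnrnd([0;0],[1 .9;.9 1],1000);         % reference samples
Y = [1 1;1 -1;-1 1;-1 -1];
mu = mean(X);
S  = cov(X);
d2_manual = sum(((Y - mu)/S).*(Y - mu),2);  % row-wise (y-mu)*inv(S)*(y-mu)'
% d2_manual matches mahal(Y,X) up to rounding.
```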

See Also mahal | pdist Introduced before R2006a


mahal Mahalanobis distance to class means

Syntax M = mahal(obj,X) M = mahal(obj,X,Name,Value)

Description M = mahal(obj,X) returns the squared Mahalanobis distances from observations in X to the class means in obj. M = mahal(obj,X,Name,Value) computes the squared Mahalanobis distance with additional options specified by one or more Name,Value pair arguments.

Input Arguments obj Discriminant analysis classifier of class ClassificationDiscriminant or CompactClassificationDiscriminant, typically constructed with fitcdiscr. X Numeric matrix of size n-by-p, where p is the number of predictors in obj, and n is any positive integer. mahal computes the Mahalanobis distances from the rows of X to each of the K means of the classes in obj. Name-Value Pair Arguments Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN. ClassLabels Class labels consisting of n elements of obj.Y, where n is the number of rows of X.

Output Arguments
M
Size and meaning of output M depends on whether the ClassLabels name-value pair is present:
• No ClassLabels — M is a numeric matrix of size n-by-K, where K is the number of classes in obj, and n is the number of rows in X. M(i,j) is the squared Mahalanobis distance from the ith row of X to the mean of class j.
• ClassLabels exists — M is a column vector with n elements. M(i) is the squared Mahalanobis distance from the ith row of X to the mean for the class of the ith element of ClassLabels.
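As a sketch of the second output form, the ClassLabels name-value pair reduces M to each observation's distance to the mean of its own labeled class:

```matlab
% Sketch: distances with and without the ClassLabels argument.
load fisheriris
obj  = fitcdiscr(meas,species);
Mall = mahal(obj,meas);                        % 150-by-3, one column per class
Mown = mahal(obj,meas,'ClassLabels',species);  % 150-by-1, own-class distance
```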


Examples
Find the Mahalanobis distances from the mean of the Fisher iris data to the class means, using distinct covariance matrices for each class:
load fisheriris
obj = fitcdiscr(meas,species,...
    'DiscrimType','quadratic');
mahadist = mahal(obj,mean(meas))
mahadist =
  220.0667    5.0254   30.5804

More About
Mahalanobis Distance
The Mahalanobis distance d(x,y) between n-dimensional points x and y, with respect to a given n-by-n covariance matrix S, is
d(x,y) = sqrt((x − y)ᵀ S⁻¹ (x − y)).

See Also CompactClassificationDiscriminant | fitcdiscr | gmdistribution | mahal Topics “Discriminant Analysis Classification” on page 20-2


mahal Mahalanobis distance to Gaussian mixture component

Syntax d2 = mahal(gm,X)

Description d2 = mahal(gm,X) returns the squared Mahalanobis distance of each observation in X to each Gaussian mixture component in gm.

Examples Measure Mahalanobis Distance Generate random variates that follow a mixture of two bivariate Gaussian distributions by using the mvnrnd function. Fit a Gaussian mixture model (GMM) to the generated data by using the fitgmdist function, and then compute Mahalanobis distances between the generated data and the mixture components of the fitted GMM. Define the distribution parameters (means and covariances) of two bivariate Gaussian mixture components. rng('default') % For reproducibility mu1 = [1 2]; % Mean of the 1st component sigma1 = [2 0; 0 .5]; % Covariance of the 1st component mu2 = [-3 -5]; % Mean of the 2nd component sigma2 = [1 0; 0 1]; % Covariance of the 2nd component

Generate an equal number of random variates from each component, and combine the two sets of random variates. r1 = mvnrnd(mu1,sigma1,1000); r2 = mvnrnd(mu2,sigma2,1000); X = [r1; r2];

The combined data set X contains random variates following a mixture of two bivariate Gaussian distributions. Fit a two-component GMM to X.
gm = fitgmdist(X,2)
gm = Gaussian mixture distribution with 2 components in 2 dimensions
Component 1: Mixing proportion: 0.500000  Mean: -2.9617 -4.9727
Component 2: Mixing proportion: 0.500000  Mean: 0.9539 2.0261

fitgmdist fits a GMM to X using two mixture components. The means of Component 1 and Component 2 are [-2.9617,-4.9727] and [0.9539,2.0261], which are close to mu2 and mu1, respectively. Compute the Mahalanobis distance of each point in X to each component of gm. d2 = mahal(gm,X);

Plot X by using scatter and use marker color to visualize the Mahalanobis distance to Component 1. scatter(X(:,1),X(:,2),10,d2(:,1),'.') % Scatter plot with points of size 10 c = colorbar; ylabel(c,'Mahalanobis Distance to Component 1')

Input Arguments gm — Gaussian mixture distribution gmdistribution object Gaussian mixture distribution, also called Gaussian mixture model (GMM), specified as a gmdistribution object.


You can create a gmdistribution object using gmdistribution or fitgmdist. Use the gmdistribution function to create a gmdistribution object by specifying the distribution parameters. Use the fitgmdist function to fit a gmdistribution model to data given a fixed number of components. X — Data n-by-m numeric matrix Data, specified as an n-by-m numeric matrix, where n is the number of observations and m is the number of variables in each observation. If a row of X contains NaNs, then mahal excludes the row from the computation. The corresponding value in d2 is NaN. Data Types: single | double

Output Arguments d2 — Squared Mahalanobis distance n-by-k numeric matrix Squared Mahalanobis distance of each observation in X to each Gaussian mixture component in gm, returned as an n-by-k numeric matrix, where n is the number of observations in X and k is the number of mixture components in gm. d2(i,j) is the squared distance of observation i to the jth Gaussian mixture component.

More About
Mahalanobis Distance
The Mahalanobis distance is a measure between a sample point and a distribution. The Mahalanobis distance from a vector x to a distribution with mean μ and covariance Σ is
d = sqrt((x − μ) Σ⁻¹ (x − μ)′).

This distance represents how far x is from the mean in number of standard deviations. mahal returns the squared Mahalanobis distance d2 from an observation in X to a mixture component in gm.
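To make the definition concrete, the distances to one component can be recomputed from the fitted parameters. A sketch, assuming gm, X, and d2 from the example above (full, unshared covariances):

```matlab
% Sketch: recompute d2(:,j) for mixture component j from gm's parameters.
j = 1;
dif = X - gm.mu(j,:);                      % gm.mu is k-by-m
d2j = sum((dif/gm.Sigma(:,:,j)).*dif,2);   % matches d2(:,j) up to rounding
```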

See Also cluster | fitgmdist | gmdistribution | mahal | posterior Topics “Cluster Using Gaussian Mixture Model” on page 16-39 “Cluster Gaussian Mixture Data Using Hard Clustering” on page 16-46 “Cluster Gaussian Mixture Data Using Soft Clustering” on page 16-52 Introduced in R2007b


maineffectsplot Main effects plot for grouped data

Syntax maineffectsplot(Y,GROUP) maineffectsplot(Y,GROUP,param1,val1,param2,val2,...) [figh,AXESH] = maineffectsplot(...)

Description
maineffectsplot(Y,GROUP) displays main effects plots for the group means of matrix Y with groups defined by entries in GROUP, which can be a cell array or a matrix. Y is a numeric matrix or vector. If Y is a matrix, the rows represent different observations and the columns represent replications of each observation.
If GROUP is a cell array, then each cell of GROUP must contain a grouping variable that is a categorical variable, numeric vector, character matrix, string array, or single-column cell array of character vectors. If GROUP is a matrix, then its columns represent different grouping variables. Each grouping variable must have the same number of rows as Y. The number of grouping variables must be greater than 1.
The display has one subplot per grouping variable, with each subplot showing the group means of Y as a function of one grouping variable.
maineffectsplot(Y,GROUP,param1,val1,param2,val2,...) specifies one or more of the following name/value pairs:
• 'varnames' — Grouping variable names in a character matrix, a string array, or a cell array of character vectors, one per grouping variable. Default names are 'X1', 'X2', ... .
• 'statistic' — Values that indicate whether the group mean or the group standard deviation should be plotted. Use 'mean' or 'std'. The default is 'mean'. If the value is 'std', Y is required to have multiple columns.
• 'parent' — A handle to the figure window for the plots. The default is the current figure window.
[figh,AXESH] = maineffectsplot(...) returns the handle figh to the figure window and an array of handles AXESH to the subplot axes.

Examples Main Effects Plot Load the sample data. load carsmall;

Display main effects plots for car weight with two grouping variables, model year and number of cylinders. maineffectsplot(Weight,{Model_Year,Cylinders}, ... 'varnames',{'Model Year','# of Cylinders'})


See Also interactionplot | multivarichart Topics “Grouping Variables” on page 2-45 Introduced in R2006b


ClassificationDiscriminant.make Class: ClassificationDiscriminant (Not Recommended) Construct discriminant analysis classifier from parameters Note ClassificationDiscriminant.make is not recommended. Use makecdiscr instead.

Syntax cobj = ClassificationDiscriminant.make(Mu,Sigma) cobj = ClassificationDiscriminant.make(Mu,Sigma,Name,Value)

Description cobj = ClassificationDiscriminant.make(Mu,Sigma) constructs a compact discriminant analysis classifier from the class means Mu and covariance matrix Sigma. cobj = ClassificationDiscriminant.make(Mu,Sigma,Name,Value) constructs a compact classifier with additional options specified by one or more Name,Value pair arguments.

Input Arguments Mu — Class means matrix of scalar values Class means, specified as a K-by-p matrix of scalar values, where K is the number of classes and p is the number of predictors. Each row of Mu represents the mean of the multivariate normal distribution of the corresponding class. The class indices are in the ClassNames attribute. Data Types: single | double Sigma — Within-class covariance matrix of scalar values Within-class covariance, specified as a matrix of scalar values. • For a linear discriminant, Sigma is a symmetric, positive semidefinite matrix of size p-by-p, where p is the number of predictors. • For a quadratic discriminant, Sigma is an array of size p-by-p-by-K, where K is the number of classes. For each i, Sigma(:,:,i) is a symmetric, positive semidefinite matrix. Data Types: single | double Name-Value Pair Arguments Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.


ClassNames — Class names numeric vector | categorical vector | logical vector | character array | cell array of character vectors Class names as ordered in Mu, specified as the comma-separated pair consisting of 'ClassNames' and an array containing grouping variables. Use any data type for a grouping variable, including numeric vector, categorical vector, logical vector, character array, or cell array of character vectors. ClassNames names the classes, as ordered in Mu. Default is 1:K, where K is the number of classes (the number of rows of Mu). Data Types: single | double | logical | char | cell Cost — Cost of misclassification square matrix | structure Cost of misclassification, specified as the comma-separated pair consisting of 'Cost' and a square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i. Alternatively, Cost can be a structure S having two fields: S.ClassNames containing the group names as a variable of the same type as y, and S.ClassificationCosts containing the cost matrix. The default is Cost(i,j)=1 if i~=j, and Cost(i,j)=0 if i=j. Data Types: single | double | struct PredictorNames — Predictor variable names {'X1','X2',...} (default) | cell array of character vectors Predictor variable names, specified as the comma-separated pair consisting of 'PredictorNames' and a cell array of character vectors containing the names for the predictor variables, in the order in which they appear in X. Data Types: cell Prior — Prior probabilities 'uniform' (default) | vector of scalar values | structure Prior probabilities for each class, specified as the comma-separated pair consisting of 'Prior' and one of the following. • 'uniform', a character vector meaning all class prior probabilities are equal. • A vector containing one scalar value for each class. • A structure S with two fields: • S.ClassNames containing the class names as a variable of the same type as ClassNames. 
• S.ClassProbs containing a vector of corresponding probabilities. Data Types: single | double | struct ResponseName — Response variable name 'Y' (default) | character vector Response variable name, specified as the comma-separated pair consisting of 'ResponseName' and a character vector containing the name of the response variable y.


Example: 'ResponseName','Response' Data Types: char

Output Arguments cobj — Discriminant analysis classifier discriminant analysis classifier object Discriminant analysis classifier, returned as a discriminant analysis classifier object of class CompactClassificationDiscriminant. You can use the predict method to predict classification labels for new data.

Examples Construct a Compact Linear Discriminant Analysis Classifier Construct a compact linear discriminant analysis classifier from the means and covariances of the Fisher iris data. load fisheriris mu(1,:) = mean(meas(1:50,:)); mu(2,:) = mean(meas(51:100,:)); mu(3,:) = mean(meas(101:150,:)); mm1 = repmat(mu(1,:),50,1); mm2 = repmat(mu(2,:),50,1); mm3 = repmat(mu(3,:),50,1); cc = meas; cc(1:50,:) = cc(1:50,:) - mm1; cc(51:100,:) = cc(51:100,:) - mm2; cc(101:150,:) = cc(101:150,:) - mm3; sigstar = cc' * cc / 147; % unbiased estimator of sigma cpct = ClassificationDiscriminant.make(mu,sigstar,... 'ClassNames',{'setosa','versicolor','virginica'}) cpct = classreg.learning.classif.CompactClassificationDiscriminant PredictorNames: {'x1' 'x2' 'x3' 'x4'} ResponseName: 'Y' CategoricalPredictors: [] ClassNames: {'setosa' 'versicolor' 'virginica'} ScoreTransform: 'none' DiscrimType: 'linear' Mu: [3x4 double] Coeffs: [3x3 struct] Properties, Methods

See Also CompactClassificationDiscriminant | compact | fitcdiscr | makecdiscr


Topics “Discriminant Analysis Classification” on page 20-2


makecdiscr Construct discriminant analysis classifier from parameters

Syntax cobj = makecdiscr(Mu,Sigma) cobj = makecdiscr(Mu,Sigma,Name,Value)

Description cobj = makecdiscr(Mu,Sigma) constructs a compact discriminant analysis classifier from the class means Mu and covariance matrix Sigma. cobj = makecdiscr(Mu,Sigma,Name,Value) constructs a compact classifier with additional options specified by one or more name-value pair arguments. For example, you can specify the cost of misclassification or the prior probabilities for each class.

Examples Construct a Compact Linear Discriminant Analysis Classifier Construct a compact linear discriminant analysis classifier from the means and covariances of the Fisher iris data. load fisheriris mu(1,:) = mean(meas(1:50,:)); mu(2,:) = mean(meas(51:100,:)); mu(3,:) = mean(meas(101:150,:)); mm1 = repmat(mu(1,:),50,1); mm2 = repmat(mu(2,:),50,1); mm3 = repmat(mu(3,:),50,1); cc = meas; cc(1:50,:) = cc(1:50,:) - mm1; cc(51:100,:) = cc(51:100,:) - mm2; cc(101:150,:) = cc(101:150,:) - mm3; sigstar = cc' * cc / 147; % unbiased estimator of sigma cpct = makecdiscr(mu,sigstar,... 'ClassNames',{'setosa','versicolor','virginica'}) cpct = classreg.learning.classif.CompactClassificationDiscriminant PredictorNames: {'x1' 'x2' 'x3' 'x4'} ResponseName: 'Y' CategoricalPredictors: [] ClassNames: {'setosa' 'versicolor' 'virginica'} ScoreTransform: 'none' DiscrimType: 'linear' Mu: [3x4 double] Coeffs: [3x3 struct]


Properties, Methods

Input Arguments Mu — Class means matrix of scalar values Class means, specified as a K-by-p matrix of scalar values, where K is the number of classes and p is the number of predictors. Each row of Mu represents the mean of the multivariate normal distribution of the corresponding class. The class indices are in the ClassNames attribute. Data Types: single | double Sigma — Within-class covariance matrix of scalar values Within-class covariance, specified as a matrix of scalar values. • For a linear discriminant, Sigma is a symmetric, positive semidefinite matrix of size p-by-p, where p is the number of predictors. • For a quadratic discriminant, Sigma is an array of size p-by-p-by-K, where K is the number of classes. For each i, Sigma(:,:,i) is a symmetric, positive semidefinite matrix. Data Types: single | double Name-Value Pair Arguments Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN. Example: 'ClassNames',{'setosa' 'versicolor' 'virginica'} specifies a discriminant analysis classifier that uses 'setosa', 'versicolor', and 'virginica' as the grouping variables. ClassNames — Class names numeric vector | categorical vector | logical vector | character array | string array | cell array of character vectors Class names as ordered in Mu, specified as the comma-separated pair consisting of 'ClassNames' and an array containing grouping variables. Use any data type for a grouping variable, including numeric vector, categorical vector, logical vector, character array, string array, or cell array of character vectors. The default is 1:K, where K is the number of classes (the number of rows of Mu).
Example: 'ClassNames',{'setosa' 'versicolor' 'virginica'} Data Types: single | double | logical | char | string | cell Cost — Cost of misclassification square matrix | structure Cost of misclassification, specified as the comma-separated pair consisting of 'Cost' and a square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i. 32-3302


Alternatively, Cost can be a structure S having two fields: S.ClassNames containing the group names as a variable of the same type as y, and S.ClassificationCosts containing the cost matrix. The default is Cost(i,j)=1 if i~=j, and Cost(i,j)=0 if i=j. Data Types: single | double | struct PredictorNames — Predictor variable names {'X1','X2',...} (default) | string array | cell array of character vectors Predictor variable names, specified as the comma-separated pair consisting of 'PredictorNames' and a string array or cell array of character vectors containing the names for the predictor variables, in the order in which they appear in X. Data Types: string | cell Prior — Prior probabilities 'uniform' (default) | vector of scalar values | structure Prior probabilities for each class, specified as the comma-separated pair consisting of 'Prior' and one of the following: • 'uniform', meaning all class prior probabilities are equal • A vector containing one scalar value for each class • A structure S with two fields: • S.ClassNames containing the class names as a variable of the same type as ClassNames • S.ClassProbs containing a vector of corresponding probabilities Data Types: char | string | single | double | struct ResponseName — Response variable name 'Y' (default) | character vector | string scalar Response variable name, specified as the comma-separated pair consisting of 'ResponseName' and a character vector or string scalar containing the name of the response variable y. Example: 'ResponseName','Response' Data Types: char | string

Output Arguments cobj — Discriminant analysis classifier discriminant analysis classifier object Discriminant analysis classifier, returned as a discriminant analysis classifier object of class CompactClassificationDiscriminant. You can use the predict method to predict classification labels for new data.

Tips • You can change the discriminant type using dot notation after constructing cobj:


cobj.DiscrimType = 'discrimType'

where discrimType is one of 'linear', 'quadratic', 'diagLinear', 'diagQuadratic', 'pseudoLinear', or 'pseudoQuadratic'. You can change between linear types or between quadratic types, but cannot change between a linear and a quadratic type. • cobj is a linear classifier when Sigma is a matrix. cobj is a quadratic classifier when Sigma is a three-dimensional array.
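For instance, continuing the compact classifier cpct constructed in the example above (a sketch; the switch stays within the linear family):

```matlab
% Sketch: change the discriminant type, then classify one observation.
cpct.DiscrimType = 'pseudoLinear';        % linear -> pseudoLinear is allowed
label = predict(cpct,[5.1 3.5 1.4 0.2]);  % one iris measurement
```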

See Also CompactClassificationDiscriminant | compact | fitcdiscr | predict Topics “Discriminant Analysis Classification” on page 20-2 Introduced in R2014a


makedist Create probability distribution object

Syntax pd = makedist(distname) pd = makedist(distname,Name,Value) list = makedist makedist -reset

Description pd = makedist(distname) creates a probability distribution object for the distribution distname, using the default parameter values. pd = makedist(distname,Name,Value) creates a probability distribution object with one or more distribution parameter values specified by name-value pair arguments. list = makedist returns a cell array list containing a list of the probability distributions that makedist can create. makedist -reset resets the list of distributions by searching the path for files contained in a package named prob and implementing classes derived from ProbabilityDistribution. Use this syntax after you define a custom distribution function. For details, see “Define Custom Distributions Using the Distribution Fitter App” on page 5-78.

Examples Create a Normal Distribution Object Create a normal distribution object using the default parameter values. pd = makedist('Normal') pd = NormalDistribution Normal distribution mu = 0 sigma = 1

Compute the interquartile range of the distribution. r = iqr(pd) r = 1.3490


Create a Gamma Distribution Object Create a gamma distribution object using the default parameter values. pd = makedist('Gamma') pd = GammaDistribution Gamma distribution a = 1 b = 1

Compute the mean of the gamma distribution. mean = mean(pd) mean = 1

Specify Parameters for a Normal Distribution Object Create a normal distribution object with parameter values mu = 75 and sigma = 10. pd = makedist('Normal','mu',75,'sigma',10) pd = NormalDistribution Normal distribution mu = 75 sigma = 10

Specify Parameters for a Gamma Distribution Object Create a gamma distribution object with the parameter value a = 3 and the default value b = 1. pd = makedist('Gamma','a',3) pd = GammaDistribution Gamma distribution a = 3 b = 1


Input Arguments
distname — Distribution name
character vector | string scalar
Distribution name, specified as one of the following character vectors or string scalars. The distribution specified by distname determines the type of the returned probability distribution object.

Distribution Name            Description                              Distribution Object
'Beta'                       Beta distribution                        BetaDistribution
'Binomial'                   Binomial distribution                    BinomialDistribution
'BirnbaumSaunders'           Birnbaum-Saunders distribution           BirnbaumSaundersDistribution
'Burr'                       Burr distribution                        BurrDistribution
'Exponential'                Exponential distribution                 ExponentialDistribution
'ExtremeValue'               Extreme Value distribution               ExtremeValueDistribution
'Gamma'                      Gamma distribution                       GammaDistribution
'GeneralizedExtremeValue'    Generalized Extreme Value distribution   GeneralizedExtremeValueDistribution
'GeneralizedPareto'          Generalized Pareto distribution          GeneralizedParetoDistribution
'HalfNormal'                 Half-normal distribution                 HalfNormalDistribution
'InverseGaussian'            Inverse Gaussian distribution            InverseGaussianDistribution
'Logistic'                   Logistic distribution                    LogisticDistribution
'Loglogistic'                Loglogistic distribution                 LoglogisticDistribution
'Lognormal'                  Lognormal distribution                   LognormalDistribution
'Multinomial'                Multinomial distribution                 MultinomialDistribution
'Nakagami'                   Nakagami distribution                    NakagamiDistribution
'NegativeBinomial'           Negative Binomial distribution           NegativeBinomialDistribution
'Normal'                     Normal distribution                      NormalDistribution
'PiecewiseLinear'            Piecewise Linear distribution            PiecewiseLinearDistribution
'Poisson'                    Poisson distribution                     PoissonDistribution
'Rayleigh'                   Rayleigh distribution                    RayleighDistribution
'Rician'                     Rician distribution                      RicianDistribution
'Stable'                     Stable distribution                      StableDistribution
'tLocationScale'             t Location-Scale distribution            tLocationScaleDistribution
'Triangular'                 Triangular distribution                  TriangularDistribution
'Uniform'                    Uniform distribution                     UniformDistribution
'Weibull'                    Weibull distribution                     WeibullDistribution

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: makedist('Normal','mu',10) specifies a normal distribution with parameter mu equal to 10, and parameter sigma equal to the default value of 1.

Beta Distribution

a — First shape parameter
1 (default) | positive scalar value
Example: 'a',3
Data Types: single | double

b — Second shape parameter
1 (default) | positive scalar value
Example: 'b',5
Data Types: single | double

Binomial Distribution

N — Number of trials
1 (default) | positive integer value
Example: 'N',25
Data Types: single | double

p — Probability of success
0.5 (default) | scalar value in the range [0,1]
Example: 'p',0.25
Data Types: single | double

Birnbaum-Saunders Distribution

beta — Scale parameter
1 (default) | positive scalar value
Example: 'beta',2
Data Types: single | double

gamma — Shape parameter
1 (default) | nonnegative scalar value
Example: 'gamma',0
Data Types: single | double


Burr Distribution

alpha — Scale parameter
1 (default) | positive scalar value
Example: 'alpha',2
Data Types: single | double

c — First shape parameter
1 (default) | positive scalar value
Example: 'c',2
Data Types: single | double

k — Second shape parameter
1 (default) | positive scalar value
Example: 'k',5
Data Types: single | double

Exponential Distribution

mu — Mean parameter
1 (default) | positive scalar value
Example: 'mu',5
Data Types: single | double

Extreme Value Distribution

mu — Location parameter
0 (default) | scalar value
Example: 'mu',-2
Data Types: single | double

sigma — Scale parameter
1 (default) | nonnegative scalar value
Example: 'sigma',2
Data Types: single | double

Gamma Distribution

a — Shape parameter
1 (default) | positive scalar value
Example: 'a',2
Data Types: single | double

b — Scale parameter
1 (default) | nonnegative scalar value
Example: 'b',0
Data Types: single | double


Generalized Extreme Value Distribution

k — Shape parameter
0 (default) | scalar value
Example: 'k',0
Data Types: single | double

sigma — Scale parameter
1 (default) | nonnegative scalar value
Example: 'sigma',2
Data Types: single | double

mu — Location parameter
0 (default) | scalar value
Example: 'mu',1
Data Types: single | double

Generalized Pareto Distribution

k — Shape parameter
1 (default) | scalar value
Example: 'k',0
Data Types: single | double

sigma — Scale parameter
1 (default) | nonnegative scalar value
Example: 'sigma',2
Data Types: single | double

theta — Location parameter
1 (default) | scalar value
Example: 'theta',2
Data Types: single | double

Half-Normal Distribution

mu — Location parameter
0 (default) | scalar value
Example: 'mu',1
Data Types: single | double

sigma — Shape parameter
1 (default) | nonnegative scalar value
Example: 'sigma',2
Data Types: single | double


Inverse Gaussian Distribution

mu — Scale parameter
1 (default) | positive scalar value
Example: 'mu',2
Data Types: single | double

lambda — Shape parameter
1 (default) | positive scalar value
Example: 'lambda',4
Data Types: single | double

Logistic Distribution

mu — Mean
0 (default) | scalar value
Example: 'mu',2
Data Types: single | double

sigma — Scale parameter
1 (default) | nonnegative scalar value
Example: 'sigma',4
Data Types: single | double

Loglogistic Distribution

mu — Mean of logarithmic values
0 (default) | scalar value
Example: 'mu',2
Data Types: single | double

sigma — Scale parameter of logarithmic values
1 (default) | nonnegative scalar value
Example: 'sigma',4
Data Types: single | double

Lognormal Distribution

mu — Mean of logarithmic values
0 (default) | scalar value
Example: 'mu',2
Data Types: single | double

sigma — Standard deviation of logarithmic values
1 (default) | nonnegative scalar value
Example: 'sigma',2
Data Types: single | double


Multinomial Distribution

probabilities — Outcome probabilities
[0.500 0.500] (default) | vector of scalar values in the range [0,1]
Outcome probabilities, specified as a vector of scalar values in the range [0,1]. The probabilities sum to 1 and correspond to outcomes [1, 2, ..., k], where k is the number of elements in the probabilities vector.
Example: 'probabilities',[0.1 0.2 0.5 0.2] gives the probabilities that the outcome is 1, 2, 3, or 4, respectively.
Data Types: single | double

Nakagami Distribution

mu — Shape parameter
1 (default) | positive scalar value
Example: 'mu',5
Data Types: single | double

omega — Scale parameter
1 (default) | positive scalar value
Example: 'omega',5
Data Types: single | double

Negative Binomial Distribution

R — Number of successes
1 (default) | positive scalar value
Example: 'R',5
Data Types: single | double

p — Probability of success
0.5 (default) | scalar value in the range (0,1]
Example: 'p',0.1
Data Types: single | double

Normal Distribution

mu — Mean
0 (default) | scalar value
Example: 'mu',2
Data Types: single | double

sigma — Standard deviation
1 (default) | nonnegative scalar value
Example: 'sigma',2
Data Types: single | double


Piecewise Linear Distribution

x — Data values
1 (default) | monotonically increasing vector of scalar values
Example: 'x',[1 2 3]
Data Types: single | double

Fx — cdf values
1 (default) | monotonically increasing vector of scalar values that start at 0 and end at 1
Example: 'Fx',[0.2 0.5 1]
Data Types: single | double

Poisson Distribution

lambda — Mean
1 (default) | nonnegative scalar value
Example: 'lambda',5
Data Types: single | double

Rayleigh Distribution

b — Defining parameter
1 (default) | positive scalar value
Example: 'b',3
Data Types: single | double

Rician Distribution

s — Noncentrality parameter
1 (default) | nonnegative scalar value
Example: 's',0
Data Types: single | double

sigma — Scale parameter
1 (default) | positive scalar value
Example: 'sigma',2
Data Types: single | double

Stable Distribution

alpha — First shape parameter
2 (default) | scalar value in the range (0,2]
Example: 'alpha',1
Data Types: single | double

beta — Second shape parameter
0 (default) | scalar value in the range [-1,1]
Example: 'beta',0.5
Data Types: single | double

gam — Scale parameter
1 (default) | scalar value in the range (0,∞)
Example: 'gam',2
Data Types: single | double

delta — Location parameter
0 (default) | scalar value
Example: 'delta',5
Data Types: single | double

t Location-Scale Distribution

mu — Location parameter
0 (default) | scalar value
Example: 'mu',-2
Data Types: single | double

sigma — Scale parameter
1 (default) | positive scalar value
Example: 'sigma',2
Data Types: single | double

nu — Degrees of freedom
5 (default) | positive scalar value
Example: 'nu',20
Data Types: single | double

Triangular Distribution

a — Lower limit
0 (default) | scalar value
Example: 'a',-2
Data Types: single | double

b — Peak location
0.5 (default) | scalar value greater than or equal to a
Example: 'b',1
Data Types: single | double

c — Upper limit
1 (default) | scalar value greater than or equal to b
Example: 'c',5
Data Types: single | double


Uniform Distribution

lower — Lower parameter
0 (default) | scalar value
Example: 'lower',-4
Data Types: single | double

upper — Upper parameter
1 (default) | scalar value greater than lower
Example: 'upper',2
Data Types: single | double

Weibull Distribution

a — Scale parameter
1 (default) | positive scalar value
Example: 'a',2
Data Types: single | double

b — Shape parameter
1 (default) | positive scalar value
Example: 'b',5
Data Types: single | double

Output Arguments

pd — Probability distribution
probability distribution object
Probability distribution, returned as a probability distribution object of the type specified by distname.

list — List of probability distributions
cell array of character vectors
List of probability distributions that makedist can create, returned as a cell array of character vectors.
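The list output is returned when you call makedist with no input arguments. A short sketch (the strcmp check is illustrative):

```matlab
% Sketch: list every distribution name that makedist accepts.
list = makedist;           % cell array of character vectors
numel(list)                % number of supported distribution names
any(strcmp(list,'Gamma'))  % confirm that 'Gamma' is among them
```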

Alternative Functionality

App

The Distribution Fitter app opens a graphical user interface for you to import data from the workspace and interactively fit a probability distribution to that data. You can then save the distribution to the workspace as a probability distribution object. Open the Distribution Fitter app using distributionFitter, or click Distribution Fitter on the Apps tab.

See Also distributionFitter | fitdist


Topics
“Working with Probability Distributions” on page 5-2
“Supported Distributions” on page 5-13
“Define Custom Distributions Using the Distribution Fitter App” on page 5-78

Introduced in R2013a


manova
Class: RepeatedMeasuresModel

Multivariate analysis of variance

Syntax

manovatbl = manova(rm)
manovatbl = manova(rm,Name,Value)
[manovatbl,A,C,D] = manova( ___ )

Description

manovatbl = manova(rm) returns the results of multivariate analysis of variance (manova) for the repeated measures model rm.

manovatbl = manova(rm,Name,Value) also returns manova results with additional options, specified by one or more Name,Value pair arguments.

[manovatbl,A,C,D] = manova( ___ ) also returns arrays A, C, and D for the hypotheses tests of the form A*B*C = D, where D is zero.

Input Arguments

rm — Repeated measures model
RepeatedMeasuresModel object
Repeated measures model, specified as a RepeatedMeasuresModel object. For properties and methods of this object, see RepeatedMeasuresModel.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

WithinModel — Model specifying within-subjects hypothesis test
'separatemeans' (default) | model specification using formula | r-by-nc matrix
Model specifying the within-subjects hypothesis test, specified as one of the following:

• 'separatemeans' — Compute a separate mean for each group, and test for equality among the means.
• Model specification — This is a model specification in the within-subject factors. Test each term in the model. In this case, tbl contains a separate manova for each term in the formula, with the multivariate response equal to the vector of coefficients of that term.
• An r-by-nc matrix, C, specifying nc contrasts among the r repeated measures. If Y represents the matrix of repeated measures you use in the repeated measures model rm, then the output tbl contains a separate manova for each column of Y*C.


Example: 'WithinModel','separatemeans'
Data Types: single | double | char | string

By — Single between-subjects factor
character vector | string scalar
Single between-subjects factor, specified as the comma-separated pair consisting of 'By' and a character vector or string scalar. manova performs a separate test of the within-subjects model for each value of this factor. For example, if you have a between-subjects factor, Drug, then you can specify that factor to perform manova as follows.
Example: 'By','Drug'
Data Types: char | string

Output Arguments

manovatbl — Results of multivariate analysis of variance
table
Results of multivariate analysis of variance for the repeated measures model rm, returned as a table. manova uses these methods to measure the contributions of the model terms to the overall covariance:

• Wilks' Lambda
• Pillai's trace
• Hotelling-Lawley trace
• Roy's maximum root statistic

For details, see “Multivariate Analysis of Variance for Repeated Measures” on page 9-59. manova returns the results for these tests for each group. manovatbl contains the following columns.

Column Name    Definition
Within         Within-subject terms
Between        Between-subject terms
Statistic      Name of the statistic computed
Value          Value of the corresponding statistic
F              F-statistic value
RSquare        Measure of variance explained
df1            Numerator degrees of freedom
df2            Denominator degrees of freedom
pValue         p-value for the corresponding F-statistic value

Data Types: table


A — Specification based on between-subjects model
matrix | cell array
Specification based on the between-subjects model, returned as a matrix or a cell array. It permits the hypothesis on the elements within given columns of B (within-time hypothesis). If manovatbl contains multiple hypothesis tests, A might be a cell array.
Data Types: single | double | cell

C — Specification based on within-subjects model
matrix | cell array
Specification based on the within-subjects model, returned as a matrix or a cell array. It permits the hypotheses on the elements within given rows of B (between-time hypotheses). If manovatbl contains multiple hypothesis tests, C might be a cell array.
Data Types: single | double | cell

D — Hypothesis value
0
Hypothesis value, returned as 0.

Examples

Perform Multivariate Analysis of Variance

Load the sample data.

load fisheriris

The column vector species consists of iris flowers of three different species: setosa, versicolor, virginica. The double matrix meas consists of four types of measurements on the flowers: the length and width of sepals and petals in centimeters, respectively.

Store the data in a table array.

t = table(species,meas(:,1),meas(:,2),meas(:,3),meas(:,4),...
    'VariableNames',{'species','meas1','meas2','meas3','meas4'});
Meas = table([1 2 3 4]','VariableNames',{'Measurements'});

Fit a repeated measures model where the measurements are the responses and the species is the predictor variable.

rm = fitrm(t,'meas1-meas4~species','WithinDesign',Meas);

Perform multivariate analysis of variance.

manova(rm)

ans=8×9 table
    Within      Between        Statistic      Value         F       RSquare    df1    df2    pVa
    ________    ___________    _________    _________    ______    _______    ___    ___    _____
    Constant    (Intercept)    Pillai         0.99013    4847.5    0.99013     3     145    3.788
    Constant    (Intercept)    Wilks        0.0098724    4847.5    0.99013     3     145    3.788
    Constant    (Intercept)    Hotelling       100.29    4847.5    0.99013     3     145    3.788
    Constant    (Intercept)    Roy             100.29    4847.5    0.99013     3     145    3.788
    Constant    species        Pillai         0.96909    45.749    0.48455     6     292    2.47
    Constant    species        Wilks         0.041153    189.92    0.79714     6     290    2.39
    Constant    species        Hotelling       23.051    555.17    0.92016     6     288    4.666
    Constant    species        Roy              23.04    1121.3    0.9584      3     146    1.477

Perform multivariate anova separately for each species.

manova(rm,'By','species')

ans=12×9 table
    Within      Between               Statistic      Value        F       RSquare    df1    df2
    ________    __________________    _________    ________    ______    _______    ___    ___
    Constant    species=setosa        Pillai         0.9823    2682.7    0.9823      3     145
    Constant    species=setosa        Wilks        0.017698    2682.7    0.9823      3     145
    Constant    species=setosa        Hotelling      55.504    2682.7    0.9823      3     145
    Constant    species=setosa        Roy            55.504    2682.7    0.9823      3     145
    Constant    species=versicolor    Pillai           0.97    1562.8    0.97        3     145
    Constant    species=versicolor    Wilks        0.029999    1562.8    0.97        3     145
    Constant    species=versicolor    Hotelling      32.334    1562.8    0.97        3     145
    Constant    species=versicolor    Roy            32.334    1562.8    0.97        3     145
    Constant    species=virginica     Pillai        0.97261    1716.1    0.97261     3     145
    Constant    species=virginica     Wilks        0.027394    1716.1    0.97261     3     145
    Constant    species=virginica     Hotelling      35.505    1716.1    0.97261     3     145
    Constant    species=virginica     Roy            35.505    1716.1    0.97261     3     145

Return Arrays of the Hypothesis Test

Load the sample data.

load fisheriris

The column vector species consists of iris flowers of three different species: setosa, versicolor, virginica. The double matrix meas consists of four types of measurements on the flowers: the length and width of sepals and petals in centimeters, respectively.

Store the data in a table array.

t = table(species,meas(:,1),meas(:,2),meas(:,3),meas(:,4),...
    'VariableNames',{'species','meas1','meas2','meas3','meas4'});
Meas = table([1 2 3 4]','VariableNames',{'Measurements'});

Fit a repeated measures model where the measurements are the responses and the species is the predictor variable.

rm = fitrm(t,'meas1-meas4~species','WithinDesign',Meas);

Perform multivariate analysis of variance. Also return the arrays for constructing the hypothesis test.

[manovatbl,A,C,D] = manova(rm)

manovatbl=8×9 table
    Within      Between        Statistic      Value         F       RSquare    df1    df2    pVa
    ________    ___________    _________    _________    ______    _______    ___    ___    _____
    Constant    (Intercept)    Pillai         0.99013    4847.5    0.99013     3     145    3.788
    Constant    (Intercept)    Wilks        0.0098724    4847.5    0.99013     3     145    3.788
    Constant    (Intercept)    Hotelling       100.29    4847.5    0.99013     3     145    3.788
    Constant    (Intercept)    Roy             100.29    4847.5    0.99013     3     145    3.788
    Constant    species        Pillai         0.96909    45.749    0.48455     6     292    2.47
    Constant    species        Wilks         0.041153    189.92    0.79714     6     290    2.39
    Constant    species        Hotelling       23.051    555.17    0.92016     6     288    4.666
    Constant    species        Roy              23.04    1121.3    0.9584      3     146    1.477

A=2×1 cell array
    {1x3 double}
    {2x3 double}

C = 4×3
     1     0     0
    -1     1     0
     0    -1     1
     0     0    -1

D = 0

Index into matrix A.

A{1}

ans = 1×3

     1     0     0

A{2}

ans = 2×3

     0     1     0
     0     0     1

Tips

• The multivariate response for each observation (subject) is the vector of repeated measures.
• To test a more general hypothesis A*B*C = D, use coeftest.
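As a sketch of the second tip, coeftest accepts the A, C, and D matrices directly. The contrast matrices below are hypothetical choices, assuming the repeated measures model rm fit in the examples above (a three-column between-subjects design and four repeated measures):

```matlab
% Sketch (hypothetical contrasts): test A*B*C = D with coeftest.
% Assumes rm from the fisheriris example above.
A = [0 1 0];            % hypothetical contrast on the between-subjects coefficients
C = [1 -1 0 0]';        % hypothetical contrast across the 4 repeated measures
tbl = coeftest(rm,A,C)  % D defaults to zero
```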

See Also anova | coeftest | fitrm | ranova

Topics
“Model Specification for Repeated Measures Models” on page 9-54
“Multivariate Analysis of Variance for Repeated Measures” on page 9-59


manova1
One-way multivariate analysis of variance

Syntax

d = manova1(X,group)
d = manova1(X,group,alpha)
[d,p] = manova1(...)
[d,p,stats] = manova1(...)

Description

d = manova1(X,group) performs a one-way Multivariate Analysis of Variance (MANOVA) for comparing the multivariate means of the columns of X, grouped by group. X is an m-by-n matrix of data values, and each row is a vector of measurements on n variables for a single observation. group is a grouping variable defined as a categorical variable, vector, character array, string array, or cell array of character vectors. Two observations are in the same group if they have the same value in the group array. The observations in each group represent a sample from a population.

The function returns d, an estimate of the dimension of the space containing the group means. manova1 tests the null hypothesis that the means of each group are the same n-dimensional multivariate vector, and that any difference observed in the sample X is due to random chance. If d = 0, there is no evidence to reject that hypothesis. If d = 1, then you can reject the null hypothesis at the 5% level, but you cannot reject the hypothesis that the multivariate means lie on the same line. Similarly, if d = 2 the multivariate means may lie on the same plane in n-dimensional space, but not on the same line.

d = manova1(X,group,alpha) gives control of the significance level, alpha. The return value d will be the smallest dimension having p > alpha, where p is a p-value for testing whether the means lie in a space of that dimension.

[d,p] = manova1(...) also returns p, a vector of p-values for testing whether the means lie in a space of dimension 0, 1, and so on. The largest possible dimension is either the dimension of the space, or one less than the number of groups. There is one element of p for each dimension up to, but not including, the largest. If the ith p-value is near zero, this casts doubt on the hypothesis that the group means lie on a space of i-1 dimensions.
The choice of a critical p-value to determine whether the result is judged statistically significant is left to the researcher and is specified by the value of the input argument alpha. It is common to declare a result significant if the p-value is less than 0.05 or 0.01. [d,p,stats] = manova1(...) also returns stats, a structure containing additional MANOVA results. The structure contains the following fields.

Field       Contents
W           Within-groups sum of squares and cross-products matrix
B           Between-groups sum of squares and cross-products matrix
T           Total sum of squares and cross-products matrix
dfW         Degrees of freedom for W
dfB         Degrees of freedom for B
dfT         Degrees of freedom for T
lambda      Vector of values of Wilks' lambda test statistic for testing whether the means have dimension 0, 1, etc.
chisq       Transformation of lambda to an approximate chi-square distribution
chisqdf     Degrees of freedom for chisq
eigenval    Eigenvalues of W^-1*B
eigenvec    Eigenvectors of W^-1*B; these are the coefficients for the canonical variables C, and they are scaled so the within-group variance of the canonical variables is 1
canon       Canonical variables C, equal to XC*eigenvec, where XC is X with columns centered by subtracting their means
mdist       A vector of Mahalanobis distances from each point to the mean of its group
gmdist      A matrix of Mahalanobis distances between each pair of group means

The canonical variables C are linear combinations of the original variables, chosen to maximize the separation between groups. Specifically, C(:,1) is the linear combination of the X columns that has the maximum separation between groups. This means that among all possible linear combinations, it is the one with the most significant F statistic in a one-way analysis of variance. C(:,2) has the maximum separation subject to it being orthogonal to C(:,1), and so on. You may find it useful to use the outputs from manova1 along with other functions to supplement your analysis. For example, you may want to start with a grouped scatter plot matrix of the original variables using gplotmatrix. You can use gscatter to visualize the group separation using the first two canonical variables. You can use manovacluster to graph a dendrogram showing the clusters among the group means. Assumptions The MANOVA test makes the following assumptions about the data in X: • The populations for each group are normally distributed. • The variance-covariance matrix is the same for each population. • All observations are mutually independent.

Examples

You can use manova1 to determine whether there are differences in the averages of four car characteristics among groups defined by the country where the cars were made.

load carbig
[d,p] = manova1([MPG Acceleration Weight Displacement],...
    Origin)

d = 3

p =
         0
    0.0000
    0.0075
    0.1934

There are four dimensions in the input matrix, so the group means must lie in a four-dimensional space. manova1 shows that you cannot reject the hypothesis that the means lie in a 3-D subspace.
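The canonical variables in stats can supplement this analysis, as described above. This sketch, assuming the same carbig data, plots the first two canonical variables with gscatter to visualize the group separation:

```matlab
% Sketch: visualize group separation in the first two canonical variables.
load carbig
[d,p,stats] = manova1([MPG Acceleration Weight Displacement],Origin);
gscatter(stats.canon(:,1),stats.canon(:,2),Origin)
xlabel('Canonical variable 1')
ylabel('Canonical variable 2')
```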

References [1] Krzanowski, W. J. Principles of Multivariate Analysis: A User's Perspective. New York: Oxford University Press, 1988.

See Also anova1 | canoncorr | gplotmatrix | gscatter | manovacluster

Topics
“Grouping Variables” on page 2-45

Introduced before R2006a


manovacluster
Dendrogram of group mean clusters following MANOVA

Syntax

manovacluster(stats)
manovacluster(stats,method)
H = manovacluster(stats,method)

Description

manovacluster(stats) generates a dendrogram plot of the group means after a multivariate analysis of variance (MANOVA). stats is the output stats structure from manova1. The clusters are computed by applying the single linkage method to the matrix of Mahalanobis distances between group means. See dendrogram for more information on the graphical output from this function. The dendrogram is most useful when the number of groups is large.

manovacluster(stats,method) uses the specified method in place of single linkage. method can be one of these values, which identify the methods used to create the cluster hierarchy. (See linkage for additional information.)

Method        Description
'single'      Shortest distance (default)
'complete'    Largest distance
'average'     Average distance
'centroid'    Centroid distance
'ward'        Incremental sum of squares

H = manovacluster(stats,method) returns a vector of handles to the lines in the figure.

Examples

Dendrogram of Group Means After MANOVA

Load the sample data.

load carbig

Define the variable matrix.

X = [MPG Acceleration Weight Displacement];

Perform one-way MANOVA to compare the means of MPG, Acceleration, Weight, and Displacement grouped by Origin.

[d,p,stats] = manova1(X,Origin);


Create a dendrogram plot of the group means.

manovacluster(stats)

See Also cluster | dendrogram | linkage | manova1

Introduced before R2006a


margin
Margin of k-nearest neighbor classifier

Syntax

m = margin(mdl,tbl,ResponseVarName)
m = margin(mdl,tbl,Y)
m = margin(mdl,X,Y)

Description

m = margin(mdl,tbl,ResponseVarName) returns the classification margins on page 32-3329 for mdl with data tbl and classification tbl.ResponseVarName. If tbl contains the response variable used to train mdl, then you do not need to specify ResponseVarName.

m is returned as a numeric vector of length size(tbl,1). Each entry in m represents the margin for the corresponding row of tbl and the corresponding true class label in tbl.ResponseVarName, computed using mdl.

m = margin(mdl,tbl,Y) returns the classification margins for mdl with data tbl and classification Y.

m = margin(mdl,X,Y) returns the classification margins for mdl with data X and classification Y. m is returned as a numeric vector of length size(X,1).

Examples

Margin Calculation

Create a k-nearest neighbor classifier for the Fisher iris data, where k = 5.

Load the Fisher iris data set.

load fisheriris

Create a classifier for five nearest neighbors.

mdl = fitcknn(meas,species,'NumNeighbors',5);

Examine the margin of the classifier for a mean observation classified as 'versicolor'.

X = mean(meas);
Y = {'versicolor'};
m = margin(mdl,X,Y)

m = 1

All five nearest neighbors classify as 'versicolor'.

32-3327

32

Functions

Input Arguments

mdl — k-nearest neighbor classifier model
ClassificationKNN object
k-nearest neighbor classifier model, specified as a ClassificationKNN object.

tbl — Sample data
table
Sample data used to train the model, specified as a table. Each row of tbl corresponds to one observation, and each column corresponds to one predictor variable. Optionally, tbl can contain one additional column for the response variable. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.

If tbl contains the response variable used to train mdl, then you do not need to specify ResponseVarName or Y.

If you train mdl using sample data contained in a table, then the input data for margin must also be in a table.

Data Types: table

ResponseVarName — Response variable name
name of a variable in tbl
Response variable name, specified as the name of a variable in tbl. If tbl contains the response variable used to train mdl, then you do not need to specify ResponseVarName.

You must specify ResponseVarName as a character vector or string scalar. For example, if the response variable is stored as tbl.response, then specify it as 'response'. Otherwise, the software treats all columns of tbl, including tbl.response, as predictors.

The response variable must be a categorical, character, or string array, logical or numeric vector, or cell array of character vectors. If the response variable is a character array, then each element must correspond to one row of the array.

Data Types: char | string

X — Predictor data
numeric matrix
Predictor data, specified as a numeric matrix. Each row of X represents one observation, and each column represents one variable.

Data Types: single | double

Y — Class labels
categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors
Class labels, specified as a categorical, character, or string array, logical or numeric vector, or cell array of character vectors. Each row of Y represents the classification of the corresponding row of X.

Data Types: categorical | char | string | logical | single | double | cell


More About

Margin

The classification margin for each observation is the difference between the classification score for the true class and the maximal classification score for the false classes.

Score

The score of a classification is the posterior probability of the classification. The posterior probability is the number of neighbors with that classification divided by the number of neighbors. For a more detailed definition that includes weights and prior probabilities, see “Posterior Probability” on page 32-4144.
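The definition can be checked by hand: the margin is the true-class score minus the largest score among the other classes. This sketch assumes the Fisher iris example from earlier on this page:

```matlab
% Sketch: recompute the margin from the classification scores.
load fisheriris
mdl = fitcknn(meas,species,'NumNeighbors',5);
x = mean(meas);
[~,score] = predict(mdl,x);              % posterior scores, ordered as mdl.ClassNames
isTrue = strcmp(mdl.ClassNames,'versicolor');
m = score(isTrue) - max(score(~isTrue))  % same value that margin(mdl,x,{'versicolor'}) returns
```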

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory. This function fully supports tall arrays. For more information, see “Tall Arrays” (MATLAB).

See Also ClassificationKNN | edge | fitcknn | loss

Topics
“Classification Using Nearest Neighbors” on page 18-11

Introduced in R2012a


margin
Class: ClassificationLinear

Classification margins for linear classification models

Syntax

m = margin(Mdl,X,Y)
m = margin( ___ ,Name,Value)

Description

m = margin(Mdl,X,Y) returns the classification margins on page 32-3337 for the binary, linear classification model Mdl using predictor data in X and corresponding class labels in Y. m contains classification margins for each regularization strength in Mdl.

m = margin( ___ ,Name,Value) uses any of the previous syntaxes and additional options specified by one or more Name,Value pair arguments. For example, you can specify that columns in the predictor data correspond to observations.

Input Arguments

Mdl — Binary, linear classification model
ClassificationLinear model object
Binary, linear classification model, specified as a ClassificationLinear model object. You can create a ClassificationLinear model object using fitclinear.

X — Predictor data
full matrix | sparse matrix
Predictor data, specified as an n-by-p full or sparse matrix. This orientation of X indicates that rows correspond to individual observations, and columns correspond to individual predictor variables.

Note If you orient your predictor matrix so that observations correspond to columns and specify 'ObservationsIn','columns', then you might experience a significant reduction in computation time.

The length of Y and the number of observations in X must be equal.

Data Types: single | double

Y — Class labels
categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors
Class labels, specified as a categorical, character, or string array, logical or numeric vector, or cell array of character vectors.


• The data type of Y must be the same as the data type of Mdl.ClassNames. (The software treats string arrays as cell arrays of character vectors.)
• The distinct classes in Y must be a subset of Mdl.ClassNames.
• If Y is a character array, then each element must correspond to one row of the array.
• The length of Y and the number of observations in X must be equal.

Data Types: categorical | char | string | logical | single | double | cell

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

ObservationsIn — Predictor data observation dimension
'rows' (default) | 'columns'
Predictor data observation dimension, specified as the comma-separated pair consisting of 'ObservationsIn' and 'columns' or 'rows'.

Note If you orient your predictor matrix so that observations correspond to columns and specify 'ObservationsIn','columns', then you might experience a significant reduction in optimization execution time.

Output Arguments m — Classification margins numeric column vector | numeric matrix Classification margins on page 32-3337, returned as a numeric column vector or matrix. m is n-by-L, where n is the number of observations in X and L is the number of regularization strengths in Mdl (that is, numel(Mdl.Lambda)). m(i,j) is the classification margin of observation i using the trained linear classification model that has regularization strength Mdl.Lambda(j).

Examples Estimate Test-Sample Margins Load the NLP data set. load nlpdata

X is a sparse matrix of predictor data, and Y is a categorical vector of class labels. There are more than two classes in the data. The models should identify whether the word counts in a web page are from the Statistics and Machine Learning Toolbox™ documentation. So, identify the labels that correspond to the Statistics and Machine Learning Toolbox™ documentation web pages.

32

Functions

Ystats = Y == 'stats';

Train a binary, linear classification model that can identify whether the word counts in a documentation web page are from the Statistics and Machine Learning Toolbox™ documentation. Specify to hold out 30% of the observations. Optimize the objective function using SpaRSA. rng(1); % For reproducibility CVMdl = fitclinear(X,Ystats,'Solver','sparsa','Holdout',0.30); CMdl = CVMdl.Trained{1};

CVMdl is a ClassificationPartitionedLinear model. It contains the property Trained, which is a 1-by-1 cell array holding a ClassificationLinear model that the software trained using the training set. Extract the training and test data from the partition definition. trainIdx = training(CVMdl.Partition); testIdx = test(CVMdl.Partition);

Estimate the training- and test-sample margins. mTrain = margin(CMdl,X(trainIdx,:),Ystats(trainIdx)); mTest = margin(CMdl,X(testIdx,:),Ystats(testIdx));

Because there is one regularization strength in CMdl, mTrain and mTest are column vectors with lengths equal to the number of training and test observations, respectively. Plot both sets of margins using box plots. figure; boxplot([mTrain; mTest],[zeros(size(mTrain,1),1); ones(size(mTest,1),1)], ... 'Labels',{'Training set','Test set'}); h = gca; h.YLim = [-5 60]; title 'Training- and Test-Set Margins'


The distributions of the margins between the training and test sets appear similar.

Feature Selection Using Test-Sample Margins One way to perform feature selection is to compare test-sample margins from multiple models. Based solely on this criterion, the classifier with the larger margins is the better classifier. Load the NLP data set. Preprocess the data as in “Estimate Test-Sample Margins” on page 32-3331. load nlpdata Ystats = Y == 'stats'; X = X'; rng(1); % For reproducibility

Create a data partition which holds out 30% of the observations for testing. Partition = cvpartition(Ystats,'Holdout',0.30); testIdx = test(Partition); % Test-set indices XTest = X(:,testIdx); YTest = Ystats(testIdx);

Partition is a cvpartition object that defines the data set partition. Randomly choose 10% of the predictor variables.


p = size(X,1); % Number of predictors idxPart = randsample(p,ceil(0.1*p));

Train two binary, linear classification models: one that uses all of the predictors and one that uses the random 10%. Optimize the objective function using SpaRSA, and indicate that observations correspond to columns. CVMdl = fitclinear(X,Ystats,'CVPartition',Partition,'Solver','sparsa',... 'ObservationsIn','columns'); PCVMdl = fitclinear(X(idxPart,:),Ystats,'CVPartition',Partition,'Solver','sparsa',... 'ObservationsIn','columns');

CVMdl and PCVMdl are ClassificationPartitionedLinear models. Extract the trained ClassificationLinear models from the cross-validated models. CMdl = CVMdl.Trained{1}; PCMdl = PCVMdl.Trained{1};

Estimate the test-sample margins for each classifier. Plot the distributions of the margin sets using box plots. fullMargins = margin(CMdl,XTest,YTest,'ObservationsIn','columns'); partMargins = margin(PCMdl,XTest(idxPart,:),YTest,... 'ObservationsIn','columns'); figure; boxplot([fullMargins partMargins],'Labels',... {'All Predictors','10% of the Predictors'}); h = gca; h.YLim = [-20 60]; title('Test-Sample Margins')


The margin distribution of CMdl is situated higher than the margin distribution of PCMdl.

Find Good Lasso Penalty Using Margins
To determine a good lasso-penalty strength for a linear classification model that uses a logistic regression learner, compare distributions of test-sample margins. Load the NLP data set. Preprocess the data as in “Estimate Test-Sample Margins” on page 32-3331.

load nlpdata
Ystats = Y == 'stats';
X = X';
Partition = cvpartition(Ystats,'Holdout',0.30);
testIdx = test(Partition);
XTest = X(:,testIdx);
YTest = Ystats(testIdx);

Create a set of 11 logarithmically-spaced regularization strengths from 10^−8 through 10^1.

Lambda = logspace(-8,1,11);

Train binary, linear classification models that use each of the regularization strengths. Optimize the objective function using SpaRSA. Lower the tolerance on the gradient of the objective function to 1e-8.


rng(10); % For reproducibility CVMdl = fitclinear(X,Ystats,'ObservationsIn','columns',... 'CVPartition',Partition,'Learner','logistic','Solver','sparsa',... 'Regularization','lasso','Lambda',Lambda,'GradientTolerance',1e-8) CVMdl = classreg.learning.partition.ClassificationPartitionedLinear CrossValidatedModel: 'Linear' ResponseName: 'Y' NumObservations: 31572 KFold: 1 Partition: [1x1 cvpartition] ClassNames: [0 1] ScoreTransform: 'none' Properties, Methods

Extract the trained linear classification model. Mdl = CVMdl.Trained{1} Mdl = ClassificationLinear ResponseName: 'Y' ClassNames: [0 1] ScoreTransform: 'logit' Beta: [34023x11 double] Bias: [1x11 double] Lambda: [1x11 double] Learner: 'logistic' Properties, Methods

Mdl is a ClassificationLinear model object. Because Lambda is a sequence of regularization strengths, you can think of Mdl as 11 models, one for each regularization strength in Lambda. Estimate the test-sample margins.

m = margin(Mdl,X(:,testIdx),Ystats(testIdx),'ObservationsIn','columns');
size(m)

ans = 1×2

        9471          11

Because there are 11 regularization strengths, m has 11 columns. Plot the test-sample margins for each regularization strength. Because logistic regression scores are in [0,1], margins are in [-1,1]. Rescale the margins to help identify the regularization strength that maximizes the margins over the grid.

figure;
boxplot(10000.^m)


ylabel('Exponentiated test-sample margins') xlabel('Lambda indices')

Several values of Lambda yield margin distributions that are compacted near 10000^1. Higher values of Lambda lead to predictor variable sparsity, which is a good quality of a classifier. Choose the regularization strength that occurs just before the centers of the margin distributions start decreasing.

LambdaFinal = Lambda(5);

Train a linear classification model using the entire data set and specify the desired regularization strength. MdlFinal = fitclinear(X,Ystats,'ObservationsIn','columns',... 'Learner','logistic','Solver','sparsa','Regularization','lasso',... 'Lambda',LambdaFinal);

To estimate labels for new observations, pass MdlFinal and the new data to predict.

More About Classification Margin The classification margin for binary classification is, for each observation, the difference between the classification score for the true class and the classification score for the false class.


The software defines the classification margin for binary classification as

m = 2yf(x).

x is an observation. If the true label of x is the positive class, then y is 1, and –1 otherwise. f(x) is the positive-class classification score for the observation x. The classification margin is commonly defined as m = yf(x). If the margins are on the same scale, then they serve as a classification confidence measure. Among multiple classifiers, those that yield greater margins are better.
Classification Score
For linear classification models, the raw classification score for classifying the observation x, a row vector, into the positive class is defined by

fj(x) = xβj + bj.

For the model with regularization strength j, βj is the estimated column vector of coefficients (the model property Beta(:,j)) and bj is the estimated, scalar bias (the model property Bias(j)). The raw classification score for classifying x into the negative class is –fj(x). The software classifies observations into the class that yields the positive score. If the linear classification model consists of logistic regression learners, then the software applies the 'logit' score transformation to the raw classification scores (see ScoreTransform).
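The score and margin formulas above can be checked numerically. The following is a minimal sketch (in Python with NumPy for illustration; the coefficient values are made up, and Beta and Bias simply mirror the model properties named above):

```python
import numpy as np

# Hypothetical fitted linear model with L = 2 regularization strengths:
# Beta is p-by-L (one coefficient column per strength), Bias is 1-by-L.
Beta = np.array([[0.5, 0.1],
                 [-0.25, 0.0]])      # p = 2 predictors
Bias = np.array([0.75, -0.2])

def scores(X):
    """Raw positive-class scores fj(x) = x*Beta(:,j) + Bias(j), one column per strength."""
    return X @ Beta + Bias

def margins(X, y):
    """Binary margins m = 2*y*f(x); y holds +1 for the positive class, -1 otherwise."""
    return 2.0 * y[:, None] * scores(X)

X = np.array([[1.0, 2.0],
              [0.0, -1.0]])
y = np.array([1.0, -1.0])
m = margins(X, y)   # n-by-L: one row per observation, one column per regularization strength
```

The n-by-L shape of m matches the description of the output argument above: one margin per observation and per regularization strength.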

Extended Capabilities Tall Arrays Calculate with arrays that have more rows than fit in memory. This function fully supports tall arrays. For more information, see “Tall Arrays” (MATLAB).

See Also ClassificationLinear | edge | fitclinear | predict Introduced in R2016a


margin Classification margins

Syntax m = margin(obj,X,Y)

Description m = margin(obj,X,Y) returns the classification margins for the matrix of predictors X and class labels Y. For the definition, see “More About” on page 32-3340.

Input Arguments obj Discriminant analysis classifier of class ClassificationDiscriminant or CompactClassificationDiscriminant, typically constructed with fitcdiscr. X Matrix where each row represents an observation, and each column represents a predictor. The number of columns in X must equal the number of predictors in obj. Y Class labels, with the same data type as exists in obj. The number of elements of Y must equal the number of rows of X.

Output Arguments m Numeric column vector of length size(X,1). Each entry in m represents the margin for the corresponding rows of X and (true class) Y, computed using obj.

Examples Compute the classification margin for the Fisher iris data, trained on its first two columns of data, and view the last 10 entries: load fisheriris X = meas(:,1:2); obj = fitcdiscr(X,species); M = margin(obj,X,species); M(end-10:end) ans = 0.6551 0.4838


0.6551 -0.5127 0.5659 0.4611 0.4949 0.1024 0.2787 -0.1439 -0.4444

The classifier trained on all the data is better: obj = fitcdiscr(meas,species); M = margin(obj,meas,species); M(end-10:end) ans = 0.9983 1.0000 0.9991 0.9978 1.0000 1.0000 0.9999 0.9882 0.9937 1.0000 0.9649

More About Margin The classification margin is the difference between the classification score for the true class and maximal classification score for the false classes. The classification margin is a column vector with the same number of rows as in the matrix X. A high value of margin indicates a more reliable prediction than a low value. Score (discriminant analysis) For discriminant analysis, the score of a classification is the posterior probability of the classification. For the definition of posterior probability in discriminant analysis, see “Posterior Probability” on page 20-6.
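As a numeric illustration of this definition (a hedged sketch in Python; the posterior probabilities below are invented for the example, not produced by fitcdiscr):

```python
import numpy as np

# Hypothetical posterior probabilities for 3 observations over 3 classes;
# each row sums to 1, as discriminant-analysis scores do.
posterior = np.array([[0.80, 0.15, 0.05],
                      [0.40, 0.50, 0.10],
                      [0.30, 0.30, 0.40]])
true_class = np.array([0, 0, 2])   # column index of the true class per observation

def margins(P, y):
    """Margin = score of the true class minus the max score among the false classes."""
    n = P.shape[0]
    true_score = P[np.arange(n), y]
    P_false = P.copy()
    P_false[np.arange(n), y] = -np.inf   # mask out the true class
    return true_score - P_false.max(axis=1)

m = margins(posterior, true_class)
```

The second observation gets a negative margin because a false class has the highest posterior, matching the statement that a high margin indicates a more reliable prediction than a low one.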

Extended Capabilities Tall Arrays Calculate with arrays that have more rows than fit in memory. This function fully supports tall arrays. For more information, see “Tall Arrays” (MATLAB).

See Also ClassificationDiscriminant | edge | fitcdiscr | loss | predict


Topics “Discriminant Analysis Classification” on page 20-2


margin Classification margins for multiclass error-correcting output codes (ECOC) model

Syntax m = margin(Mdl,tbl,ResponseVarName) m = margin(Mdl,tbl,Y) m = margin(Mdl,X,Y) m = margin( ___ ,Name,Value)

Description m = margin(Mdl,tbl,ResponseVarName) returns the classification margins on page 32-3350 (m) for the trained multiclass error-correcting output codes (ECOC) model Mdl using the predictor data in table tbl and the class labels in tbl.ResponseVarName. m = margin(Mdl,tbl,Y) returns the classification margins for the classifier Mdl using the predictor data in table tbl and the class labels in vector Y. m = margin(Mdl,X,Y) returns the classification margins for the classifier Mdl using the predictor data in matrix X and the class labels Y. m = margin( ___ ,Name,Value) specifies options using one or more name-value pair arguments in addition to any of the input argument combinations in previous syntaxes. For example, you can specify a decoding scheme, binary learner loss function, and verbosity level.

Examples Test-Sample Classification Margins of ECOC Model Calculate the test-sample classification margins of an ECOC model with SVM binary learners. Load Fisher's iris data set. Specify the predictor data X, the response data Y, and the order of the classes in Y. load fisheriris X = meas; Y = categorical(species); classOrder = unique(Y); % Class order rng(1) % For reproducibility

Train an ECOC model using SVM binary classifiers. Specify a 30% holdout sample, standardize the predictors using an SVM template, and specify the class order. 32-3342

margin

t = templateSVM('Standardize',true); PMdl = fitcecoc(X,Y,'Holdout',0.30,'Learners',t,'ClassNames',classOrder); Mdl = PMdl.Trained{1}; % Extract trained, compact classifier

PMdl is a ClassificationPartitionedECOC model. It has the property Trained, a 1-by-1 cell array containing the CompactClassificationECOC model that the software trained using the training set. Calculate the test-sample classification margins. Display the distribution of the margins using a boxplot. testInds = test(PMdl.Partition); XTest = X(testInds,:); YTest = Y(testInds,:); m = margin(Mdl,XTest,YTest);

% Extract the test indices

boxplot(m) title('Test-Sample Margins')

The classification margin of an observation is the positive-class negated loss minus the maximum negative-class negated loss. Choose classifiers that yield relatively large margins.


Select ECOC Model Features by Examining Test-Sample Margins Perform feature selection by comparing test-sample margins from multiple models. Based solely on this comparison, the model with the greatest margins is the best model. Load Fisher's iris data set. Specify the predictor data X, the response data Y, and the order of the classes in Y. load fisheriris X = meas; Y = categorical(species); classOrder = unique(Y); % Class order rng(1); % For reproducibility

Partition the data set into training and test sets. Specify a 30% holdout sample for testing. Partition = cvpartition(Y,'Holdout',0.30); testInds = test(Partition); % Indices for the test set XTest = X(testInds,:); YTest = Y(testInds,:);

Partition defines the data set partition. Define these two data sets: • fullX contains all four predictors. • partX contains the sepal measurements only. fullX = X; partX = X(:,1:2);

Train an ECOC model using SVM binary classifiers for each predictor set. Specify the partition definition, standardize the predictors using an SVM template, and define the class order. t = templateSVM('Standardize',true); fullPMdl = fitcecoc(fullX,Y,'CVPartition',Partition,'Learners',t,... 'ClassNames',classOrder); partPMdl = fitcecoc(partX,Y,'CVPartition',Partition,'Learners',t,... 'ClassNames',classOrder); fullMdl = fullPMdl.Trained{1}; partMdl = partPMdl.Trained{1};

fullPMdl and partPMdl are ClassificationPartitionedECOC models. Each model has the property Trained, a 1-by-1 cell array containing the CompactClassificationECOC model that the software trained using the corresponding training set. Calculate the test-sample margins for each classifier. For each model, display the distribution of the margins using a boxplot. fullMargins = margin(fullMdl,XTest,YTest); partMargins = margin(partMdl,XTest(:,1:2),YTest); boxplot([fullMargins partMargins],'Labels',{'All Predictors','Two Predictors'}) title('Boxplots of Test-Sample Margins')


The margin distribution of fullMdl is situated higher and has less variability than the margin distribution of partMdl.

Input Arguments Mdl — Full or compact multiclass ECOC model ClassificationECOC model object | CompactClassificationECOC model object Full or compact multiclass ECOC model, specified as a ClassificationECOC or CompactClassificationECOC model object. To create a full or compact ECOC model, see ClassificationECOC or CompactClassificationECOC. tbl — Sample data table Sample data, specified as a table. Each row of tbl corresponds to one observation, and each column corresponds to one predictor variable. Optionally, tbl can contain additional columns for the response variable and observation weights. tbl must contain all the predictors used to train Mdl. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed. If you train Mdl using sample data contained in a table, then the input data for margin must also be in a table.


Note If Mdl.BinaryLearners contains linear or kernel classification models (ClassificationLinear or ClassificationKernel model objects), then you cannot specify sample data in a table. Instead, pass a matrix (X) and class labels (Y). When training Mdl, assume that you set 'Standardize',true for a template object specified in the 'Learners' name-value pair argument of fitcecoc. In this case, for the corresponding binary learner j, the software standardizes the columns of the new predictor data using the corresponding means in Mdl.BinaryLearner{j}.Mu and standard deviations in Mdl.BinaryLearner{j}.Sigma. Data Types: table ResponseVarName — Response variable name name of variable in tbl Response variable name, specified as the name of a variable in tbl. If tbl contains the response variable used to train Mdl, then you do not need to specify ResponseVarName. If you specify ResponseVarName, then you must do so as a character vector or string scalar. For example, if the response variable is stored as tbl.y, then specify ResponseVarName as 'y'. Otherwise, the software treats all columns of tbl, including tbl.y, as predictors. The response variable must be a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors. If the response variable is a character array, then each element must correspond to one row of the array. Data Types: char | string X — Predictor data numeric matrix Predictor data, specified as a numeric matrix. Each row of X corresponds to one observation, and each column corresponds to one variable. The variables in the columns of X must be the same as the variables that trained the classifier Mdl. The number of rows in X must equal the number of rows in Y. When training Mdl, assume that you set 'Standardize',true for a template object specified in the 'Learners' name-value pair argument of fitcecoc. 
In this case, for the corresponding binary learner j, the software standardizes the columns of the new predictor data using the corresponding means in Mdl.BinaryLearner{j}.Mu and standard deviations in Mdl.BinaryLearner{j}.Sigma. Data Types: double | single Y — Class labels categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors Class labels, specified as a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors. Y must have the same data type as Mdl.ClassNames. (The software treats string arrays as cell arrays of character vectors.) The number of rows in Y must equal the number of rows in tbl or X.


Data Types: categorical | char | string | logical | single | double | cell
Name-Value Pair Arguments
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: margin(Mdl,tbl,'y','BinaryLoss','exponential') specifies an exponential binary learner loss function.
BinaryLoss — Binary learner loss function
'hamming' | 'linear' | 'logit' | 'exponential' | 'binodeviance' | 'hinge' | 'quadratic' | function handle
Binary learner loss function, specified as the comma-separated pair consisting of 'BinaryLoss' and a built-in loss function name or function handle.
• This table describes the built-in functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj) is the binary loss formula.

Value            Description         Score Domain       g(yj,sj)
'binodeviance'   Binomial deviance   (–∞,∞)             log[1 + exp(–2yjsj)]/[2log(2)]
'exponential'    Exponential         (–∞,∞)             exp(–yjsj)/2
'hamming'        Hamming             [0,1] or (–∞,∞)    [1 – sign(yjsj)]/2
'hinge'          Hinge               (–∞,∞)             max(0,1 – yjsj)/2
'linear'         Linear              (–∞,∞)             (1 – yjsj)/2
'logit'          Logistic            (–∞,∞)             log[1 + exp(–yjsj)]/[2log(2)]
'quadratic'      Quadratic           [0,1]              [1 – yj(2sj – 1)]^2/2
The software normalizes binary losses so that the loss is 0.5 when yj = 0. Also, the software calculates the mean binary loss for each class. • For a custom binary loss function, for example customFunction, specify its function handle 'BinaryLoss',@customFunction. customFunction has this form: bLoss = customFunction(M,s)

where:
• M is the K-by-L coding matrix stored in Mdl.CodingMatrix.
• s is the 1-by-L row vector of classification scores.
• bLoss is the classification loss. This scalar aggregates the binary losses for every learner in a particular class. For example, you can use the mean binary loss to aggregate the loss over the learners for each class.
• K is the number of classes.
• L is the number of binary learners.
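The customFunction contract above can be mimicked in any language. Here is a hedged Python sketch of a mean hinge-style binary loss aggregated over the learners of each class (the coding matrix and scores are made up for illustration):

```python
import numpy as np

def hinge_binary_loss(y, s):
    """g(y, s) = max(0, 1 - y*s)/2, the built-in 'hinge' loss from the table above."""
    return np.maximum(0.0, 1.0 - y * s) / 2.0

def class_losses(M, s):
    """Mean binary loss per class: each row of the K-by-L coding matrix M against
    the 1-by-L score vector s. Note the normalization stated in the text:
    for hinge, g(0, s) = max(0, 1)/2 = 0.5, so the loss is 0.5 when yj = 0."""
    return np.array([hinge_binary_loss(M[k], s).mean() for k in range(M.shape[0])])

# One-vs-one style coding for K = 3 classes and L = 3 binary learners.
M = np.array([[ 1,  1,  0],
              [-1,  0,  1],
              [ 0, -1, -1]])
s = np.array([0.9, 0.4, -0.5])   # hypothetical learner scores for one observation
losses = class_losses(M, s)
predicted = int(np.argmin(losses))   # class with the smallest aggregated loss
```

Using the mean over learners, as here, is one of the aggregation choices the text mentions; a custom loss can aggregate differently, as long as it returns a scalar per class.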


For an example of passing a custom binary loss function, see “Predict Test-Sample Labels of ECOC Model Using Custom Binary Loss Function” on page 32-4169. The default BinaryLoss value depends on the score ranges returned by the binary learners. This table describes some default BinaryLoss values based on the given assumptions.

Assumption                                                                Default Value
All binary learners are SVMs or linear or kernel classification           'hinge'
models of SVM learners.
All binary learners are ensembles trained by AdaboostM1 or                'exponential'
GentleBoost.
All binary learners are ensembles trained by LogitBoost.                  'binodeviance'
All binary learners are linear or kernel classification models of         'quadratic'
logistic regression learners. Or, you specify to predict class
posterior probabilities by setting 'FitPosterior',true in fitcecoc.
To check the default value, use dot notation to display the BinaryLoss property of the trained model at the command line. Example: 'BinaryLoss','binodeviance' Data Types: char | string | function_handle Decoding — Decoding scheme 'lossweighted' (default) | 'lossbased' Decoding scheme that aggregates the binary losses, specified as the comma-separated pair consisting of 'Decoding' and 'lossweighted' or 'lossbased'. For more information, see “Binary Loss” on page 32-3349. Example: 'Decoding','lossbased' ObservationsIn — Predictor data observation dimension 'rows' (default) | 'columns' Predictor data observation dimension, specified as the comma-separated pair consisting of 'ObservationsIn' and 'columns' or 'rows'. Mdl.BinaryLearners must contain ClassificationLinear models. Note If you orient your predictor matrix so that observations correspond to columns and specify 'ObservationsIn','columns', you can experience a significant reduction in execution time. Options — Estimation options [] (default) | structure array returned by statset Estimation options, specified as the comma-separated pair consisting of 'Options' and a structure array returned by statset. To invoke parallel computing: • You need a Parallel Computing Toolbox license.


• Specify 'Options',statset('UseParallel',true). Verbose — Verbosity level 0 (default) | 1 Verbosity level, specified as the comma-separated pair consisting of 'Verbose' and 0 or 1. Verbose controls the number of diagnostic messages that the software displays in the Command Window. If Verbose is 0, then the software does not display diagnostic messages. Otherwise, the software displays diagnostic messages. Example: 'Verbose',1 Data Types: single | double

Output Arguments m — Classification margins numeric column vector | numeric matrix Classification margins on page 32-3350, returned as a numeric column vector or numeric matrix. If Mdl.BinaryLearners contains ClassificationLinear models, then m is an n-by-L vector, where n is the number of observations in X and L is the number of regularization strengths in the linear classification models (numel(Mdl.BinaryLearners{1}.Lambda)). The value m(i,j) is the margin of observation i for the model trained using regularization strength Mdl.BinaryLearners{1}.Lambda(j). Otherwise, m is a column vector of length n.

More About
Binary Loss
A binary loss is a function of the class and classification score that determines how well a binary learner classifies an observation into the class. Suppose the following:
• mkj is element (k,j) of the coding design matrix M (that is, the code corresponding to class k of binary learner j).
• sj is the score of binary learner j for an observation.
• g is the binary loss function.
• k̂ is the predicted class for the observation.
In loss-based decoding [Escalera et al.] on page 18-189, the class producing the minimum sum of the binary losses over binary learners determines the predicted class of an observation, that is,

k̂ = argmin_k Σ_{j=1}^{L} |mkj| g(mkj,sj).

In loss-weighted decoding [Escalera et al.] on page 18-189, the class producing the minimum average of the binary losses over binary learners determines the predicted class of an observation, that is,

k̂ = argmin_k [ Σ_{j=1}^{L} |mkj| g(mkj,sj) ] / [ Σ_{j=1}^{L} |mkj| ].
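The two decoding schemes can be sketched numerically. A minimal Python illustration (hinge as the binary loss, with an invented one-vs-one coding matrix and score vector):

```python
import numpy as np

def hinge(y, s):
    """Hinge binary loss g(y, s) = max(0, 1 - y*s)/2."""
    return np.maximum(0.0, 1.0 - y * s) / 2.0

def decode(M, s, scheme='lossweighted'):
    """Pick the class minimizing sum_j |mkj| g(mkj, sj) (loss-based),
    or that sum divided by sum_j |mkj| (loss-weighted)."""
    weights = np.abs(M)                          # |mkj|
    per_class = (weights * hinge(M, s)).sum(axis=1)
    if scheme == 'lossweighted':
        per_class = per_class / weights.sum(axis=1)
    return int(np.argmin(per_class)), per_class

# Hypothetical one-vs-one coding for K = 3 classes, L = 3 learners.
M = np.array([[ 1,  1,  0],
              [-1,  0,  1],
              [ 0, -1, -1]])
s = np.array([0.9, 0.4, -0.5])
k_based, loss_based = decode(M, s, 'lossbased')
k_weighted, loss_weighted = decode(M, s, 'lossweighted')
```

Here both schemes agree on the predicted class; dividing by the per-class weight sum is what keeps the loss-weighted values in the same dynamic range across classes, as discussed below.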

Allwein et al. on page 18-189 suggest that loss-weighted decoding improves classification accuracy by keeping loss values for all classes in the same dynamic range. This table summarizes the supported loss functions, where yj is a class label for a particular binary learner (in the set {–1,1,0}), sj is the score for observation j, and g(yj,sj) is the binary loss formula.

Value            Description         Score Domain       g(yj,sj)
'binodeviance'   Binomial deviance   (–∞,∞)             log[1 + exp(–2yjsj)]/[2log(2)]
'exponential'    Exponential         (–∞,∞)             exp(–yjsj)/2
'hamming'        Hamming             [0,1] or (–∞,∞)    [1 – sign(yjsj)]/2
'hinge'          Hinge               (–∞,∞)             max(0,1 – yjsj)/2
'linear'         Linear              (–∞,∞)             (1 – yjsj)/2
'logit'          Logistic            (–∞,∞)             log[1 + exp(–yjsj)]/[2log(2)]
'quadratic'      Quadratic           [0,1]              [1 – yj(2sj – 1)]^2/2

The software normalizes binary losses such that the loss is 0.5 when yj = 0, and aggregates using the average of the binary learners [Allwein et al.] on page 18-189. Do not confuse the binary loss with the overall classification loss (specified by the 'LossFun' name-value pair argument of the loss and predict object functions), which measures how well an ECOC classifier performs as a whole. Classification Margin The classification margin is, for each observation, the difference between the negative loss for the true class and the maximal negative loss among the false classes. If the margins are on the same scale, then they serve as a classification confidence measure. Among multiple classifiers, those that yield greater margins are better.
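The ECOC margin definition above reduces to simple arithmetic on the per-class negated losses. A small hedged sketch in Python (the negated-loss values are invented for the example):

```python
import numpy as np

def ecoc_margin(neg_losses, true_idx):
    """Margin = negated loss for the true class minus the maximal negated loss
    among the false classes; positive when the true class beats every other class."""
    others = np.delete(neg_losses, true_idx)
    return neg_losses[true_idx] - others.max()

# Hypothetical negated per-class losses for one observation over 3 classes.
neg = np.array([-0.35, -1.70, -0.95])
m_true0 = ecoc_margin(neg, 0)   # true class has the best negated loss: positive margin
m_true1 = ecoc_margin(neg, 1)   # true class has the worst negated loss: negative margin
```

As the definition implies, the margin flips sign depending on whether the true class attains the best negated loss, which is why larger margins indicate more confident classifiers.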

Tips • To compare the margins or edges of several ECOC classifiers, use template objects to specify a common score transform function among the classifiers during training.

References [1] Allwein, E., R. Schapire, and Y. Singer. “Reducing multiclass to binary: A unifying approach for margin classifiers.” Journal of Machine Learning Research. Vol. 1, 2000, pp. 113–141. [2] Escalera, S., O. Pujol, and P. Radeva. “On the decoding process in ternary error-correcting output codes.” IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 32, Issue 7, 2010, pp. 120–134.


[3] Escalera, S., O. Pujol, and P. Radeva. “Separability of ternary codes for sparse designs of error-correcting output codes.” Pattern Recogn. Vol. 30, Issue 3, 2009, pp. 285–297.

Extended Capabilities Tall Arrays Calculate with arrays that have more rows than fit in memory. This function fully supports tall arrays. For more information, see “Tall Arrays” (MATLAB). Automatic Parallel Support Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™. To run in parallel, set the 'UseParallel' option to true. Set the 'UseParallel' field of the options structure to true using statset and specify the 'Options' name-value pair argument in the call to this function. For example: 'Options',statset('UseParallel',true) For more information, see the 'Options' name-value pair argument. For more general information about parallel computing, see “Run MATLAB Functions with Automatic Parallel Support” (Parallel Computing Toolbox).

See Also ClassificationECOC | CompactClassificationECOC | edge | fitcecoc | loss | predict | resubMargin Topics “Quick Start Parallel Computing for Statistics and Machine Learning Toolbox” on page 30-2 “Reproducibility in Parallel Statistical Computations” on page 30-13 “Concepts of Parallel Computing in Statistics and Machine Learning Toolbox” on page 30-8 Introduced in R2014b


margin Classification margins

Syntax
M = margin(ens,tbl,ResponseVarName)
M = margin(ens,tbl,Y)
M = margin(ens,X,Y)
M = margin( ___ ,Name,Value)

Description
M = margin(ens,tbl,ResponseVarName) returns the classification margin for the predictions of ens on data tbl, when the true classifications are tbl.ResponseVarName.
M = margin(ens,tbl,Y) returns the classification margin for the predictions of ens on data tbl, when the true classifications are Y.
M = margin(ens,X,Y) returns the classification margin for the predictions of ens on data X, when the true classifications are Y.
M = margin( ___ ,Name,Value) calculates margin with additional options specified by one or more Name,Value pair arguments, using any of the previous syntaxes.

Input Arguments ens Classification ensemble created with fitcensemble, or a compact classification ensemble created with compact. tbl Sample data, specified as a table. Each row of tbl corresponds to one observation, and each column corresponds to one predictor variable. tbl must contain all of the predictors used to train the model. Multi-column variables and cell arrays other than cell arrays of character vectors are not allowed. If you trained ens using sample data contained in a table, then the input data for this method must also be in a table. ResponseVarName Response variable name, specified as the name of a variable in tbl. You must specify ResponseVarName as a character vector or string scalar. For example, if the response variable Y is stored as tbl.Y, then specify it as 'Y'. Otherwise, the software treats all columns of tbl, including Y, as predictors when training the model.


X Matrix of data to classify. Each row of X represents one observation, and each column represents one predictor. X must have the same number of columns as the data used to train ens. X should have the same number of rows as the number of elements in Y. If you trained ens using sample data contained in a matrix, then the input data for this method must also be in a matrix. Y Class labels of observations in tbl or X. Y should be of the same type as the classification used to train ens, and its number of elements should equal the number of rows of tbl or X. Name-Value Pair Arguments Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN. learners Indices of weak learners in the ensemble ranging from 1 to ens.NumTrained. margin uses only these learners for calculating the margins. Default: 1:NumTrained UseObsForLearner A logical matrix of size N-by-T, where: • N is the number of rows of X. • T is the number of weak learners in ens. When UseObsForLearner(i,j) is true, learner j is used in predicting the class of row i of X. Default: true(N,T)

Output Arguments M A numeric column vector with the same number of rows as tbl or X. Each row of M gives the classification margin for that row of tbl or X.

Examples Find Classification Margin Find the margin for classifying an average flower from the fisheriris data as 'versicolor'. Load the Fisher iris data set. load fisheriris


Train an ensemble of 100 boosted classification trees using AdaBoostM2. t = templateTree('MaxNumSplits',1); % Weak learner template tree object ens = fitcensemble(meas,species,'Method','AdaBoostM2','Learners',t);

Classify an average flower and find the classification margin.

flower = mean(meas);
predict(ens,flower)

ans = 1x1 cell array
    {'versicolor'}

margin(ens,flower,'versicolor')

ans = 3.2140

More About

Margin
The classification margin is the difference between the classification score for the true class and the maximal classification score for the false classes. Margin is a column vector with the same number of rows as the matrix X.

Score (ensemble)
For ensembles, a classification score represents the confidence of a classification into a class. The higher the score, the higher the confidence. Different ensemble algorithms have different definitions for their scores, and the range of scores depends on the ensemble type. For example:
• AdaBoostM1 scores range from –∞ to ∞.
• Bag scores range from 0 to 1.
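As a quick check of this definition (a sketch, under the assumption that predict and margin use the same scores), the margin for the average-flower example can be recomputed from the predict output:

```matlab
% Sketch: margin = true-class score minus the largest false-class score.
load fisheriris
t = templateTree('MaxNumSplits',1);
ens = fitcensemble(meas,species,'Method','AdaBoostM2','Learners',t);

flower = mean(meas);
[~,scores] = predict(ens,flower);               % one score per class
trueIdx = strcmp(ens.ClassNames,'versicolor');  % location of the true class
m = scores(trueIdx) - max(scores(~trueIdx));    % should match margin(ens,flower,'versicolor')
```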

Extended Capabilities Tall Arrays Calculate with arrays that have more rows than fit in memory. This function fully supports tall arrays. For more information, see “Tall Arrays” (MATLAB).

See Also edge | loss | predict


margin Classification margins for naive Bayes classifiers

Syntax m = margin(Mdl,tbl,ResponseVarName) m = margin(Mdl,tbl,Y) m = margin(Mdl,X,Y)

Description m = margin(Mdl,tbl,ResponseVarName) returns the classification margins on page 32-3359 (m) for the trained naive Bayes classifier Mdl using the predictor data in table tbl and the class labels in tbl.ResponseVarName. m = margin(Mdl,tbl,Y) returns the classification margins (m) for the trained naive Bayes classifier Mdl using the predictor data in table tbl and the class labels in vector Y. m = margin(Mdl,X,Y) returns the classification margins (m) for the trained naive Bayes classifier Mdl using the predictor data X and class labels Y.

Input Arguments

Mdl — Naive Bayes classifier
ClassificationNaiveBayes model | CompactClassificationNaiveBayes model
Naive Bayes classifier, specified as a ClassificationNaiveBayes model or CompactClassificationNaiveBayes model returned by fitcnb or compact, respectively.

tbl — Sample data
table
Sample data, specified as a table. Each row of tbl corresponds to one observation, and each column corresponds to one predictor variable. Optionally, tbl can contain additional columns for the response variable and observation weights. tbl must contain all the predictors used to train Mdl. Multi-column variables and cell arrays other than cell arrays of character vectors are not allowed. If you trained Mdl using sample data contained in a table, then the input data for this method must also be in a table.
Data Types: table

ResponseVarName — Response variable name
name of a variable in tbl
Response variable name, specified as the name of a variable in tbl. You must specify ResponseVarName as a character vector or string scalar. For example, if the response variable y is stored as tbl.y, then specify it as 'y'. Otherwise, the software treats all columns of tbl, including y, as predictors when training the model.


The response variable must be a categorical, character, or string array, logical or numeric vector, or cell array of character vectors. If the response variable is a character array, then each element must correspond to one row of the array.
Data Types: char | string

X — Predictor data
numeric matrix
Predictor data, specified as a numeric matrix. Each row of X corresponds to one observation (also known as an instance or example), and each column corresponds to one variable (also known as a feature). The variables making up the columns of X must be the same as the variables that trained Mdl. The length of Y and the number of rows of X must be equal.
Data Types: double | single

Y — Class labels
categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors
Class labels, specified as a categorical, character, or string array, logical or numeric vector, or cell array of character vectors. Y must have the same data type as Mdl.ClassNames. (The software treats string arrays as cell arrays of character vectors.) The length of Y and the number of rows of tbl or X must be equal.
Data Types: categorical | char | string | logical | single | double | cell

Output Arguments

m — Classification margins
numeric vector
Classification margins on page 32-3359, returned as a numeric vector. m has length equal to size(X,1). Each entry of m is the classification margin of the corresponding observation (row) of X and element of Y.

Examples Estimate Test-Sample Classification Margins of Naive Bayes Classifiers Load Fisher's iris data set. load fisheriris X = meas; % Predictors Y = species; % Response rng(1);

Train a naive Bayes classifier. Specify a 30% holdout sample for testing. It is good practice to specify the class order. Assume that each predictor is conditionally normally distributed given its label.


CVMdl = fitcnb(X,Y,'Holdout',0.30,...
    'ClassNames',{'setosa','versicolor','virginica'});
CMdl = CVMdl.Trained{1};          % Extract the trained, compact classifier
testInds = test(CVMdl.Partition); % Extract the test indices
XTest = X(testInds,:);
YTest = Y(testInds);

CVMdl is a ClassificationPartitionedModel classifier. It contains the property Trained, which is a 1-by-1 cell array holding a CompactClassificationNaiveBayes classifier that the software trained using the training set. Estimate the test sample classification margins. Display the distribution of the margins using a boxplot. m = margin(CMdl,XTest,YTest); figure; boxplot(m); title 'Distribution of the Test-Sample Margins';

An observation margin is the observed true class score minus the maximum false class score among all scores in the respective class. Classifiers that yield relatively large margins are desirable.


Select Naive Bayes Classifier Features by Examining Test Sample Margins The classifier margins measure, for each observation, the difference between the true class observed score and the maximal false class score for a particular class. One way to perform feature selection is to compare test sample margins from multiple models. Based solely on this criterion, the model with the highest margins is the best model. Load Fisher's iris data set. load fisheriris X = meas; % Predictors Y = species; % Response rng(1);

Partition the data set into training and test sets. Specify a 30% holdout sample for testing. Partition = cvpartition(Y,'Holdout',0.30); testInds = test(Partition); % Indices for the test set XTest = X(testInds,:); YTest = Y(testInds);

Partition defines the data set partition. Define these two data sets: • fullX contains all predictors. • partX contains the last 2 predictors. fullX = X; partX = X(:,3:4);

Train naive Bayes classifiers for each predictor set. Specify the partition definition. FCVMdl = fitcnb(fullX,Y,'CVPartition',Partition); PCVMdl = fitcnb(partX,Y,'CVPartition',Partition); FCMdl = FCVMdl.Trained{1}; PCMdl = PCVMdl.Trained{1};

FCVMdl and PCVMdl are ClassificationPartitionedModel classifiers. They contain the property Trained, which is a 1-by-1 cell array holding a CompactClassificationNaiveBayes classifier that the software trained using the training set.

Estimate the test sample margins for each classifier. Display the distributions of the margins for each model using boxplots.

fullM = margin(FCMdl,XTest,YTest);
partM = margin(PCMdl,XTest(:,3:4),YTest);
figure;
boxplot([fullM partM],'Labels',{'All Predictors','Two Predictors'})
h = gca;
h.YLim = [0.98 1.01]; % Modify axis to see boxes.
title 'Boxplots of Test-Sample Margins';


The margins have a similar distribution, but PCMdl is less complex.

More About

Classification Edge
The classification edge is the weighted mean of the classification margins. If you supply weights, then the software normalizes them to sum to the prior probability of their respective class. The software uses the normalized weights to compute the weighted mean. One way to choose among multiple classifiers, e.g., to perform feature selection, is to choose the classifier that yields the highest edge.

Classification Margin
The classification margins are, for each observation, the difference between the score for the true class and the maximal score for the false classes. Provided that they are on the same scale, margins serve as a classification confidence measure; that is, among multiple classifiers, those that yield larger margins are better.

Posterior Probability
The posterior probability is the probability that an observation belongs to a particular class, given the data.


For naive Bayes, the posterior probability that a classification is k for a given observation (x1,...,xP) is

P(Y = k | x1,...,xP) = P(X1,...,XP | y = k) π(Y = k) / P(X1,...,XP),

where:
• P(X1,...,XP | y = k) is the conditional joint density of the predictors given they are in class k. Mdl.DistributionNames stores the distribution names of the predictors.
• π(Y = k) is the class prior probability distribution. Mdl.Prior stores the prior distribution.
• P(X1,...,XP) is the joint density of the predictors. The classes are discrete, so

P(X1,...,XP) = ∑k=1..K P(X1,...,XP | y = k) π(Y = k).
Prior Probability The prior probability of a class is the believed relative frequency with which observations from that class occur in a population. Score The naive Bayes score is the class posterior probability given the observation.
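Because the naive Bayes score is the class posterior probability, the margins can be reconstructed from the posteriors that predict returns. A minimal sketch (the masking logic is illustrative, not part of the documented API):

```matlab
% Sketch: recover classification margins from naive Bayes posteriors.
load fisheriris
Mdl = fitcnb(meas,species);
[~,posterior] = predict(Mdl,meas);            % N-by-K class posterior probabilities

[~,yIdx] = ismember(species,Mdl.ClassNames);  % class index of each observation
linIdx = sub2ind(size(posterior),(1:numel(yIdx))',yIdx);
trueScore = posterior(linIdx);                % posterior of the true class
tmp = posterior;
tmp(linIdx) = -Inf;                           % mask out the true class
m = trueScore - max(tmp,[],2);                % should match margin(Mdl,meas,species)
```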

Extended Capabilities Tall Arrays Calculate with arrays that have more rows than fit in memory. This function fully supports tall arrays. For more information, see “Tall Arrays” (MATLAB).

See Also ClassificationNaiveBayes | CompactClassificationNaiveBayes | edge | fitcnb | loss | predict Topics “Naive Bayes Classification” on page 21-2


margin Package: classreg.learning.classif Find classification margins for support vector machine (SVM) classifier

Syntax m = margin(SVMModel,TBL,ResponseVarName) m = margin(SVMModel,TBL,Y) m = margin(SVMModel,X,Y)

Description m = margin(SVMModel,TBL,ResponseVarName) returns the classification margins on page 32-3364 (m) for the trained support vector machine (SVM) classifier SVMModel using the sample data in table TBL and the class labels in TBL.ResponseVarName. m is returned as a numeric vector with the same length as Y. The software estimates each entry of m using the trained SVM classifier SVMModel, the corresponding row of the predictor data, and the corresponding true class label in Y. m = margin(SVMModel,TBL,Y) returns the classification margins (m) for the trained SVM classifier SVMModel using the sample data in table TBL and the class labels in Y. m = margin(SVMModel,X,Y) returns the classification margins for SVMModel using the predictor data in matrix X and the class labels in Y.

Examples Estimate Test Sample Classification Margins of SVM Classifiers Load the ionosphere data set. load ionosphere rng(1); % For reproducibility

Train an SVM classifier. Specify a 15% holdout sample for testing, standardize the data, and specify that 'g' is the positive class.

CVSVMModel = fitcsvm(X,Y,'Holdout',0.15,'ClassNames',{'b','g'},...
    'Standardize',true);
CompactSVMModel = CVSVMModel.Trained{1};  % Extract the trained, compact classifier
testInds = test(CVSVMModel.Partition);    % Extract the test indices
XTest = X(testInds,:);
YTest = Y(testInds,:);

CVSVMModel is a ClassificationPartitionedModel classifier. It contains the property Trained, which is a 1-by-1 cell array holding a CompactClassificationSVM classifier that the software trained using the training set.


Estimate the test sample classification margins.

m = margin(CompactSVMModel,XTest,YTest);
m(10:20)

ans = 11×1
    3.5461
    5.5939
    4.9948
    4.5611
   -4.7963
    5.5127
   -2.8776
    1.8673
    9.4986
    9.5018
      ⋮

An observation margin is the observed true class score minus the maximum false class score among all scores in the respective class. Classifiers that yield relatively large margins are preferred.

Select SVM Classifier Features by Examining Test Sample Margins Perform feature selection by comparing test sample margins from multiple models. Based solely on this comparison, the model with the highest margins is the best model. Load the ionosphere data set. load ionosphere rng(1); % For reproducibility

Partition the data set into training and test sets. Specify a 15% holdout sample for testing. Partition = cvpartition(Y,'Holdout',0.15); testInds = test(Partition); % Indices for the test set XTest = X(testInds,:); YTest = Y(testInds,:);

Partition defines the data set partition. Define these two data sets: • fullX contains all predictors (except the removed column of 0s). • partX contains the last 20 predictors. fullX = X; partX = X(:,end-20:end);

Train SVM classifiers for each predictor set. Specify the partition definition. FullCVSVMModel = fitcsvm(fullX,Y,'CVPartition',Partition); PartCVSVMModel = fitcsvm(partX,Y,'CVPartition',Partition); FCSVMModel = FullCVSVMModel.Trained{1}; PCSVMModel = PartCVSVMModel.Trained{1};


FullCVSVMModel and PartCVSVMModel are ClassificationPartitionedModel classifiers. They contain the property Trained, which is a 1-by-1 cell array holding a CompactClassificationSVM classifier that the software trained using the training set.

Estimate the test sample margins for each classifier.

fullM = margin(FCSVMModel,XTest,YTest);
partM = margin(PCSVMModel,XTest(:,end-20:end),YTest);
n = size(XTest,1);
p = sum(fullM < partM)/n

p = 0.2500

Approximately 25% of the margins from the full model are less than those from the model with fewer predictors. This result suggests that the model trained with all the predictors is better.

Input Arguments

SVMModel — SVM classification model
ClassificationSVM model object | CompactClassificationSVM model object
SVM classification model, specified as a ClassificationSVM model object or CompactClassificationSVM model object returned by fitcsvm or compact, respectively.

TBL — Sample data
table
Sample data, specified as a table. Each row of TBL corresponds to one observation, and each column corresponds to one predictor variable. Optionally, TBL can contain additional columns for the response variable and observation weights. TBL must contain all of the predictors used to train SVMModel. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.
If TBL contains the response variable used to train SVMModel, then you do not need to specify ResponseVarName or Y.
If you trained SVMModel using sample data contained in a table, then the input data for margin must also be in a table.
If you set 'Standardize',true in fitcsvm when training SVMModel, then the software standardizes the columns of the predictor data using the corresponding means in SVMModel.Mu and the standard deviations in SVMModel.Sigma.
Data Types: table

X — Predictor data
numeric matrix
Predictor data, specified as a numeric matrix. Each row of X corresponds to one observation (also known as an instance or example), and each column corresponds to one variable (also known as a feature). The variables in the columns of X must be the same as the variables that trained the SVMModel classifier. The length of Y and the number of rows in X must be equal.


If you set 'Standardize',true in fitcsvm to train SVMModel, then the software standardizes the columns of X using the corresponding means in SVMModel.Mu and the standard deviations in SVMModel.Sigma. Data Types: double | single ResponseVarName — Response variable name name of variable in TBL Response variable name, specified as the name of a variable in TBL. If TBL contains the response variable used to train SVMModel, then you do not need to specify ResponseVarName. If you specify ResponseVarName, then you must do so as a character vector or string scalar. For example, if the response variable is stored as TBL.Response, then specify ResponseVarName as 'Response'. Otherwise, the software treats all columns of TBL, including TBL.Response, as predictors. The response variable must be a categorical, character, or string array, logical or numeric vector, or cell array of character vectors. If the response variable is a character array, then each element must correspond to one row of the array. Data Types: char | string Y — Class labels categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors Class labels, specified as a categorical, character, or string array, logical or numeric vector, or cell array of character vectors. Y must be the same as the data type of SVMModel.ClassNames. (The software treats string arrays as cell arrays of character vectors.) The length of Y must equal the number of rows in TBL or the number of rows in X.

More About

Classification Edge
The edge is the weighted mean of the classification margins. The weights are the prior class probabilities. If you supply weights, then the software normalizes them to sum to the prior probabilities in the respective classes. The software uses the renormalized weights to compute the weighted mean. One way to choose among multiple classifiers, for example, to perform feature selection, is to choose the classifier that yields the highest edge.

Classification Margin
The classification margin for binary classification is, for each observation, the difference between the classification score for the true class and the classification score for the false class. The software defines the classification margin for binary classification as

m = 2yf(x).


x is an observation. If the true label of x is the positive class, then y is 1, and –1 otherwise. f(x) is the positive-class classification score for the observation x. The classification margin is commonly defined as m = yf(x).
If the margins are on the same scale, then they serve as a classification confidence measure. Among multiple classifiers, those that yield greater margins are better.

Classification Score
The SVM classification score for classifying observation x is the signed distance from x to the decision boundary, ranging from –∞ to +∞. A positive score for a class indicates that x is predicted to be in that class. A negative score indicates otherwise.
The positive class classification score f(x) is the trained SVM classification function. f(x) is also the numerical, predicted response for x, or the score for predicting x into the positive class.

f(x) = ∑j=1..n αj yj G(xj, x) + b,

where (α1, ..., αn, b) are the estimated SVM parameters, G(xj, x) is the dot product in the predictor space between x and the support vectors, and the sum includes the training set observations. The negative class classification score for x, or the score for predicting x into the negative class, is –f(x).
If G(xj, x) = xj′x (the linear kernel), then the score function reduces to

f(x) = (x/s)′β + b.

s is the kernel scale and β is the vector of fitted linear coefficients.
For more details, see “Understanding Support Vector Machines” on page 18-145.
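For a linear kernel, the reduced score function can be evaluated directly from the fitted model properties. A hedged sketch (it assumes no standardization, so the Mu and Sigma corrections are not needed):

```matlab
% Sketch: evaluate f(x) = (x/s)'*beta + b for a linear-kernel SVM,
% then form the binary margins m = 2*y.*f(x).
load ionosphere                                  % X: predictors, Y: labels 'b'/'g'
SVMModel = fitcsvm(X,Y,'ClassNames',{'b','g'});  % linear kernel by default

s = SVMModel.KernelParameters.Scale;             % kernel scale
f = (X./s)*SVMModel.Beta + SVMModel.Bias;        % positive-class ('g') scores
y = 2*strcmp(Y,'g') - 1;                         % +1 for the positive class, -1 otherwise
m = 2*y.*f;                                      % should match margin(SVMModel,X,Y)
```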

Algorithms
For binary classification, the software defines the margin for observation j, mj, as

mj = 2yj f(xj),

where yj ∈ {–1,1}, and f(xj) is the predicted score of observation j for the positive class. However, mj = yj f(xj) is commonly used to define the margin.

References [1] Cristianini, N., and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, UK: Cambridge University Press, 2000.

Extended Capabilities Tall Arrays Calculate with arrays that have more rows than fit in memory. This function fully supports tall arrays. For more information, see “Tall Arrays” (MATLAB).


GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also ClassificationSVM | CompactClassificationSVM | edge | fitcsvm | loss | predict Introduced in R2014a


margin Classification margins

Syntax m = margin(tree,TBL,ResponseVarName) m = margin(tree,TBL,Y) m = margin(tree,X,Y)

Description m = margin(tree,TBL,ResponseVarName) returns the classification margins for the table of predictors TBL and class labels TBL.ResponseVarName. For the definition, see “Margin” on page 32-3369. m = margin(tree,TBL,Y) returns the classification margins for the table of predictors TBL and class labels Y. m = margin(tree,X,Y) returns the classification margins for the matrix of predictors X and class labels Y.

Input Arguments

tree — Trained classification tree
ClassificationTree model object | CompactClassificationTree model object
Trained classification tree, specified as a ClassificationTree or CompactClassificationTree model object. That is, tree is a trained classification model returned by fitctree or compact.

TBL — Sample data
table
Sample data, specified as a table. Each row of TBL corresponds to one observation, and each column corresponds to one predictor variable. Optionally, TBL can contain additional columns for the response variable and observation weights. TBL must contain all the predictors used to train tree. Multi-column variables and cell arrays other than cell arrays of character vectors are not allowed.
If TBL contains the response variable used to train tree, then you do not need to specify ResponseVarName or Y.
If you train tree using sample data contained in a table, then the input data for this method must also be in a table.
Data Types: table

X — Data to classify
numeric matrix
Data to classify, specified as a numeric matrix. Each row of X represents one observation, and each column represents one predictor. X must have the same number of columns as the data used to train tree. X must have the same number of rows as the number of elements in Y.


Data Types: single | double

ResponseVarName — Response variable name
name of a variable in TBL
Response variable name, specified as the name of a variable in TBL. If TBL contains the response variable used to train tree, then you do not need to specify ResponseVarName.
If you specify ResponseVarName, then you must do so as a character vector or string scalar. For example, if the response variable is stored as TBL.Response, then specify it as 'Response'. Otherwise, the software treats all columns of TBL, including TBL.Response, as predictors.
The response variable must be a categorical, character, or string array, logical or numeric vector, or cell array of character vectors. If the response variable is a character array, then each element must correspond to one row of the array.
Data Types: char | string

Y — Class labels
categorical array | character array | string array | logical vector | numeric vector | cell array of character vectors
Class labels, specified as a categorical, character, or string array, a logical or numeric vector, or a cell array of character vectors. Y must be of the same type as the classification used to train tree, and its number of elements must equal the number of rows of X.
Data Types: categorical | char | string | logical | single | double | cell

Output Arguments m — Margin numeric column vector Margin, returned as a numeric column vector of length size(X,1). Each entry in m represents the margin for the corresponding rows of X and (true class) Y, computed using tree.

Examples
Compute the classification margin for the Fisher iris data, trained on its first two columns of data, and view the last 11 entries.

load fisheriris
X = meas(:,1:2);
tree = fitctree(X,species);
M = margin(tree,X,species);
M(end-10:end)

ans =
    0.1111
    0.1111
    0.1111
   -0.2857
    0.6364
    0.6364
    0.1111
    0.7500
    1.0000
    0.6364
    0.2000

The classification tree trained on all the data is better.

tree = fitctree(meas,species);
M = margin(tree,meas,species);
M(end-10:end)

ans =
    0.9565
    0.9565
    0.9565
    0.9565
    0.9565
    0.9565
    0.9565
    0.9565
    0.9565
    0.9565
    0.9565

More About

Margin
The classification margin is the difference between the classification score for the true class and the maximal classification score for the false classes. Margin is a column vector with the same number of rows as the matrix X.

Score (tree)
For trees, the score of a classification of a leaf node is the posterior probability of the classification at that node. The posterior probability of the classification at a node is the number of training sequences that lead to that node with the classification, divided by the number of training sequences that lead to that node.
For example, consider classifying a predictor X as true when X < 0.15 or X > 0.95, and X is false otherwise. Generate 100 random points and classify them:

rng(0,'twister') % for reproducibility
X = rand(100,1);
Y = (abs(X - .55) > .4);
tree = fitctree(X,Y);
view(tree,'Mode','Graph')


Prune the tree: tree1 = prune(tree,'Level',1); view(tree1,'Mode','Graph')


The pruned tree correctly classifies observations that are less than 0.15 as true. It also correctly classifies observations from .15 to .94 as false. However, it incorrectly classifies observations that are greater than .94 as false. Therefore, the score for observations that are greater than .15 should be about .05/.85=.06 for true, and about .8/.85=.94 for false.

Compute the prediction scores for the first 10 rows of X:

[~,score] = predict(tree1,X(1:10));
[score X(1:10,:)]

ans = 10×3
    0.9059    0.0941    0.8147
    0.9059    0.0941    0.9058
         0    1.0000    0.1270
    0.9059    0.0941    0.9134
    0.9059    0.0941    0.6324
         0    1.0000    0.0975
    0.9059    0.0941    0.2785
    0.9059    0.0941    0.5469
    0.9059    0.0941    0.9575
    0.9059    0.0941    0.9649

Indeed, every value of X (the right-most column) that is less than 0.15 has associated scores (the left and center columns) of 0 and 1, while the other values of X have associated scores of 0.91 and 0.09. The difference (score 0.09 instead of the expected .06) is due to a statistical fluctuation: there are 8 observations in X in the range (.95,1) instead of the expected 5 observations.

Extended Capabilities Tall Arrays Calculate with arrays that have more rows than fit in memory. This function fully supports tall arrays. For more information, see “Tall Arrays” (MATLAB).

See Also edge | fitctree | loss | predict


margin Class: CompactTreeBagger Classification margin

Syntax
mar = margin(B,TBLnew,Ynew)
mar = margin(B,Xnew,Ynew)
mar = margin(B,TBLnew,Ynew,'param1',val1,'param2',val2,...)
mar = margin(B,Xnew,Ynew,'param1',val1,'param2',val2,...)

Description
mar = margin(B,TBLnew,Ynew) computes the classification margins for the predictors contained in the table TBLnew given true response Ynew. You can omit Ynew if TBLnew contains the response variable. If you trained B using sample data contained in a table, then the input data for this method must also be in a table.
mar = margin(B,Xnew,Ynew) computes the classification margins for the predictors contained in the matrix Xnew given true response Ynew. Ynew can be a numeric vector, character matrix, string array, cell array of character vectors, categorical vector, or logical vector.
mar is a numeric array of size Nobs-by-NTrees, where Nobs is the number of rows of TBLnew and Ynew, and NTrees is the number of trees in the ensemble B. For observation I and tree J, mar(I,J) is the difference between the score for the true class and the largest score for other classes. This method is available for classification ensembles only.
mar = margin(B,TBLnew,Ynew,'param1',val1,'param2',val2,...) or mar = margin(B,Xnew,Ynew,'param1',val1,'param2',val2,...) specifies optional parameter name-value pairs:

'Mode' — How the method computes margins. If set to 'cumulative' (default), margin computes cumulative margins, and mar is an Nobs-by-NTrees matrix, where the first column gives the margin from trees(1), the second column gives the margin from trees(1:2), and so on, up to trees(1:NTrees). If set to 'individual', mar is an Nobs-by-NTrees matrix, where each element is the margin from each tree in the ensemble. If set to 'ensemble', mar is a single column of length Nobs showing the cumulative margins for the entire ensemble.

'Trees' — Vector of indices indicating which trees to include in this calculation. By default, this argument is set to 'all' and the method uses all trees. If 'Trees' is a numeric vector, the method returns a vector of length NTrees for 'cumulative' and 'individual' modes, where NTrees is the number of elements in the input vector, and a scalar for 'ensemble' mode. For example, in 'cumulative' mode, the first element gives the margin from trees(1), the second element gives the margin from trees(1:2), and so on.


'TreeWeights' — Vector of tree weights. This vector must have the same length as the 'Trees' vector. The method uses these weights to combine output from the specified trees by taking a weighted average instead of the simple non-weighted majority vote. You cannot use this argument in 'individual' mode.

'UseInstanceForTree' — Logical matrix of size Nobs-by-NTrees indicating which trees should be used to make predictions for each observation. By default, the method uses all trees for all observations.

See Also margin


margin Class: TreeBagger Classification margin

Syntax
mar = margin(B,TBLnew,Ynew)
mar = margin(B,Xnew,Ynew)
mar = margin(B,TBLnew,Ynew,'param1',val1,'param2',val2,...)
mar = margin(B,Xnew,Ynew,'param1',val1,'param2',val2,...)

Description
mar = margin(B,TBLnew,Ynew) computes the classification margins for the predictors contained in the table TBLnew given true response Ynew. You can omit Ynew if TBLnew contains the response variable. If you trained B using sample data contained in a table, then the input data for this method must also be in a table.
mar = margin(B,Xnew,Ynew) computes the classification margins for the predictors contained in the matrix Xnew given true response Ynew. If you trained B using sample data contained in a matrix, then the input data for this method must also be in a matrix. Ynew can be a numeric vector, character matrix, string array, cell array of character vectors, categorical vector, or logical vector.
mar is a numeric array of size Nobs-by-NTrees, where Nobs is the number of rows of TBLnew and Ynew, and NTrees is the number of trees in the ensemble B. For observation I and tree J, mar(I,J) is the difference between the score for the true class and the largest score for other classes. This method is available for classification ensembles only.
mar = margin(B,TBLnew,Ynew,'param1',val1,'param2',val2,...) or mar = margin(B,Xnew,Ynew,'param1',val1,'param2',val2,...) specifies optional parameter name-value pairs:

'Mode' — Character vector or string scalar indicating how the method computes margins. If set to 'cumulative' (default), margin computes cumulative margins, and mar is an Nobs-by-NTrees matrix, where the first column gives the margin from trees(1), the second column gives the margin from trees(1:2), and so on, up to trees(1:NTrees). If set to 'individual', mar is an Nobs-by-NTrees matrix, where each element is the margin from each tree in the ensemble. If set to 'ensemble', mar is a single column of length Nobs showing the cumulative margins for the entire ensemble.

'Trees' — Vector of indices indicating which trees to include in this calculation. By default, this argument is set to 'all' and the method uses all trees. If 'Trees' is a numeric vector, the method returns a vector of length NTrees for 'cumulative' and 'individual' modes, where NTrees is the number of elements in the input vector, and a scalar for 'ensemble' mode. For example, in 'cumulative' mode, the first element gives the margin from trees(1), the second element gives the margin from trees(1:2), and so on.

32-3375

32

Functions

'TreeWeights'

Vector of tree weights. This vector must have the same length as the 'Trees' vector. The method uses these weights to combine output from the specified trees by taking a weighted average instead of the simple non-weighted majority vote. You cannot use this argument in the 'individual' mode.

'UseInstanceForTree'

Logical matrix of size Nobs-by-NTrees indicating which trees should be used to make predictions for each observation. By default, the method uses all trees for all observations.
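For intuition, the cumulative-mode computation can be sketched outside MATLAB. The following is an illustrative Python snippet, not toolbox code: it averages the per-class scores over the first J trees for one observation, then takes the true-class score minus the largest other-class score, which is the margin of the partial ensemble trees(1:J).

```python
def cumulative_margins(tree_scores, true_class):
    """tree_scores: list (one entry per tree) of per-class score dicts
    for a single observation. Returns the margin of the ensemble of the
    first J trees, for J = 1..NTrees (the 'cumulative' mode)."""
    margins = []
    totals = {}
    for j, scores in enumerate(tree_scores, start=1):
        for cls, s in scores.items():
            totals[cls] = totals.get(cls, 0.0) + s
        avg = {cls: t / j for cls, t in totals.items()}
        others = [v for cls, v in avg.items() if cls != true_class]
        # Margin: true-class score minus the largest score for other classes
        margins.append(avg[true_class] - max(others))
    return margins
```

With two trees scoring classes 'a' and 'b', the first element is the margin of tree 1 alone and the second is the margin of the averaged two-tree ensemble.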

See Also margin


margmean

margmean Class: RepeatedMeasuresModel Estimate marginal means

Syntax

tbl = margmean(rm,vars)
tbl = margmean(rm,vars,'alpha',alpha)

Description

tbl = margmean(rm,vars) returns the estimated marginal means for the variables vars in the table tbl.

tbl = margmean(rm,vars,'alpha',alpha) returns the 100*(1–alpha)% confidence intervals for the marginal means.

Input Arguments

rm — Repeated measures model
RepeatedMeasuresModel object

Repeated measures model, specified as a RepeatedMeasuresModel object. For properties and methods of this object, see RepeatedMeasuresModel.

vars — Variables for which to compute the marginal means
character vector | string scalar | string array | cell array of character vectors

Variables for which to compute the marginal means, specified as a character vector or string scalar representing the name of a between- or within-subjects factor in rm, or a string array or cell array of character vectors representing the names of multiple variables. Each between-subjects factor must be categorical. For example, to compute the marginal means for the variables Drug and Gender, specify them as follows.

Example: {'Drug','Gender'}
Data Types: char | string | cell

alpha — Significance level
0.05 (default) | scalar value in the range 0 to 1

Significance level of the confidence intervals for population marginal means, specified as a scalar value in the range 0 to 1. The confidence level is 100*(1–alpha)%. For example, you can specify a 99% confidence level as follows.

Example: 'alpha',0.01


Data Types: double | single

Output Arguments

tbl — Estimated marginal means
table

Estimated marginal means, returned as a table. tbl contains one row for each combination of the groups of the variables you specify in vars, one column for each variable, and the following columns.

Column name    Description
Mean           Estimated marginal means
StdErr         Standard errors of the estimates
Lower          Lower limit of a 95% confidence interval for the true population mean
Upper          Upper limit of a 95% confidence interval for the true population mean
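The Lower and Upper columns are the usual t-based interval around each estimate. A minimal sketch of the arithmetic in Python (for illustration only; the critical value tcrit comes from a t distribution with the model's error degrees of freedom, which this sketch takes as an input rather than computing):

```python
def marginal_ci(mean, stderr, tcrit):
    # Confidence interval: mean ± tcrit * stderr.
    # tcrit is the t critical value for the chosen alpha and the
    # error degrees of freedom (assumed supplied by the caller).
    return mean - tcrit * stderr, mean + tcrit * stderr

# Illustrative values only, not drawn from the toolbox:
lo, hi = marginal_ci(10.0, 2.0, 2.0)
```

A larger alpha gives a smaller tcrit and therefore a narrower interval, which is why the 99% intervals in the examples below are wider than the default 95% ones.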

Examples

Compute Marginal Means Grouped by Two Factors

Load the sample data.

load repeatedmeas

The table between includes the between-subject variables age, IQ, group, gender, and eight repeated measures y1 to y8 as responses. The table within includes the within-subject variables w1 and w2. This is simulated data. Fit a repeated measures model, where the repeated measures y1 to y8 are the responses, and age, IQ, group, gender, and the group-gender interaction are the predictor variables. Also specify the within-subject design matrix. rm = fitrm(between,'y1-y8 ~ Group*Gender + Age + IQ','WithinDesign',within);

Compute the marginal means grouped by the factors Group and Gender.

M = margmean(rm,{'Group' 'Gender'})

M=6×6 table
    Group    Gender     Mean      StdErr     Lower       Upper 
    _____    ______    _______    ______    ________    _______

      A      Female     15.946    5.6153      4.3009     27.592
      A      Male       8.0726    5.7236     -3.7973     19.943
      B      Female     11.758    5.7091    -0.08189     23.598
      B      Male       2.2858    5.6748      -9.483     14.055
      C      Female    -8.6183     5.871     -20.794     3.5574
      C      Male      -13.551    5.7283     -25.431    -1.6712

Display the description for table M. M.Properties.Description


ans = 'Estimated marginal means Means computed with Age=13.7, IQ=98.2667'

Compute Estimated Marginal Means and Confidence Intervals Load the sample data. load fisheriris

The column vector, species, consists of iris flowers of three different species, setosa, versicolor, virginica. The double matrix meas consists of four types of measurements on the flowers, the length and width of sepals and petals in centimeters, respectively. Store the data in a table array. t = table(species,meas(:,1),meas(:,2),meas(:,3),meas(:,4),... 'VariableNames',{'species','meas1','meas2','meas3','meas4'}); Meas = dataset([1 2 3 4]','VarNames',{'Measurements'});

Fit a repeated measures model, where the measurements are the responses and the species is the predictor variable. rm = fitrm(t,'meas1-meas4~species','WithinDesign',Meas);

Compute the marginal means grouped by the factor species.

margmean(rm,'species')

ans=3×5 table
       species          Mean      StdErr      Lower     Upper 
    ______________     ______    ________    ______    ______

    {'setosa'    }     2.5355    0.042807    2.4509    2.6201
    {'versicolor'}      3.573    0.042807    3.4884    3.6576
    {'virginica' }      4.285    0.042807    4.2004    4.3696

The StdErr column shows the standard errors of the estimated marginal means. The Lower and Upper columns show the lower and upper bounds for the 95% confidence intervals of the group marginal means, respectively. None of the confidence intervals overlap, which indicates that the marginal means differ with species. You can also plot the estimated marginal means using the plotprofile method.

Compute the 99% confidence intervals for the marginal means.

margmean(rm,'species','alpha',0.01)

ans=3×5 table
       species          Mean      StdErr      Lower     Upper 
    ______________     ______    ________    ______    ______

    {'setosa'    }     2.5355    0.042807    2.4238    2.6472
    {'versicolor'}      3.573    0.042807    3.4613    3.6847
    {'virginica' }      4.285    0.042807    4.1733    4.3967


See Also multcompare | fitrm | plotprofile


mauchly

mauchly Class: RepeatedMeasuresModel Mauchly’s test for sphericity

Syntax

tbl = mauchly(rm)
tbl = mauchly(rm,C)

Description

tbl = mauchly(rm) returns the result of Mauchly’s test for sphericity for the repeated measures model rm. It tests the null hypothesis that the sphericity assumption is true for the response variables in rm. For more information, see “Mauchly’s Test of Sphericity” on page 9-57.

tbl = mauchly(rm,C) returns the result of Mauchly’s test based on the contrast matrix C.

Input Arguments

rm — Repeated measures model
RepeatedMeasuresModel object

Repeated measures model, specified as a RepeatedMeasuresModel object. For properties and methods of this object, see RepeatedMeasuresModel.

C — Contrasts
matrix

Contrasts, specified as a matrix. The default value of C is the Q factor in a QR decomposition of the matrix M, where M is defined so that Y*M is the difference between all successive pairs of columns of the repeated measures matrix Y.

Data Types: single | double
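The matrix M described above is easy to construct directly. This illustrative Python sketch (not toolbox code) builds a p-by-(p-1) matrix whose columns difference successive response columns, so that Y*M gives those differences:

```python
def successive_diff_matrix(p):
    # M is p-by-(p-1); column j of Y @ M equals Y[:, j] - Y[:, j+1],
    # i.e. the difference between successive pairs of columns of Y.
    M = [[0] * (p - 1) for _ in range(p)]
    for j in range(p - 1):
        M[j][j] = 1
        M[j + 1][j] = -1
    return M
```

For p = 3 responses this yields two contrast columns, one per successive pair; the toolbox then orthonormalizes such a matrix via the QR decomposition to obtain the default C.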

Output Arguments

tbl — Results of Mauchly’s test of sphericity
table

Results of Mauchly’s test for sphericity for the repeated measures model rm, returned as a table. tbl contains the following columns.


Column Name    Definition
W              Value of Mauchly’s W statistic
ChiStat        Chi-square statistic value
DF             Degrees of freedom of the chi-square statistic
pValue         p-value corresponding to the chi-square statistic

Data Types: table
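The W statistic and its chi-square approximation can be expressed through the eigenvalues of the transformed covariance matrix C'*Sigma*C. The following Python sketch is illustrative only (the scaling factor d follows the standard chi-square approximation, with n observations and r the rank of the between-subjects design; treat the exact form as an assumption rather than toolbox code). Under perfect sphericity all eigenvalues are equal and W = 1, so the chi-square statistic is 0:

```python
import math

def mauchly_w(eigenvalues):
    # W = prod(eig) / mean(eig)**p for the p eigenvalues of C'*Sigma*C;
    # W = 1 when all eigenvalues are equal (sphericity holds), W < 1 otherwise.
    p = len(eigenvalues)
    mean_e = sum(eigenvalues) / p
    return math.prod(eigenvalues) / mean_e ** p

def mauchly_chi2(W, n, r, p):
    # Standard chi-square approximation (assumed form): -(n - r) * d * log(W)
    d = 1 - (2 * p * p + p + 2) / (6 * p * (n - r))
    return -(n - r) * d * math.log(W)
```

A W far below 1, as in the example below, produces a large chi-square statistic and hence a small p-value.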

Examples Perform Mauchly’s Test Load the sample data. load fisheriris

The column vector species consists of iris flowers of three different species: setosa, versicolor, and virginica. The double matrix meas consists of four types of measurements on the flowers: the length and width of sepals and petals in centimeters, respectively. Store the data in a table array. t = table(species,meas(:,1),meas(:,2),meas(:,3),meas(:,4),... 'VariableNames',{'species','meas1','meas2','meas3','meas4'}); Meas = dataset([1 2 3 4]','VarNames',{'Measurements'});

Fit a repeated measures model, where the measurements are the responses and the species is the predictor variable. rm = fitrm(t,'meas1-meas4~species','WithinDesign',Meas);

Perform Mauchly’s test to assess the sphericity assumption.

mauchly(rm)

ans=1×4 table
       W       ChiStat    DF      pValue  
    _______    _______    __    __________

    0.55814    84.976     5     7.6149e-17

The small p-value (in the pValue column) indicates that sphericity, and hence the compound symmetry assumption, does not hold. You should use epsilon corrections to compute the p-values for a repeated measures ANOVA. You can compute the epsilon corrections using the epsilon method and perform the repeated measures ANOVA with the corrected p-values using the ranova method.
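For reference, the Greenhouse-Geisser epsilon used in those corrections has a simple closed form in the same eigenvalues of the transformed covariance matrix. A hedged Python sketch (illustrative, not toolbox code): epsilon equals 1 under sphericity and decreases toward 1/(p-1) as the eigenvalues spread.

```python
def gg_epsilon(eigenvalues):
    # Greenhouse-Geisser epsilon: (sum(eig))**2 / (p * sum(eig**2)).
    # Equals 1 when all eigenvalues are equal (sphericity holds).
    p = len(eigenvalues)
    return sum(eigenvalues) ** 2 / (p * sum(e * e for e in eigenvalues))
```

The corrected ANOVA multiplies both degrees of freedom by this epsilon before computing p-values.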

See Also epsilon | fitrm | ranova

Topics
“Mauchly’s Test of Sphericity” on page 9-57
“Compound Symmetry Assumption and Epsilon Corrections” on page 9-55


mat2dataset

(Not Recommended) Convert matrix to dataset array

Note The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.

Syntax

ds = mat2dataset(X)
ds = mat2dataset(X,Name,Value)

Description ds = mat2dataset(X) converts a matrix to a dataset array. ds = mat2dataset(X,Name,Value) performs the conversion using additional options specified by one or more Name,Value pair arguments.

Examples

Convert Matrix to Dataset Array

Convert a matrix to a dataset array using the default options. Load sample data.

load('fisheriris')
X = meas;
size(X)

ans = 1×2

   150     4

Convert the matrix to a dataset array.

ds = mat2dataset(X);
size(ds)

ans = 1×2

   150     4

ds(1:5,:)

ans = 
    X1     X2     X3     X4 
    5.1    3.5    1.4    0.2
    4.9      3    1.4    0.2
    4.7    3.2    1.3    0.2
    4.6    3.1    1.5    0.2
      5    3.6    1.4    0.2

When you do not specify variable names, mat2dataset uses the matrix name and column numbers to create default variable names.

Convert Matrix to Dataset Array with Variable Names

Load sample data.

load('fisheriris')
X = meas;
size(X)

ans = 1×2

   150     4

Convert the matrix to a dataset array, providing a variable name for each of the four columns of X.

ds = mat2dataset(X,'VarNames',{'SLength',...
    'SWidth','PLength','PWidth'});
size(ds)

ans = 1×2

   150     4

ds(1:5,:)

ans = 
    SLength    SWidth    PLength    PWidth
    5.1        3.5       1.4        0.2   
    4.9        3         1.4        0.2   
    4.7        3.2       1.3        0.2   
    4.6        3.1       1.5        0.2   
    5          3.6       1.4        0.2   

Create a Dataset Array with Multicolumn Variables

Convert a matrix to a dataset array containing multicolumn variables. Load sample data.

load('fisheriris')
X = meas;
size(X)

ans = 1×2

   150     4

Convert the matrix to a dataset array, combining the sepal measurements (the first two columns) into one variable named SepalMeas, and the petal measurements (third and fourth columns) into one variable named PetalMeas.

ds = mat2dataset(X,'NumCols',[2,2],...
    'VarNames',{'SepalMeas','PetalMeas'});
ds(1:5,:)

ans = 
     SepalMeas      PetalMeas 
    5.1    3.5     1.4    0.2
    4.9    3       1.4    0.2
    4.7    3.2     1.3    0.2
    4.6    3.1     1.5    0.2
    5      3.6     1.4    0.2

The output dataset array has 150 observations and 2 variables.

size(ds)

ans = 1×2

   150     2

Input Arguments

X — Input matrix
matrix

Input matrix to convert to a dataset array, specified as an M-by-N numeric matrix. Each column of X becomes a variable in the output M-by-N dataset array.

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'NumCols',[1,1,2,1] specifies that the 3rd and 4th columns of the input matrix should be combined into a single variable.

VarNames — Variable names for output dataset array
string array | cell array of character vectors

Variable names for the output dataset array, specified as the comma-separated pair consisting of 'VarNames' and a string array or cell array of character vectors. You must provide a variable name for each variable in ds. The names must be valid MATLAB identifiers, and must be unique.


Example: 'VarNames',{'myVar1','myVar2','myVar3'}

ObsNames — Observation names for output dataset array
string array | cell array of character vectors

Observation names for the output dataset array, specified as the comma-separated pair consisting of 'ObsNames' and a string array or cell array of character vectors. The names do not need to be valid MATLAB identifiers, but they must be unique.

NumCols — Number of columns for each variable
vector of nonnegative integers

Number of columns for each variable in ds, specified as the comma-separated pair consisting of 'NumCols' and a vector of nonnegative integers. When the number of columns for a variable is greater than one, mat2dataset combines multiple columns in X into a single variable in ds. The vector you assign to NumCols must sum to size(X,2). For example, to convert a matrix with eight columns into a dataset array with five variables, specify a vector with five elements that sum to eight, such as 'NumCols',[1,1,3,1,2].
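The NumCols bookkeeping is straightforward to mirror in any language. This illustrative Python helper (names and behavior are assumptions for exposition, not toolbox code) maps a NumCols-style vector to the zero-based column index groups each output variable would take:

```python
def column_groups(num_cols, total_cols):
    # Each entry of num_cols consumes that many consecutive input columns;
    # the entries must sum to the total number of input columns
    # (the analogue of sum(NumCols) == size(X,2)).
    if sum(num_cols) != total_cols:
        raise ValueError("NumCols must sum to the number of input columns")
    groups, start = [], 0
    for n in num_cols:
        groups.append(list(range(start, start + n)))
        start += n
    return groups
```

For example, [1,1,3,1,2] over eight columns yields five groups, with the third group spanning three columns, matching the 'NumCols',[1,1,3,1,2] example above.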

Output Arguments

ds — Output dataset array
dataset array

Output dataset array, returned by default with a variable for each column of X, and an observation for each row of X. If you specify NumCols, then the number of variables in ds is equal to the length of the specified vector of column numbers.

See Also cell2dataset | dataset | struct2dataset

Topics
“Create a Dataset Array from Workspace Variables” on page 2-57
“Create a Dataset Array from a File” on page 2-62
“Dataset Arrays” on page 2-112

Introduced in R2012b


mdscale Nonclassical multidimensional scaling

Syntax

Y = mdscale(D,p)
[Y,stress] = mdscale(D,p)
[Y,stress,disparities] = mdscale(D,p)
[...] = mdscale(D,p,'Name',value)

Description

Y = mdscale(D,p) performs nonmetric multidimensional scaling on the n-by-n dissimilarity matrix D, and returns Y, a configuration of n points (rows) in p dimensions (columns). The Euclidean distances between points in Y approximate a monotonic transformation of the corresponding dissimilarities in D. By default, mdscale uses Kruskal's normalized stress1 criterion.

You can specify D as either a full n-by-n matrix, or in upper triangle form such as is output by pdist. A full dissimilarity matrix must be real and symmetric, and have zeros along the diagonal and nonnegative elements everywhere else. A dissimilarity matrix in upper triangle form must have real, nonnegative entries. mdscale treats NaNs in D as missing values, and ignores those elements. Inf is not accepted.

You can also specify D as a full similarity matrix, with ones along the diagonal and all other elements less than one. mdscale transforms a similarity matrix to a dissimilarity matrix in such a way that distances between the points returned in Y approximate sqrt(1-D). To use a different transformation, transform the similarities prior to calling mdscale.

[Y,stress] = mdscale(D,p) returns the minimized stress, i.e., the stress evaluated at Y.

[Y,stress,disparities] = mdscale(D,p) returns the disparities, that is, the monotonic transformation of the dissimilarities D.

[...] = mdscale(D,p,'Name',value) specifies one or more optional parameter name/value pairs that control further details of mdscale. Specify Name in single quotes. Available parameters are

• Criterion — The goodness-of-fit criterion to minimize. This also determines the type of scaling, either non-metric or metric, that mdscale performs. Choices for non-metric scaling are:
  • 'stress' — Stress normalized by the sum of squares of the inter-point distances, also known as stress1. This is the default.
  • 'sstress' — Squared stress, normalized with the sum of 4th powers of the inter-point distances.
Choices for metric scaling are:
  • 'metricstress' — Stress, normalized with the sum of squares of the dissimilarities.
  • 'metricsstress' — Squared stress, normalized with the sum of 4th powers of the dissimilarities.


  • 'sammon' — Sammon's nonlinear mapping criterion. Off-diagonal dissimilarities must be strictly positive with this criterion.
  • 'strain' — A criterion equivalent to that used in classical multidimensional scaling.

• Weights — A matrix or vector the same size as D, containing nonnegative dissimilarity weights. You can use these to weight the contribution of the corresponding elements of D in computing and minimizing stress. Elements of D corresponding to zero weights are effectively ignored.

Note When you specify weights as a full matrix, its diagonal elements are ignored and have no effect, since the corresponding diagonal elements of D do not enter into the stress calculation.

• Start — Method used to choose the initial configuration of points for Y. The choices are
  • 'cmdscale' — Use the classical multidimensional scaling solution. This is the default. 'cmdscale' is not valid when there are zero weights.
  • 'random' — Choose locations randomly from an appropriately scaled p-dimensional normal distribution with uncorrelated coordinates.
  • An n-by-p matrix of initial locations, where n is the size of the matrix D and p is the number of columns of the output matrix Y. In this case, you can pass in [] for p and mdscale infers p from the second dimension of the matrix. You can also supply a 3-D array, implying a value for 'Replicates' from the array's third dimension.

• Replicates — Number of times to repeat the scaling, each with a new initial configuration. The default is 1.

• Options — Options for the iterative algorithm used to minimize the fitting criterion. Pass in an options structure created by statset. For example,

opts = statset(param1,val1,param2,val2, ...);
[...] = mdscale(...,'Options',opts)

The choices of statset parameters are
• 'Display' — Level of display output. The choices are 'off' (the default), 'iter', and 'final'.
• 'MaxIter' — Maximum number of iterations allowed. The default is 200.
• 'TolFun' — Termination tolerance for the stress criterion and its gradient. The default is 1e-4.
• 'TolX' — Termination tolerance for the configuration location step size. The default is 1e-4.
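The default stress1 criterion has a compact form. As a hedged, language-agnostic sketch of the formula (illustrative Python, not toolbox code): the disparities are the monotonically transformed dissimilarities, and the distances are the inter-point Euclidean distances in the configuration Y.

```python
import math

def stress1(disparities, distances):
    # Kruskal's normalized stress1:
    # sqrt( sum (d_hat - d)^2 / sum d^2 ), summed over point pairs,
    # where d_hat are the disparities and d the configuration distances.
    num = sum((dh - d) ** 2 for dh, d in zip(disparities, distances))
    den = sum(d ** 2 for d in distances)
    return math.sqrt(num / den)
```

A perfect monotone fit (disparities equal to distances) gives stress 0; the normalization by the sum of squared distances is what distinguishes stress1 from the metric 'metricstress' criterion, which normalizes by the dissimilarities instead.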

Examples load cereal.mat X = [Calories Protein Fat Sodium Fiber ... Carbo Sugars Shelf Potass Vitamins]; % Take a subset from a single manufacturer. X = X(strcmp('K',cellstr(Mfg)),:); % Create a dissimilarity matrix. dissimilarities = pdist(X); % Use non-metric scaling to recreate the data in 2D, % and make a Shepard plot of the results.


[Y,stress,disparities] = mdscale(dissimilarities,2);
distances = pdist(Y);
[dum,ord] = sortrows([disparities(:) dissimilarities(:)]);
plot(dissimilarities,distances,'bo', ...
     dissimilarities(ord),disparities(ord),'r.-');
xlabel('Dissimilarities'); ylabel('Distances/Disparities')
legend({'Distances' 'Disparities'},'Location','NW');

% Do metric scaling on the same dissimilarities.
figure
[Y,stress] = ...
    mdscale(dissimilarities,2,'criterion','metricsstress');
distances = pdist(Y);
plot(dissimilarities,distances,'bo', ...
     [0 max(dissimilarities)],[0 max(dissimilarities)],'r.-');
xlabel('Dissimilarities'); ylabel('Distances')


See Also cmdscale | pdist | statset

Introduced before R2006a


mdsprox Class: CompactTreeBagger Multidimensional scaling of proximity matrix

Syntax

[SC,EIGEN] = mdsprox(B,X)
[SC,EIGEN] = mdsprox(B,X,'param1',val1,'param2',val2,...)

Description

[SC,EIGEN] = mdsprox(B,X) applies classical multidimensional scaling to the proximity matrix computed for the data in the matrix X, and returns scaled coordinates SC and eigenvalues EIGEN of the scaling transformation. The method applies multidimensional scaling to the matrix of distances defined as 1-prox, where prox is the proximity matrix returned by the proximity method. You can supply the proximity matrix directly by using the 'Data' parameter.

[SC,EIGEN] = mdsprox(B,X,'param1',val1,'param2',val2,...) specifies optional parameter name/value pairs:

'Data'

Flag indicating how the method treats the X input argument. If set to 'predictors' (default), mdsprox assumes X to be a matrix of predictors and uses it for computation of the proximity matrix. If set to 'proximity', the method treats X as a proximity matrix returned by the proximity method.

'Colors'

If you supply this argument, mdsprox makes overlaid scatter plots of two scaled coordinates using specified colors for different classes. You must supply the colors as a character vector or a string scalar with one letter for each color. If there are more classes in the data than letters in the supplied value, mdsprox plots only the first C classes, where C is the number of letters in the supplied value. For regression or if you do not provide the vector of true class labels, the method uses the first color for all observations in X.

'Labels'

Vector of true class labels for a classification ensemble. True class labels can be a numeric vector, character matrix, string array, or cell array of character vectors. If supplied, this vector must have as many elements as there are observations (rows) in X. This argument has no effect unless you also supply the 'Colors' argument.

'MDSCoordinates'

Indices of the two scaled coordinates to plot. By default, mdsprox makes a scatter plot of the first and second scaled coordinates, which correspond to the two largest eigenvalues. You can specify any other two or three indices not exceeding the dimensionality of the scaled data. This argument has no effect unless you also supply the 'Colors' argument.
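The distance matrix that mdsprox scales is simply 1-prox, elementwise. An illustrative sketch of that conversion (in Python, for exposition only; proximities lie in [0, 1] with ones on the diagonal):

```python
def proximity_to_distance(prox):
    # mdsprox scales distances defined as 1 - prox; since proximities
    # have ones on the diagonal, the diagonal distances are zero.
    return [[1 - p for p in row] for row in prox]
```

Observations that land in the same leaves often (high proximity) therefore get small distances and end up close together in the scaled coordinates.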

See Also cmdscale | mdsprox | proximity


mdsprox

mdsprox Class: TreeBagger Multidimensional scaling of proximity matrix

Syntax

[S,E] = mdsprox(B)
[S,E] = mdsprox(B,'param1',val1,'param2',val2,...)

Description

[S,E] = mdsprox(B) returns scaled coordinates, S, and eigenvalues, E, for the proximity matrix in the ensemble B. An earlier call to fillprox(B) must create the proximity matrix.

[S,E] = mdsprox(B,'param1',val1,'param2',val2,...) specifies optional parameter name/value pairs:

'Keep'

Array of indices of observations in the training data to use for multidimensional scaling. By default, this argument is set to 'all'. If you provide numeric or logical indices, the method uses only the subset of the training data specified by these indices to compute the scaled coordinates and eigenvalues.

'Colors'

If you supply this argument, mdsprox makes overlaid scatter plots of two scaled coordinates using specified colors for different classes. You must supply the colors as a character vector or a string scalar with one letter for each color. If there are more classes in the data than letters in the supplied value, mdsprox plots only the first C classes, where C is the number of letters in the supplied value. For regression, the method uses the first color for all observations in X.

'MDSCoordinates' Indices of the two scaled coordinates to plot. By default, mdsprox makes a scatter plot of the first and second scaled coordinates which correspond to the two largest eigenvalues. You can specify any other two or three indices not exceeding the dimensionality of the scaled data. This argument has no effect unless you also supply the 'Colors' argument.

See Also cmdscale | fillprox | mdsprox


mean Package: prob Mean of probability distribution

Syntax m = mean(pd)

Description m = mean(pd) returns the mean m of the probability distribution pd.

Examples Mean of a Fitted Distribution Load the sample data. Create a vector containing the first column of students’ exam grade data. load examgrades x = grades(:,1);

Create a normal distribution object by fitting it to the data.

pd = fitdist(x,'Normal')

pd = NormalDistribution

  Normal distribution
       mu = 75.0083   [73.4321, 76.5846]
    sigma =  8.7202   [7.7391, 9.98843]

Compute the mean of the fitted distribution. m = mean(pd) m = 75.0083

The mean of the normal distribution is equal to the parameter mu.

Mean of a Skewed Distribution

Create a Weibull probability distribution object.

pd = makedist('Weibull','a',5,'b',2)

pd = WeibullDistribution

  Weibull distribution
    A = 5
    B = 2

Compute the mean of the distribution. mean = mean(pd) mean = 4.4311

Mean of a Uniform Distribution

Create a uniform distribution object.

pd = makedist('Uniform','lower',-3,'upper',5)

pd = UniformDistribution

  Uniform distribution
    Lower = -3
    Upper =  5

Compute the mean of the distribution. m = mean(pd) m = 1
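The values returned in the two examples above match the closed-form means of these distributions: a Weibull distribution with scale a and shape b has mean a*Gamma(1 + 1/b), and a uniform distribution has mean (lower + upper)/2. An illustrative check outside MATLAB (Python standard library only):

```python
import math

# Weibull mean: a * Gamma(1 + 1/b); here a = 5, b = 2 as in the example above,
# giving 5 * Gamma(1.5) ≈ 4.4311.
weibull_mean = 5 * math.gamma(1 + 1 / 2)

# Uniform mean: midpoint of [lower, upper]; here [-3, 5], giving 1.
uniform_mean = (-3 + 5) / 2
```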

Mean of a Kernel Distribution Load the sample data. Create a probability distribution object by fitting a kernel distribution to the miles per gallon (MPG) data. load carsmall; pd = fitdist(MPG,'Kernel') pd = KernelDistribution Kernel = normal Bandwidth = 4.11428 Support = unbounded

Compute the mean of the distribution. mean(pd) ans = 23.7181


Input Arguments

pd — Probability distribution
probability distribution object

Probability distribution, specified as a probability distribution object created using one of the following.

Function or App        Description
makedist               Create a probability distribution object using specified parameter values.
fitdist                Fit a probability distribution object to sample data.
Distribution Fitter    Fit a probability distribution to sample data using the interactive Distribution Fitter app and export the fitted object to the workspace.

Output Arguments

m — Mean
scalar value

Mean of the probability distribution, returned as a scalar value.

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: • The input argument pd can be a fitted probability distribution object for beta, exponential, extreme value, lognormal, normal, and Weibull distributions. Create pd by fitting a probability distribution to sample data from the fitdist function. For an example, see “Code Generation for Probability Distribution Objects” on page 31-70. For more information on code generation, see “Introduction to Code Generation” on page 31-2 and “General Code Generation Workflow” on page 31-5.

See Also Distribution Fitter | fitdist | makedist | median | std

Topics
“Working with Probability Distributions” on page 5-2
“Supported Distributions” on page 5-13

Introduced in R2013a


meanMargin

meanMargin Class: CompactTreeBagger Mean classification margin

Syntax

mar = meanMargin(B,TBLnew,Ynew)
mar = meanMargin(B,Xnew,Ynew)
mar = meanMargin(B,TBLnew,Ynew,'param1',val1,'param2',val2,...)
mar = meanMargin(B,Xnew,Ynew,'param1',val1,'param2',val2,...)

Description

mar = meanMargin(B,TBLnew,Ynew) computes average classification margins for the predictors contained in the table TBLnew given the true response Ynew. You can omit Ynew if TBLnew contains the response variable. If you trained B using sample data contained in a table, then the input data for this method must also be in a table.

mar = meanMargin(B,Xnew,Ynew) computes average classification margins for the predictors contained in the matrix Xnew given the true response Ynew. If you trained B using sample data contained in a matrix, then the input data for this method must also be in a matrix. Ynew can be a numeric vector, character matrix, string array, cell array of character vectors, categorical vector, or logical vector.

meanMargin averages the margins over all observations (rows) in TBLnew or Xnew for each tree. mar is a matrix of size 1-by-NTrees, where NTrees is the number of trees in the ensemble B. This method is available for classification ensembles only.

mar = meanMargin(B,TBLnew,Ynew,'param1',val1,'param2',val2,...) or mar = meanMargin(B,Xnew,Ynew,'param1',val1,'param2',val2,...) specifies optional parameter name-value pairs:

'Mode'

How meanMargin computes margins. If set to 'cumulative' (default), mar is a vector of length NTrees, where the first element gives the mean margin from trees(1), the second element gives the mean margin from trees(1:2), and so on, up to trees(1:NTrees). If set to 'individual', mar is a vector of length NTrees, where each element is the mean margin from each tree in the ensemble. If set to 'ensemble', mar is a scalar showing the cumulative mean margin for the entire ensemble.

'Trees'

Vector of indices indicating what trees to include in this calculation. By default, this argument is set to 'all' and the method uses all trees. If 'Trees' is a numeric vector, the method returns a vector of length NTrees for 'cumulative' and 'individual' modes, where NTrees is the number of elements in the input vector, and a scalar for 'ensemble' mode. For example, in the 'cumulative' mode, the first element gives mean margin from trees(1), the second element gives mean margin from trees(1:2) etc.


'TreeWeights'

Vector of tree weights. This vector must have the same length as the 'Trees' vector. meanMargin uses these weights to combine output from the specified trees by taking a weighted average instead of the simple nonweighted majority vote. You cannot use this argument in the 'individual' mode.

See Also meanMargin

meanMargin

meanMargin Class: TreeBagger Mean classification margin

Syntax

mar = meanMargin(B,TBLnew,Ynew)
mar = meanMargin(B,Xnew,Ynew)
mar = meanMargin(B,TBLnew,Ynew,'param1',val1,'param2',val2,...)
mar = meanMargin(B,Xnew,Ynew,'param1',val1,'param2',val2,...)

Description

mar = meanMargin(B,TBLnew,Ynew) computes average classification margins for the predictors contained in the table TBLnew given the true response Ynew. You can omit Ynew if TBLnew contains the response variable. If you trained B using sample data contained in a table, then the input data for this method must also be in a table.

mar = meanMargin(B,Xnew,Ynew) computes average classification margins for the predictors contained in the matrix Xnew given the true response Ynew. If you trained B using sample data contained in a matrix, then the input data for this method must also be in a matrix. Ynew can be a numeric vector, character matrix, string array, cell array of character vectors, categorical vector, or logical vector.

meanMargin averages the margins over all observations (rows) in TBLnew or Xnew for each tree. mar is a matrix of size 1-by-NTrees, where NTrees is the number of trees in the ensemble B. This method is available for classification ensembles only.

mar = meanMargin(B,TBLnew,Ynew,'param1',val1,'param2',val2,...) or mar = meanMargin(B,Xnew,Ynew,'param1',val1,'param2',val2,...) specifies optional parameter name-value pairs:

'Mode'

Character vector or string scalar indicating how meanMargin computes margins. If set to 'cumulative' (default), mar is a vector of length NTrees, where the first element gives the mean margin from trees(1), the second element gives the mean margin from trees(1:2), and so on, up to trees(1:NTrees). If set to 'individual', mar is a vector of length NTrees, where each element is the mean margin from each tree in the ensemble. If set to 'ensemble', mar is a scalar showing the cumulative mean margin for the entire ensemble.

'Trees'

Vector of indices indicating what trees to include in this calculation. By default, this argument is set to 'all' and the method uses all trees. If 'Trees' is a numeric vector, the method returns a vector of length NTrees for 'cumulative' and 'individual' modes, where NTrees is the number of elements in the input vector, and a scalar for 'ensemble' mode. For example, in the 'cumulative' mode, the first element gives mean margin from trees(1), the second element gives mean margin from trees(1:2) etc.


'TreeWeights'

Vector of tree weights. This vector must have the same length as the 'Trees' vector. meanMargin uses these weights to combine output from the specified trees by taking a weighted average instead of the simple nonweighted majority vote. You cannot use this argument in the 'individual' mode.

'Weights'

Vector of observation weights to use for margin averaging. By default the weight of every observation is set to 1. The length of this vector must be equal to the number of rows in X.

'UseInstanceForTree'

Logical matrix of size Nobs-by-NumTrees indicating which trees to use to make predictions for each observation. By default, the method uses all trees for all observations.

See Also margin


surrogateAssociation Mean predictive measure of association for surrogate splits in classification tree

Syntax ma = surrogateAssociation(tree) ma = surrogateAssociation(tree,N)

Description ma = surrogateAssociation(tree) returns a matrix of predictive measures of association for the predictors in tree. ma = surrogateAssociation(tree,N) returns a matrix of predictive measures of association averaged over the nodes in vector N.

Input Arguments tree A classification tree constructed with fitctree, or a compact classification tree constructed with compact. N Vector of node numbers in tree.

Output Arguments ma • ma = surrogateAssociation(tree) returns a P-by-P matrix, where P is the number of predictors in tree. ma(i,j) is the predictive measure of association on page 32-3402 between the optimal split on variable i and a surrogate split on variable j. For more details, see “Algorithms” on page 32-3403. • ma = surrogateAssociation(tree,N) returns a P-by-P matrix representing the predictive measure of association between variables averaged over nodes in the vector N. N contains node numbers from 1 to max(tree.NumNodes).

Examples Estimate Predictive Measures of Association for Surrogate Splits Load Fisher's iris data set. load fisheriris


Grow a classification tree using species as the response. Specify to use surrogate splits for missing values. tree = fitctree(meas,species,'surrogate','on');

Find the mean predictive measure of association between the predictor variables.

ma = surrogateAssociation(tree)

ma = 4×4

    1.0000         0         0         0
         0    1.0000         0         0
    0.4633    0.2500    1.0000    0.5000
    0.2065    0.1413    0.4022    1.0000

Find the mean predictive measure of association averaged over the odd-numbered nodes in tree.

N = 1:2:tree.NumNodes;
ma = surrogateAssociation(tree,N)

ma = 4×4

    1.0000         0         0         0
         0    1.0000         0         0
    0.7600    0.5000    1.0000    1.0000
    0.4130    0.2826    0.8043    1.0000

More About

Predictive Measure of Association

The predictive measure of association is a value that indicates the similarity between decision rules that split observations. Among all possible decision splits that are compared to the optimal split (found by growing the tree), the best surrogate decision split on page 32-1791 yields the maximum predictive measure of association. The second-best surrogate split has the second-largest predictive measure of association.

Suppose xj and xk are predictor variables j and k, respectively, and j ≠ k. At node t, the predictive measure of association between the optimal split xj < u and a surrogate split xk < v is

λjk = ( min(PL,PR) − (1 − PLjLk − PRjRk) ) / min(PL,PR)

where:

• PL is the proportion of observations in node t such that xj < u. The subscript L stands for the left child of node t.
• PR is the proportion of observations in node t such that xj ≥ u. The subscript R stands for the right child of node t.
• PLjLk is the proportion of observations at node t such that xj < u and xk < v.
• PRjRk is the proportion of observations at node t such that xj ≥ u and xk ≥ v.
• Observations with missing values for xj or xk do not contribute to the proportion calculations.

λjk is a value in (–∞,1]. If λjk > 0, then xk < v is a worthwhile surrogate split for xj < u.

Surrogate Decision Splits

A surrogate decision split is an alternative to the optimal decision split at a given node in a decision tree. The optimal split is found by growing the tree; the surrogate split uses a similar or correlated predictor variable and split criterion. When the value of the optimal split predictor for an observation is missing, the observation is sent to the left or right child node using the best surrogate predictor. When the value of the best surrogate split predictor for the observation is also missing, the observation is sent to the left or right child node using the second-best surrogate predictor, and so on. Candidate splits are sorted in descending order by their predictive measure of association on page 32-2179.
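As a quick check of the predictive measure of association, consider a hypothetical node (the numbers below are invented for illustration) with PL = 0.6, PR = 0.4, PLjLk = 0.55, and PRjRk = 0.35:

```latex
\lambda_{jk} = \frac{\min(0.6,\,0.4) - \bigl(1 - 0.55 - 0.35\bigr)}{\min(0.6,\,0.4)}
             = \frac{0.4 - 0.1}{0.4} = 0.75
```

Because λjk = 0.75 > 0, the surrogate split xk < v would be considered worthwhile at this node.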

Algorithms Element ma(i,j) is the predictive measure of association averaged over surrogate splits on predictor j for which predictor i is the optimal split predictor. This average is computed by summing positive values of the predictive measure of association over optimal splits on predictor i and surrogate splits on predictor j and dividing by the total number of optimal splits on predictor i, including splits for which the predictive measure of association between predictors i and j is negative.

See Also ClassificationTree | fitctree


surrogateAssociation Mean predictive measure of association for surrogate splits in regression tree

Syntax ma = surrogateAssociation(tree) ma = surrogateAssociation(tree,N)

Description ma = surrogateAssociation(tree) returns a matrix of predictive measures of association for the predictors in tree. ma = surrogateAssociation(tree,N) returns a matrix of predictive measures of association averaged over the nodes in vector N.

Input Arguments tree A regression tree constructed with fitrtree, or a compact regression tree constructed with compact. N Vector of node numbers in tree.

Output Arguments ma • ma = surrogateAssociation(tree) returns a P-by-P matrix, where P is the number of predictors in tree. ma(i,j) is the predictive measure of association on page 32-3405 between the optimal split on variable i and a surrogate split on variable j. For more details, see “Algorithms” on page 32-3406. • ma = surrogateAssociation(tree,N) returns a P-by-P matrix representing the predictive measure of association between variables averaged over nodes in the vector N. N contains node numbers from 1 to max(tree.NumNodes).

Examples Estimate Predictive Measures of Association for Surrogate Splits Load the carsmall data set. Specify Displacement, Horsepower, and Weight as predictor variables. load carsmall X = [Displacement Horsepower Weight];


Grow a regression tree using MPG as the response. Specify to use surrogate splits for missing values. tree = fitrtree(X,MPG,'surrogate','on');

Find the mean predictive measure of association between the predictor variables.

ma = surrogateAssociation(tree)

ma = 3×3

    1.0000    0.2167    0.5083
    0.4521    1.0000    0.3769
    0.2540    0.2659    1.0000

Find the mean predictive measure of association averaged over the odd-numbered nodes in tree.

N = 1:2:tree.NumNodes;
ma = surrogateAssociation(tree,N)

ma = 3×3

    1.0000    0.1250    0.6875
    0.5632    1.0000    0.5861
    0.3333    0.3148    1.0000

More About

Predictive Measure of Association

The predictive measure of association is a value that indicates the similarity between decision rules that split observations. Among all possible decision splits that are compared to the optimal split (found by growing the tree), the best surrogate decision split on page 32-1791 yields the maximum predictive measure of association. The second-best surrogate split has the second-largest predictive measure of association.

Suppose xj and xk are predictor variables j and k, respectively, and j ≠ k. At node t, the predictive measure of association between the optimal split xj < u and a surrogate split xk < v is

λjk = ( min(PL,PR) − (1 − PLjLk − PRjRk) ) / min(PL,PR)

where:

• PL is the proportion of observations in node t such that xj < u. The subscript L stands for the left child of node t.
• PR is the proportion of observations in node t such that xj ≥ u. The subscript R stands for the right child of node t.
• PLjLk is the proportion of observations at node t such that xj < u and xk < v.
• PRjRk is the proportion of observations at node t such that xj ≥ u and xk ≥ v.
• Observations with missing values for xj or xk do not contribute to the proportion calculations.

λjk is a value in (–∞,1]. If λjk > 0, then xk < v is a worthwhile surrogate split for xj < u.

Surrogate Decision Splits

A surrogate decision split is an alternative to the optimal decision split at a given node in a decision tree. The optimal split is found by growing the tree; the surrogate split uses a similar or correlated predictor variable and split criterion. When the value of the optimal split predictor for an observation is missing, the observation is sent to the left or right child node using the best surrogate predictor. When the value of the best surrogate split predictor for the observation is also missing, the observation is sent to the left or right child node using the second-best surrogate predictor, and so on. Candidate splits are sorted in descending order by their predictive measure of association on page 32-2179.

Algorithms Element ma(i,j) is the predictive measure of association averaged over surrogate splits on predictor j for which predictor i is the optimal split predictor. This average is computed by summing positive values of the predictive measure of association over optimal splits on predictor i and surrogate splits on predictor j and dividing by the total number of optimal splits on predictor i, including splits for which the predictive measure of association between predictors i and j is negative.

See Also RegressionTree | fitrtree | prune


median Package: prob Median of probability distribution

Syntax m = median(pd)

Description m = median(pd) returns the median m for the probability distribution pd.

Examples Median of a Fitted Distribution Load the sample data. Create a vector containing the first column of students' exam grade data. load examgrades x = grades(:,1);

Create a normal distribution object by fitting it to the data. pd = fitdist(x,'Normal') pd = NormalDistribution Normal distribution mu = 75.0083 [73.4321, 76.5846] sigma = 8.7202 [7.7391, 9.98843]

Compute the median of the fitted distribution. m = median(pd) m = 75.0083

For a symmetrical distribution such as the normal distribution, the median is equal to the mean, mu.

Median of a Skewed Distribution

Create a Weibull probability distribution object.

pd = makedist('Weibull','a',5,'b',2)

pd = WeibullDistribution

  Weibull distribution
    A = 5
    B = 2

Compute the median of the distribution.

m = median(pd)

m = 4.1628

For a skewed distribution such as the Weibull distribution, the median and the mean may not be equal. Calculate the mean of the Weibull distribution and compare it to the median. mean = mean(pd) mean = 4.4311

The mean of the distribution is greater than the median. Plot the pdf to visualize the distribution. x = [0:.1:15]; pdf = pdf(pd,x); plot(x,pdf)


Input Arguments

pd — Probability distribution
probability distribution object

Probability distribution, specified as a probability distribution object created using one of the following.

makedist: Create a probability distribution object using specified parameter values.

fitdist: Fit a probability distribution object to sample data.

Distribution Fitter: Fit a probability distribution to sample data using the interactive Distribution Fitter app and export the fitted object to the workspace.

Output Arguments m — Median scalar value Median of the probability distribution, returned as a scalar value. The value of m is the 50th percentile of the probability distribution.

Extended Capabilities C/C++ Code Generation Generate C and C++ code using MATLAB® Coder™. Usage notes and limitations: • The input argument pd can be a fitted probability distribution object for beta, exponential, extreme value, lognormal, normal, and Weibull distributions. Create pd by fitting a probability distribution to sample data from the fitdist function. For an example, see “Code Generation for Probability Distribution Objects” on page 31-70. For more information on code generation, see “Introduction to Code Generation” on page 31-2 and “General Code Generation Workflow” on page 31-5.

See Also Distribution Fitter | fitdist | makedist | mean Topics “Working with Probability Distributions” on page 5-2 “Supported Distributions” on page 5-13 Introduced in R2013a


mergelevels (Not Recommended) Merge levels of nominal or ordinal arrays Note The nominal and ordinal array data types are not recommended. To represent ordered and unordered discrete, nonnumeric data, use the “Categorical Arrays” (MATLAB) data type instead.

Syntax B = mergelevels(A,oldlevels) B = mergelevels(A,oldlevels,newlevel)

Description B = mergelevels(A,oldlevels) merges two or more levels of A. • If A is a nominal array, mergelevels uses the first label in oldlevels as the new level. • If A is an ordinal array, the levels specified by oldlevels must be consecutive, and mergelevels uses the label corresponding to the lowest level in oldlevels as the label for the new level. B = mergelevels(A,oldlevels,newlevel) merges two or more levels into the new level with label newlevel.

Examples

Create New Category From Merged Levels

Create a nominal array from data in a cell array.

colors = nominal({'r','b','g';'g','r','b';'b','r','g'},...
{'blue','green','red'})

colors = 3x3 nominal
     red       blue      green
     green     red       blue
     blue      red       green

Merge the elements of the 'red' and 'blue' levels into a new level labeled 'purple'.

colors = mergelevels(colors,{'red','blue'},'purple')

colors = 3x3 nominal
     purple    purple    green
     green     purple    purple
     purple    purple    green

Display the levels of colors.

getlevels(colors)

ans = 1x2 nominal
     purple    green

Input Arguments A — Nominal or ordinal array nominal array | ordinal array Nominal or ordinal array, specified as a nominal or ordinal array object created with nominal or ordinal. oldlevels — Levels to merge string array | cell array of character vectors | 2-D character array Levels to merge, specified as a string array, cell array of character vectors, or 2-D character array. For ordinal arrays, the levels in oldlevels must be consecutive. Data Types: char | string | cell newlevel — Level to create character vector | string scalar Level to create from the merged levels, specified as a character vector or string scalar that gives the label for the new level. Data Types: char | string

Output Arguments B — Nominal or ordinal array nominal array | ordinal array Nominal or ordinal array, returned as a nominal or ordinal array object.

See Also addlevels | droplevels | nominal | ordinal | reorderlevels Topics “Merge Category Levels” on page 2-16 Introduced in R2007a


mhsample Metropolis-Hastings sample

Syntax smpl = mhsample(start,nsamples,'pdf',pdf,'proppdf',proppdf, 'proprnd',proprnd) smpl = mhsample(...,'symmetric',sym) smpl = mhsample(...,'burnin',K) smpl = mhsample(...,'thin',m) smpl = mhsample(...,'nchain',n) [smpl,accept] = mhsample(...)

Description smpl = mhsample(start,nsamples,'pdf',pdf,'proppdf',proppdf, 'proprnd',proprnd) draws nsamples random samples from a target stationary distribution pdf using the Metropolis-Hastings algorithm. start is a row vector containing the start value of the Markov chain, nsamples is an integer specifying the number of samples to be generated, and pdf, proppdf, and proprnd are function handles created using @. proppdf defines the proposal distribution density, and proprnd defines the random number generator for the proposal distribution. pdf and proprnd take one argument as an input with the same type and size as start. proppdf takes two arguments as inputs with the same type and size as start. smpl is a column vector or matrix containing the samples. If the log density function is preferred, 'pdf' and 'proppdf' can be replaced with 'logpdf' and 'logproppdf'. The density functions used in the Metropolis-Hastings algorithm are not necessarily normalized.

The proposal distribution q(x,y) gives the probability density for choosing x as the next point when y is the current point. It is sometimes written as q(x|y). If the proppdf or logproppdf satisfies q(x,y) = q(y,x), that is, the proposal distribution is symmetric, mhsample implements Random Walk Metropolis-Hastings sampling. If the proppdf or logproppdf satisfies q(x,y) = q(x), that is, the proposal distribution is independent of the current values, mhsample implements Independent Metropolis-Hastings sampling.

smpl = mhsample(...,'symmetric',sym) draws nsamples random samples from a target stationary distribution pdf using the Metropolis-Hastings algorithm. sym is a logical value that indicates whether the proposal distribution is symmetric. The default value is false, which corresponds to an asymmetric proposal distribution. If sym is true, that is, the proposal distribution is symmetric, proppdf and logproppdf are optional.

smpl = mhsample(...,'burnin',K) generates a Markov chain with values between the starting point and the Kth point omitted in the generated sequence. Values beyond the Kth point are kept. K is a nonnegative integer with default value of 0. smpl = mhsample(...,'thin',m) generates a Markov chain with m-1 out of m values omitted in the generated sequence. m is a positive integer with default value of 1.


smpl = mhsample(...,'nchain',n) generates n Markov chains using the Metropolis-Hastings algorithm. n is a positive integer with a default value of 1. smpl is a matrix containing the samples. The last dimension contains the indices for individual chains. [smpl,accept] = mhsample(...) also returns accept, the acceptance rate of the proposal distribution. accept is a scalar if a single chain is generated and is a vector if multiple chains are generated.
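The 'burnin' and 'thin' options can be combined in one call; a minimal sketch (not from the original page; the target, proposal, and parameter choices are assumptions for illustration):

```matlab
% Hypothetical sketch: discard the first 1000 draws as burn-in, then keep
% every 5th draw of a symmetric random-walk chain targeting N(0,1).
rng default                                  % for reproducibility
pdf = @(x) normpdf(x);                       % target density
proprnd = @(x) x + 0.5*(2*rand - 1);         % symmetric random-walk proposal

smpl = mhsample(0,2000,'pdf',pdf,'proprnd',proprnd, ...
    'symmetric',true,'burnin',1000,'thin',5);

% smpl is still 2000-by-1: burn-in and thinning lengthen the underlying
% chain, not shorten the returned sample.
size(smpl)
```

Because the proposal is symmetric, proppdf is omitted and 'symmetric' is set to true, so mhsample runs Random Walk Metropolis-Hastings.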

Examples Estimate Moments Using Independent Metropolis-Hastings Sampling Use Independent Metropolis-Hastings sampling to estimate the second order moment of a Gamma distribution. rng default; % For reproducibility alpha = 2.43; beta = 1; pdf = @(x)gampdf(x,alpha,beta); % Target distribution proppdf = @(x,y)gampdf(x,floor(alpha),floor(alpha)/alpha); proprnd = @(x)sum(... exprnd(floor(alpha)/alpha,floor(alpha),1)); nsamples = 5000; smpl = mhsample(1,nsamples,'pdf',pdf,'proprnd',proprnd,... 'proppdf',proppdf);

Plot the results. xxhat = cumsum(smpl.^2)./(1:nsamples)'; figure; plot(1:nsamples,xxhat)


Random Walk Metropolis-Hastings Sampling Use Random Walk Metropolis-Hastings sampling to generate sample data from a standard normal distribution. rng default % For reproducibility delta = .5; pdf = @(x) normpdf(x); proppdf = @(x,y) unifpdf(y-x,-delta,delta); proprnd = @(x) x + rand*2*delta - delta; nsamples = 15000; x = mhsample(1,nsamples,'pdf',pdf,'proprnd',proprnd,'symmetric',1);

Plot the sample data. figure; h = histfit(x,50); h(1).FaceColor = [.8 .8 1];


See Also rand | slicesample Topics “Using the Metropolis-Hastings Algorithm” on page 7-10 “Representing Sampling Distributions Using Markov Chain Samplers” on page 7-10 Introduced in R2006a


mle Maximum likelihood estimates

Syntax phat = mle(data) phat = mle(data,'distribution',dist) phat = mle(data,'pdf',pdf,'start',start) phat = mle(data,'pdf',pdf,'start',start,'cdf',cdf) phat = mle(data,'logpdf',logpdf,'start',start) phat = mle(data,'logpdf',logpdf,'start',start,'logsf',logsf) phat = mle(data,'nloglf',nloglf,'start',start) phat = mle( ___ ,Name,Value) [phat,pci] = mle( ___ )

Description

phat = mle(data) returns maximum likelihood estimates (MLEs) for the parameters of a normal distribution, using the sample data in the vector data.

phat = mle(data,'distribution',dist) returns parameter estimates for a distribution specified by dist.

phat = mle(data,'pdf',pdf,'start',start) returns parameter estimates for a custom distribution specified by the probability density function pdf. You must also specify the initial parameter values, start.

phat = mle(data,'pdf',pdf,'start',start,'cdf',cdf) returns parameter estimates for a custom distribution specified by the probability density function pdf and custom cumulative distribution function cdf.

phat = mle(data,'logpdf',logpdf,'start',start) returns parameter estimates for a custom distribution specified by the log probability density function logpdf. You must also specify the initial parameter values, start.

phat = mle(data,'logpdf',logpdf,'start',start,'logsf',logsf) returns parameter estimates for a custom distribution specified by the log probability density function logpdf and custom log survival function on page 32-3429 logsf.

phat = mle(data,'nloglf',nloglf,'start',start) returns parameter estimates for the custom distribution specified by the negative loglikelihood function nloglf. You must also specify the initial parameter values, start.

phat = mle( ___ ,Name,Value) specifies options using name-value pair arguments in addition to any of the input arguments in previous syntaxes. For example, you can specify the censored data, frequency of observations, and confidence level.


[phat,pci] = mle( ___ ) also returns the 95% confidence intervals for the parameters.

Examples Estimate Parameters of Burr Distribution Load the sample data. load carbig

The variable MPG has the miles per gallon for different models of cars. Draw a histogram of MPG data. histogram(MPG)

The distribution is somewhat right skewed. A symmetric distribution, such as the normal distribution, might not be a good fit. Estimate the parameters of the Burr Type XII distribution for the MPG data.

phat = mle(MPG,'distribution','burr')

phat = 1×3

   34.6447    3.7898    3.5722

The maximum likelihood estimate for the scale parameter α is 34.6447. The estimates for the two shape parameters c and k of the Burr Type XII distribution are 3.7898 and 3.5722, respectively.

Estimate Parameters of a Noncentral Chi-Square Distribution Generate sample data of size 1000 from a noncentral chi-square distribution with degrees of freedom 8 and noncentrality parameter 3. rng default % for reproducibility x = ncx2rnd(8,3,1000,1);

Estimate the parameters of the noncentral chi-square distribution from the sample data. To do this, custom define the noncentral chi-square pdf using the pdf input argument.

[phat,pci] = mle(x,'pdf',@(x,v,d)ncx2pdf(x,v,d),'start',[1,1])

phat = 1×2

    8.1052    2.6693

pci = 2×2

    7.1121    1.6025
    9.0983    3.7362

The estimate for the degrees of freedom is 8.1052 and the noncentrality parameter is 2.6693. The 95% confidence interval for the degrees of freedom is (7.1121,9.0983) and the noncentrality parameter is (1.6025,3.7362). The confidence intervals include the true parameter values of 8 and 3, respectively.

Fit Custom Distribution to Censored Data Load the sample data. load('readmissiontimes.mat');

The data includes ReadmissionTime, which has readmission times for 100 patients. The column vector Censored has the censorship information for each patient, where 1 indicates a censored observation, and 0 indicates the exact readmission time is observed. This is simulated data. Define a custom probability density and cumulative distribution function. custpdf = @(data,lambda) lambda*exp(-lambda*data); custcdf = @(data,lambda) 1-exp(-lambda*data);

Estimate the parameter, lambda, of the custom distribution for the censored sample data. phat = mle(ReadmissionTime,'pdf',custpdf,'cdf',custcdf,'start',0.05,'Censoring',Censored)


phat = 0.1096

Fit Custom Log pdf and Survival Function Load the sample data. load('readmissiontimes.mat');

The data includes ReadmissionTime, which has readmission times for 100 patients. The column vector Censored has the censorship information for each patient, where 1 indicates a censored observation, and 0 indicates the exact readmission time is observed. This is simulated data. Define a custom log probability density and survival function. custlogpdf = @(data,lambda,k) log(k)-k*log(lambda)+(k-1)*log(data)-(data/lambda).^k; custlogsf = @(data,lambda,k) -(data/lambda).^k;

Estimate the parameters, lambda and k, of the custom distribution for the censored sample data.

phat = mle(ReadmissionTime,'logpdf',custlogpdf,'logsf',custlogsf,...
'start',[1,0.75],'Censoring',Censored)

phat = 1×2

    9.2090    1.4223

The scale and shape parameters of the custom-defined distribution are 9.2090 and 1.4223, respectively.

Fit Custom Log Negative Likelihood Function Load the sample data. load('readmissiontimes.mat');

The data includes ReadmissionTime, which has readmission times for 100 patients. This is simulated data. Define a negative log likelihood function. custnloglf = @(lambda,data,cens,freq) - length(data)*log(lambda) + nansum(lambda*data);

Estimate the parameters of the defined distribution. phat = mle(ReadmissionTime,'nloglf',custnloglf,'start',0.05) phat = 0.1462


Estimate Probability of Success Generate 100 random observations from a binomial distribution with the number of trials, n = 20, and the probability of success, p = 0.75. data = binornd(20,0.75,100,1);

Estimate the probability of success and 95% confidence limits using the simulated sample data.

[phat,pci] = mle(data,'distribution','binomial','alpha',.05,'ntrials',20)

phat = 0.7615

pci = 2×1

    0.7422
    0.7800

The estimate of probability of success is 0.7615 and the lower and upper limits of the 95% confidence interval are 0.7422 and 0.78. This interval covers the true value used to simulate the data.

Fit Distribution with Known Parameter Generate sample data of size 1000 from a noncentral chi-square distribution with degrees of freedom 10 and noncentrality parameter 5. rng default % for reproducibility x = ncx2rnd(10,5,1000,1);

Suppose the noncentrality parameter is fixed at the value 5. Estimate the degrees of freedom of the noncentral chi-square distribution from the sample data. To do this, custom define the noncentral chi-square pdf using the pdf input argument.

[phat,pci] = mle(x,'pdf',@(x,v,d)ncx2pdf(x,v,5),'start',1)

phat = 9.9307

pci = 2×1

    9.5626
   10.2989

The estimate for the degrees of freedom is 9.9307, with a 95% confidence interval of 9.5626 and 10.2989. The confidence interval includes the true parameter value of 10.

Fit Rician Distribution with Known Scale Parameter Generate sample data of size 1000 from a Rician distribution with noncentrality parameter of 8 and scale parameter of 5. First create the Rician distribution. r = makedist('Rician','s',8,'sigma',5);


Now, generate sample data from the distribution you created above. rng default % For reproducibility x = random(r,1000,1);

Suppose the scale parameter is known, and estimate the noncentrality parameter from sample data. To do this using mle, you must custom define the Rician probability density function.

[phat,pci] = mle(x,'pdf',@(x,s,sigma) pdf('rician',x,s,5),'start',10)

phat = 7.8953

pci = 2×1

    7.5405
    8.2501

The estimate for the noncentrality parameter is 7.8953, with a 95% confidence interval of 7.5405 and 8.2501. The confidence interval includes the true parameter value of 8.

Fit a Distribution with Additional Parameter Add a scale parameter to the chi-square distribution for adapting to the scale of data and fit it. First, generate sample data of size 1000 from a chi-square distribution with degrees of freedom 5, and scale it by the factor of 100. rng default % For reproducibility x = 100*chi2rnd(5,1000,1);

Estimate the degrees of freedom and the scaling factor. To do this, custom define the chi-square probability density function using the pdf input argument. The density function requires a 1/s factor for data scaled by s.

[phat,pci] = mle(x,'pdf',@(x,v,s)chi2pdf(x/s,v)/s,'start',[1,200])

phat = 1×2

    5.1079   99.1681

pci = 2×2

    4.6862   90.1215
    5.5297  108.2146

The estimate for the degrees of freedom is 5.1079 and the scale is 99.1681. The 95% confidence interval for the degrees of freedom is (4.6862,5.5297) and the scale parameter is (90.1215,108.2146). The confidence intervals include the true parameter values of 5 and 100, respectively.


Input Arguments

data — Sample data
vector

Sample data mle uses to estimate the distribution parameters, specified as a vector.

Data Types: single | double

dist — Distribution type
'normal' (default) | character vector or string scalar of distribution type

Distribution type to estimate parameters for, specified as one of the following.

• 'Bernoulli': “Bernoulli Distribution” on page B-2. Parameters: p (probability of success for each trial).
• 'Beta': “Beta Distribution” on page B-6. Parameters: a (first shape parameter), b (second shape parameter).
• 'bino' or 'Binomial': “Binomial Distribution” on page B-10. Parameters: n (number of trials), p (probability of success for each trial).
• 'BirnbaumSaunders': “Birnbaum-Saunders Distribution” on page B-18. Parameters: β (scale parameter), γ (shape parameter).
• 'Burr': “Burr Type XII Distribution” on page B-19. Parameters: α (scale parameter), c (first shape parameter), k (second shape parameter).
• 'Discrete Uniform' or 'unid': “Uniform Distribution (Discrete)” on page B-168. Parameters: n (maximum observable value).
• 'exp' or 'Exponential': “Exponential Distribution” on page B-33. Parameters: μ (mean).
• 'ev' or 'Extreme Value': “Extreme Value Distribution” on page B-40. Parameters: μ (location parameter), σ (scale parameter).
• 'gam' or 'Gamma': “Gamma Distribution” on page B-47. Parameters: a (shape parameter), b (scale parameter).
• 'gev' or 'Generalized Extreme Value': “Generalized Extreme Value Distribution” on page B-55. Parameters: k (shape parameter), σ (scale parameter), μ (location parameter).
• 'gp' or 'Generalized Pareto': “Generalized Pareto Distribution” on page B-59. Parameters: k (tail index (shape) parameter), σ (scale parameter), θ (threshold (location) parameter).
• 'geo' or 'Geometric': “Geometric Distribution” on page B-63. Parameters: p (probability parameter).
• 'hn' or 'Half Normal': “Half-Normal Distribution” on page B-68. Parameters: μ (location parameter), σ (scale parameter).
• 'InverseGaussian': “Inverse Gaussian Distribution” on page B-75. Parameters: μ (scale parameter), λ (shape parameter).
• 'Logistic': “Logistic Distribution” on page B-85. Parameters: μ (mean), σ (scale parameter).
• 'LogLogistic': “Loglogistic Distribution” on page B-86. Parameters: μ (mean of logarithmic values), σ (scale parameter of logarithmic values).
• 'logn' or 'LogNormal': “Lognormal Distribution” on page B-88. Parameters: μ (mean of logarithmic values), σ (standard deviation of logarithmic values).
• 'Nakagami': “Nakagami Distribution” on page B-108. Parameters: μ (shape parameter), ω (scale parameter).
• 'nbin' or 'Negative Binomial': “Negative Binomial Distribution” on page B-109. Parameters: r (number of successes), p (probability of success in a single trial).
• 'norm' or 'Normal': “Normal Distribution” on page B-119. Parameters: μ (mean), σ (standard deviation).
• 'poiss' or 'Poisson': “Poisson Distribution” on page B-131. Parameters: λ (mean).
• 'rayl' or 'Rayleigh': “Rayleigh Distribution” on page B-137. Parameters: b (scale parameter).
• 'Rician': “Rician Distribution” on page B-139. Parameters: s (noncentrality parameter), σ (scale parameter).
• 'Stable': “Stable Distribution” on page B-140. Parameters: α (first shape parameter), β (second shape parameter), γ (scale parameter), δ (location parameter).
• 'tLocationScale': “t Location-Scale Distribution” on page B-156. Parameters: μ (location parameter), σ (scale parameter), ν (shape parameter).
• 'unif' or 'Uniform': “Uniform Distribution (Continuous)” on page B-163. Parameters: a (lower endpoint (minimum)), b (upper endpoint (maximum)).
• 'wbl' or 'Weibull': “Weibull Distribution” on page B-170. Parameters: a (scale parameter), b (shape parameter).

Example: 'rician'
32-3423

32

Functions

pdf — Custom probability density function
function handle

Custom probability distribution function, specified as a function handle created using @. This custom function accepts the vector data and one or more individual distribution parameters as input parameters, and returns a vector of probability density values.

For example, if the name of the custom probability density function is newpdf, then you can specify the function handle in mle as follows.

Example: @newpdf

Data Types: function_handle

cdf — Custom cumulative distribution function
function handle

Custom cumulative distribution function, specified as a function handle created using @. This custom function accepts the vector data and one or more individual distribution parameters as input parameters, and returns a vector of cumulative probability values. You must define cdf with pdf if data is censored and you use the 'Censoring' name-value pair argument. If 'Censoring' is not present, you do not have to specify cdf while using pdf.

For example, if the name of the custom cumulative distribution function is newcdf, then you can specify the function handle in mle as follows.

Example: @newcdf

Data Types: function_handle

logpdf — Custom log probability density function
function handle

Custom log probability density function, specified as a function handle created using @. This custom function accepts the vector data and one or more individual distribution parameters as input parameters, and returns a vector of log probability values.

For example, if the name of the custom log probability density function is customlogpdf, then you can specify the function handle in mle as follows.

Example: @customlogpdf

Data Types: function_handle

logsf — Custom log survival function
function handle

Custom log survival function on page 32-3429, specified as a function handle created using @. This custom function accepts the vector data and one or more individual distribution parameters as input parameters, and returns a vector of log survival probability values. You must define logsf with logpdf if data is censored and you use the 'Censoring' name-value pair argument. If 'Censoring' is not present, you do not have to specify logsf while using logpdf.

mle

For example, if the name of the custom log survival function is logsurvival, then you can specify the function handle in mle as follows. Example: @logsurvival Data Types: function_handle nloglf — Custom negative loglikelihood function function handle Custom negative loglikelihood function, specified as a function handle created using @. This custom function accepts the following input arguments. params

Vector of distribution parameter values. mle detects the number of parameters from the number of elements in start.

data

Vector of data.

cens

Boolean vector of censored values.

freq

Vector of integer data frequencies.

nloglf must accept all four arguments even if you do not use the 'Censoring' or 'Frequency' name-value pair arguments. You can write 'nloglf' to ignore the cens and freq arguments in that case.

nloglf returns a scalar negative loglikelihood value and, optionally, a negative loglikelihood gradient vector (see the 'GradObj' field in 'Options').

If the name of the custom negative loglikelihood function is negloglik, then you can specify the function handle in mle as follows.

Example: @negloglik

Data Types: function_handle

start — Initial parameter values
scalar | vector

Initial parameter values for the custom functions, specified as a scalar value or a vector of scalar values. Use start when you fit custom distributions, that is, when you use the pdf and cdf, logpdf and logsf, or nloglf input arguments.

Example: 0.05

Example: [100,2]

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Censoring',Cens,'Alpha',0.01,'Options',Opt specifies that mle estimates the parameters for the distribution of censored data specified by the array Cens, computes the 99% confidence limits for the parameter estimates, and uses the algorithm control parameters specified by the structure Opt.

Censoring — Indicator for censoring
array of 0s (default) | array of 0s and 1s

Indicator for censoring, specified as the comma-separated pair consisting of 'Censoring' and a Boolean array of the same size as data. Use 1 for observations that are right censored and 0 for observations that are fully observed. The default is all observations are fully observed.

For example, if the censored data information is in the binary array called Censored, then you can specify the censored data as follows.

Example: 'Censoring',Censored

mle supports censoring for the following distributions:

Birnbaum-Saunders
Burr
Exponential
Extreme Value
Gamma
Inverse Gaussian
Kernel
Log-Logistic
Logistic
Lognormal
Nakagami
Normal
Rician
t Location-Scale
Weibull

Data Types: logical

Frequency — Frequency of observations
array of 1s (default) | vector of nonnegative integer counts

Frequency of observations, specified as the comma-separated pair consisting of 'Frequency' and an array containing nonnegative integer counts, which is the same size as data. The default is one observation per element of data.

For example, if the observation frequencies are stored in an array named Freq, you can specify the frequencies as follows.

Example: 'Frequency',Freq

Data Types: single | double

Alpha — Significance level
0.05 (default) | scalar value in the range (0,1)

Significance level for the confidence interval of parameter estimates, pci, specified as the comma-separated pair consisting of 'Alpha' and a scalar value in the range (0,1). The confidence level of pci is 100(1-Alpha)%. The default is 0.05 for 95% confidence.

For example, for 99% confidence limits, you can specify the confidence level as follows.

Example: 'Alpha',0.01

Data Types: single | double

NTrials — Number of trials
scalar value | vector

Number of trials for the corresponding element of data, specified as the comma-separated pair consisting of 'NTrials' and a scalar or a vector of the same size as data. Applies only to the binomial distribution.

Example: 'NTrials',total

Data Types: single | double

mu — Location parameter
0 (default) | scalar value

Location parameter for the half-normal distribution, specified as the comma-separated pair consisting of 'mu' and a scalar value. Applies only to the half-normal distribution.

Example: 'mu',1

Data Types: single | double

Options — Fitting algorithm control parameters
structure

Fitting algorithm control parameters, specified as the comma-separated pair consisting of 'Options' and a structure returned by statset. Not applicable to all distributions.

Use the 'Options' name-value pair argument to control details of the maximum likelihood optimization when fitting a custom distribution. For parameter names and default values, type statset('mlecustom'). You can set the options under a new name and use that in the name-value pair argument. mle interprets the following statset parameters for custom distribution fitting.

'GradObj' — 'on' or 'off', indicating whether or not fmincon can expect the custom function provided with the nloglf input argument to return the gradient vector of the negative loglikelihood as a second output. Default is 'off'. mle ignores 'GradObj' when using fminsearch.

'DerivStep' — The relative difference, specified as a scalar or a vector the same size as start, used in finite difference derivative approximations when using fmincon and 'GradObj' is 'off'. Default is eps^(1/3). mle ignores 'DerivStep' when using fminsearch.

'FunValCheck' — 'on' or 'off', indicating whether or not mle should check the values returned by the custom distribution functions for validity. Default is 'on'. A poor choice of starting point can sometimes cause these functions to return NaNs, infinite values, or out-of-range values if they are written without suitable error checking.

'TolBnd' — An offset for lower and upper bounds when using fmincon. Default is 1e-6. mle treats lower and upper bounds as strict inequalities, that is, open bounds. With fmincon, this is approximated by creating closed bounds inset from the specified lower and upper bounds by TolBnd.

Example: 'Options',statset('mlecustom')

Data Types: struct

LowerBound — Lower bounds for distribution parameters
–∞ (default) | vector

Lower bounds for distribution parameters, specified as the comma-separated pair consisting of 'LowerBound' and a vector the same size as start. This name-value pair argument is valid only when you use the pdf and cdf, logpdf and logsf, or nloglf input arguments.

Example: 'LowerBound',0

Data Types: single | double

UpperBound — Upper bounds for distribution parameters
∞ (default) | vector

Upper bounds for distribution parameters, specified as the comma-separated pair consisting of 'UpperBound' and a vector the same size as start. This name-value pair argument is valid only when you use the pdf and cdf, logpdf and logsf, or nloglf input arguments.

Example: 'UpperBound',1

Data Types: single | double

OptimFun — Optimization function
'fminsearch' (default) | 'fmincon'

Optimization function mle uses in maximizing the likelihood, specified as the comma-separated pair consisting of 'OptimFun' and either 'fminsearch' or 'fmincon'. Default is 'fminsearch'. You can specify 'fmincon' only if Optimization Toolbox™ is available.

The 'OptimFun' name-value pair argument is valid only when you fit custom distributions, that is, when you use the pdf and cdf, logpdf and logsf, or nloglf input arguments.

Example: 'OptimFun','fmincon'
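Putting several of these arguments together, a custom fit might look like the following sketch. The custom pdf (an exponential written as a function handle), the simulated data, and the bound value are illustrative assumptions, not part of this reference page.

```matlab
% Sketch: fit a custom exponential pdf with a lower bound on the rate.
% The pdf handle, data, and bound are illustrative.
rng default
x = exprnd(2,100,1);                          % simulated sample data
custpdf = @(data,lambda) lambda*exp(-lambda*data);
phat = mle(x,'pdf',custpdf,'start',0.5,'LowerBound',0)
```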

Output Arguments

phat — Parameter estimates
scalar value | row vector

Parameter estimates, returned as a scalar value or a row vector.

pci — Confidence intervals for parameter estimates
2-by-k matrix

Confidence intervals for parameter estimates, returned as a 2-by-k matrix, where k is the number of parameters mle estimates (and hence the number of elements in phat). The first and second rows of pci show the lower and upper confidence limits, respectively.

More About

Survival Function

The survival function is the probability of survival as a function of time. It is also called the survivor function. It gives the probability that the survival time of an individual exceeds a certain value. Because the cumulative distribution function F(t) is the probability that the survival time is less than or equal to a given point t in time, the survival function for a continuous distribution S(t) is the complement of the cumulative distribution function: S(t) = 1 – F(t).
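As a quick numerical illustration of this identity, using the exponential distribution (the parameter value here is arbitrary):

```matlab
% Check S(t) = 1 - F(t) for an exponential distribution with mean mu.
mu = 2;
t = linspace(0,10,50);
F = expcdf(t,mu);          % cumulative distribution function
S = exp(-t/mu);            % exponential survival function in closed form
max(abs(S - (1 - F)))      % differences are at rounding-error level
```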

Tips

When you supply distribution functions, mle computes the parameter estimates using an iterative maximization algorithm. With some models and data, a poor choice of starting point can cause mle to converge to a local optimum that is not the global maximizer, or to fail to converge entirely. Even in cases for which the log-likelihood is well-behaved near the global maximum, the choice of starting point is often crucial to convergence of the algorithm. In particular, if the initial parameter values are far from the MLEs, underflow in the distribution functions can lead to infinite log-likelihoods.
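One common workaround is to try several starting points and keep the fit with the largest loglikelihood, as in the sketch below. The gamma-distributed sample, the custom pdf handle, and the grid of candidate starts are illustrative assumptions, not part of this reference page.

```matlab
% Sketch: try several starting points and keep the best custom fit.
rng default
x = gamrnd(3,1,200,1);                     % simulated sample data
custpdf = @(data,a,b) gampdf(data,a,b);    % custom handle around gampdf
starts = [1 1; 2 2; 5 0.5];                % candidate [a b] starting values
bestLL = -Inf;
for i = 1:size(starts,1)
    p = mle(x,'pdf',custpdf,'start',starts(i,:),'LowerBound',[0 0]);
    LL = sum(log(custpdf(x,p(1),p(2))));   % loglikelihood at the estimate
    if LL > bestLL
        bestLL = LL;
        phat = p;
    end
end
phat
```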

See Also
Distribution Fitter | fitdist | mlecov | statset

Topics
“Maximum Likelihood Estimation” on page 5-21
“What Is Survival Analysis?” on page 14-2
“Estimate Parameters of Three-Parameter Weibull Distribution” on page B-175

Introduced before R2006a


mlecov
Asymptotic covariance of maximum likelihood estimators

Syntax

acov = mlecov(params,data,'pdf',pdf)
acov = mlecov(params,data,'pdf',pdf,'cdf',cdf)
acov = mlecov(params,data,'logpdf',logpdf)
acov = mlecov(params,data,'logpdf',logpdf,'logsf',logsf)
acov = mlecov(params,data,'nloglf',nloglf)
acov = mlecov( ___ ,Name,Value)

Description

acov = mlecov(params,data,'pdf',pdf) returns an approximation to the asymptotic covariance matrix of the maximum likelihood estimators of the parameters for a distribution specified by the custom probability density function pdf. mlecov computes a finite difference approximation to the Hessian of the log-likelihood at the maximum likelihood estimates params, given the observed data, and returns the negative inverse of that Hessian.

acov = mlecov(params,data,'pdf',pdf,'cdf',cdf) returns acov for a distribution specified by the custom probability density function pdf and cumulative distribution function cdf.

acov = mlecov(params,data,'logpdf',logpdf) returns acov for a distribution specified by the custom log probability density function logpdf.

acov = mlecov(params,data,'logpdf',logpdf,'logsf',logsf) returns acov for a distribution specified by the custom log probability density function logpdf and custom log survival function on page 32-3436 logsf.

acov = mlecov(params,data,'nloglf',nloglf) returns acov for a distribution specified by the custom negative loglikelihood function nloglf.

acov = mlecov( ___ ,Name,Value) specifies options using name-value pair arguments in addition to any of the input arguments in previous syntaxes. For example, you can specify the censored data and frequency of observations.

Examples

Custom Probability Density Function

Load the sample data.

load carbig

The vector Weight shows the weights of 406 cars.

In the MATLAB Editor, create a function that returns the probability density function (pdf) of a lognormal distribution. Save the file in your current working folder as lognormpdf.m.

function newpdf = lognormpdf(data,mu,sigma)
newpdf = exp((-(log(data)-mu).^2)/(2*sigma^2))./(data*sigma*sqrt(2*pi));

Estimate the parameters, mu and sigma, of the custom-defined distribution.

phat = mle(Weight,'pdf',@lognormpdf,'start',[4.5 0.3])

phat = 1×2

    7.9600    0.2804

Compute the approximate covariance matrix of the parameter estimates.

acov = mlecov(phat,Weight,'pdf',@lognormpdf)

acov =

   1.0e-03 *

    0.1937   -0.0000
   -0.0000    0.0968

Estimate the standard errors of the estimates.

se = sqrt(diag(acov))

se =

    0.0139
    0.0098

The standard errors of the estimates of mu and sigma are 0.0139 and 0.0098, respectively.

Custom Log Probability Density Function

In the MATLAB Editor, create a function that returns the log probability density function of a beta distribution. Save the file in your current working folder as betalogpdf.m.

function logpdf = betalogpdf(x,a,b)
logpdf = (a-1)*log(x)+(b-1)*log(1-x)-betaln(a,b);

Generate sample data from a beta distribution with parameters 1.23 and 3.45, and estimate the parameters using the simulated data.

rng('default')
x = betarnd(1.23,3.45,25,1);
phat = mle(x,'dist','beta')

phat = 1×2

    1.1213    2.7182

Compute the approximate covariance matrix of the parameter estimates.

acov = mlecov(phat,x,'logpdf',@betalogpdf)

acov =

    0.0810    0.1646
    0.1646    0.6074

Custom Log pdf and Survival Function

Load the sample data.

load('readmissiontimes.mat');

The sample data includes ReadmissionTime, which has readmission times for 100 patients. The column vector Censored has the censorship information for each patient, where 1 indicates a censored observation, and 0 indicates the exact readmission time is observed. This is simulated data.

Define a custom log probability density function and log survival function.

custlogpdf = @(data,lambda,k) log(k)-k*log(lambda)...
    +(k-1)*log(data)-(data/lambda).^k;
custlogsf = @(data,lambda,k) -(data/lambda).^k;

Estimate the parameters, lambda and k, of the custom distribution for the censored sample data.

phat = mle(ReadmissionTime,'logpdf',custlogpdf,...
    'logsf',custlogsf,'start',[1,0.75],'Censoring',Censored)

phat = 1×2

    9.2090    1.4223

The scale and shape parameters of the custom-defined distribution are 9.2090 and 1.4223, respectively.

Compute the approximate covariance matrix of the parameter estimates.

acov = mlecov(phat,ReadmissionTime,...
    'logpdf',custlogpdf,'logsf',custlogsf,'Censoring',Censored)

acov = 2×2

    0.5653    0.0102
    0.0102    0.0163

Custom Negative Loglikelihood Function

Load the sample data.

load('readmissiontimes.mat');

The sample data includes ReadmissionTime, which has readmission times for 100 patients. This is simulated data.

Define a negative loglikelihood function.

custnloglf = @(lambda,data,cens,freq) -length(data)*log(lambda)...
    + nansum(lambda*data);

Estimate the parameter of the defined distribution.

phat = mle(ReadmissionTime,'nloglf',custnloglf,'start',0.05)

phat = 0.1462

Compute the variance of the parameter estimate.

acov = mlecov(phat,ReadmissionTime,'nloglf',custnloglf)

acov = 2.1374e-04

Compute the standard error.

sqrt(acov)

ans = 0.0146

Input Arguments

params — Parameter estimates
scalar value | vector

Parameter estimates, specified as a scalar value or vector of scalar values. These parameter estimates must be maximum likelihood estimates. For example, you can specify parameter estimates returned by mle.

Data Types: single | double

data — Sample data
vector

Sample data mle uses to estimate the distribution parameters, specified as a vector.

Data Types: single | double

pdf — Custom probability density function
function handle

Custom probability distribution function, specified as a function handle created using @.

This custom function accepts the vector data and one or more individual distribution parameters as input parameters, and returns a vector of probability density values.

For example, if the name of the custom probability density function is newpdf, then you can specify the function handle in mlecov as follows.

Example: @newpdf

Data Types: function_handle

cdf — Custom cumulative distribution function
function handle

Custom cumulative distribution function, specified as a function handle created using @. This custom function accepts the vector data and one or more individual distribution parameters as input parameters, and returns a vector of cumulative probability values. You must define cdf with pdf if data is censored and you use the 'Censoring' name-value pair argument. If 'Censoring' is not present, you do not have to specify cdf while using pdf.

For example, if the name of the custom cumulative distribution function is newcdf, then you can specify the function handle in mlecov as follows.

Example: @newcdf

Data Types: function_handle

logpdf — Custom log probability density function
function handle

Custom log probability density function, specified as a function handle created using @. This custom function accepts the vector data and one or more individual distribution parameters as input parameters, and returns a vector of log probability values.

For example, if the name of the custom log probability density function is customlogpdf, then you can specify the function handle in mlecov as follows.

Example: @customlogpdf

Data Types: function_handle

logsf — Custom log survival function
function handle

Custom log survival function on page 32-3436, specified as a function handle created using @. This custom function accepts the vector data and one or more individual distribution parameters as input parameters, and returns a vector of log survival probability values. You must define logsf with logpdf if data is censored and you use the 'Censoring' name-value pair argument. If 'Censoring' is not present, you do not have to specify logsf while using logpdf.

For example, if the name of the custom log survival function is logsurvival, then you can specify the function handle in mlecov as follows.

Example: @logsurvival

Data Types: function_handle

nloglf — Custom negative loglikelihood function
function handle

Custom negative loglikelihood function, specified as a function handle created using @. This custom function accepts the following input arguments.
params — Vector of distribution parameter values

data — Vector of data

cens — Boolean vector of censored values

freq — Vector of integer data frequencies
nloglf must accept all four arguments even if you do not use the 'Censoring' or 'Frequency' name-value pair arguments. You can write 'nloglf' to ignore the cens and freq arguments in that case.

nloglf returns a scalar negative loglikelihood value and, optionally, a negative loglikelihood gradient vector (see the 'GradObj' field in 'Options').

If the name of the custom negative loglikelihood function is negloglik, then you can specify the function handle in mlecov as follows.

Example: @negloglik

Data Types: function_handle

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Censoring',cens,'Options',opt specifies that mlecov reads the censored data information from the vector cens and performs according to the new options structure opt.

Censoring — Indicator for censoring
array of 0s (default) | array of 0s and 1s

Indicator for censoring, specified as the comma-separated pair consisting of 'Censoring' and a Boolean array of the same size as data. Use 1 for observations that are right censored and 0 for observations that are fully observed. The default is all observations are fully observed.

For censored data, you must use cdf with pdf, or logsf with logpdf; otherwise, nloglf must be defined to account for censoring.

For example, if the censored data information is in the binary array called Censored, then you can specify the censored data as follows.

Example: 'Censoring',Censored

Data Types: logical

Frequency — Frequency of observations
array of 1s (default) | vector of nonnegative integer counts

Frequency of observations, specified as the comma-separated pair consisting of 'Frequency' and an array containing nonnegative integer counts, which is the same size as data. The default is one observation per element of data.

For example, if the observation frequencies are stored in an array named Freq, you can specify the frequencies as follows.

Example: 'Frequency',Freq

Data Types: single | double

Options — Numerical options
structure

Numerical options for the finite difference Hessian calculation, specified as the comma-separated pair consisting of 'Options' and a structure returned by statset. You can set the options under a new name and use it in the name-value pair argument. The applicable statset parameters are as follows.

'GradObj' — 'on' or 'off', indicating whether or not the function provided with the nloglf input argument can return the gradient vector of the negative loglikelihood as a second output. Default is 'off'.

'DerivStep' — Relative step size used in finite differences for the Hessian calculation. It can be a scalar, or the same size as params. Default is eps^(1/4). A smaller value than the default might be appropriate if 'GradObj' is 'on'.

Example: 'Options',statset('mlecov')

Data Types: struct

Output Arguments

acov — Approximation to asymptotic covariance matrix
p-by-p matrix

Approximation to the asymptotic covariance matrix, returned as a p-by-p matrix, where p is the number of parameters in params.
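The diagonal of acov is commonly used to obtain approximate standard errors, as the examples above show. A minimal sketch, assuming acov is a covariance matrix already returned by mlecov:

```matlab
% Approximate standard errors and parameter correlations from acov.
se = sqrt(diag(acov));          % standard errors of the estimates
corrmat = acov ./ (se*se');     % implied correlation matrix
```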

More About

Survival Function

The survival function is the probability of survival as a function of time. It is also called the survivor function. It gives the probability that the survival time of an individual exceeds a certain value. Because the cumulative distribution function F(t) is the probability that the survival time is less than or equal to a given point t in time, the survival function for a continuous distribution S(t) is the complement of the cumulative distribution function: S(t) = 1 – F(t).

See Also
mle

Topics
“What Is Survival Analysis?” on page 14-2

Introduced before R2006a
mnpdf
Multinomial probability density function

Syntax

Y = mnpdf(X,PROB)

Description

Y = mnpdf(X,PROB) returns the pdf for the multinomial distribution with probabilities PROB, evaluated at each row of X. X and PROB are m-by-k matrices or 1-by-k vectors, where k is the number of multinomial bins or categories. Each row of PROB must sum to one, and the sample sizes for each observation (rows of X) are given by the row sums sum(X,2). Y is an m-by-1 vector, and mnpdf computes each row of Y using the corresponding rows of the inputs, or replicates them if needed.
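For a single row of counts x with probabilities p and n = sum(x), the value computed is the multinomial pmf n!/(x1!···xk!)·p1^x1···pk^xk. A small sketch evaluating that formula directly in log space (the counts and probabilities here are arbitrary):

```matlab
% Evaluate the multinomial pmf directly and compare with mnpdf.
x = [3 4 3];            % arbitrary counts, n = 10
p = [1/2 1/3 1/6];      % category probabilities (sum to 1)
n = sum(x);
y_direct = exp(gammaln(n+1) - sum(gammaln(x+1)) + sum(x.*log(p)))
y_mnpdf  = mnpdf(x,p)   % matches y_direct
```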

Examples

Compute the Multinomial Distribution pdf

Compute the pdf of a multinomial distribution with a sample size of n = 10. The probabilities are p = 1/2 for outcome 1, p = 1/3 for outcome 2, and p = 1/6 for outcome 3.

p = [1/2 1/3 1/6];
n = 10;
x1 = 0:n;
x2 = 0:n;
[X1,X2] = meshgrid(x1,x2);
X3 = n-(X1+X2);

Compute the pdf of the distribution.

Y = mnpdf([X1(:),X2(:),X3(:)],repmat(p,(n+1)^2,1));

Plot the pdf on a 3-dimensional figure.

Y = reshape(Y,n+1,n+1);
bar3(Y)
h = gca;
h.XTickLabel = [0:n];
h.YTickLabel = [0:n];
xlabel('x_1')
ylabel('x_2')
zlabel('Probability Mass')
title('Trinomial Distribution')


Note that the visualization does not show x3, which is determined by the constraint x1 + x2 + x3 = n.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

See Also
mnrnd

Topics
“Multinomial Distribution” on page B-96

Introduced in R2006b


mnrfit
Multinomial logistic regression

Syntax

B = mnrfit(X,Y)
B = mnrfit(X,Y,Name,Value)
[B,dev,stats] = mnrfit( ___ )

Description

B = mnrfit(X,Y) returns a matrix, B, of coefficient estimates for a multinomial logistic regression of the nominal responses in Y on the predictors in X.

B = mnrfit(X,Y,Name,Value) returns a matrix, B, of coefficient estimates for a multinomial model fit with additional options specified by one or more Name,Value pair arguments. For example, you can fit a nominal, an ordinal, or a hierarchical model, or change the link function.

[B,dev,stats] = mnrfit( ___ ) also returns the deviance of the fit, dev, and the structure stats for any of the previous input arguments. stats contains model statistics such as degrees of freedom, standard errors for coefficient estimates, and residuals.

Examples

Multinomial Regression for Nominal Responses

Fit a multinomial regression for nominal outcomes and interpret the results.

Load the sample data.

load fisheriris

The column vector species consists of iris flowers of three different species: setosa, versicolor, and virginica. The double matrix meas consists of four types of measurements on the flowers: the length and width of sepals and petals, in centimeters.

Define the nominal response variable using a categorical array.

sp = categorical(species);

Fit a multinomial regression model to predict the species using the measurements.

[B,dev,stats] = mnrfit(meas,sp);
B

B = 5×2
10^3 ×

    1.9722    0.0426
    0.6584    0.0025
   -0.5553    0.0067
   -0.5045   -0.0094
   -2.6981   -0.0183

This is a nominal model for the response category relative risks, with separate slopes on all four predictors, that is, each category of meas. The first row of B contains the intercept terms for the relative risk of the first two response categories, setosa and versicolor, versus the reference category, virginica. The last four rows contain the slopes for the models for the first two categories. mnrfit accepts the third category as the reference category.

The relative risk of an iris flower being species 2 (versicolor) versus species 3 (virginica) is the ratio of the two probabilities (the probability of being species 2 and the probability of being species 3). The model for the relative risk is

ln(π_versicolor/π_virginica) = 42.6 + 2.5X1 + 6.7X2 − 9.4X3 − 18.3X4.

The coefficients express both the effects of the predictor variables on the relative risk and the log odds of being in one category versus the reference category. For example, the estimated coefficient 2.5 indicates that the relative risk of being species 2 (versicolor) versus species 3 (virginica) increases exp(2.5) times for each unit increase in X1, the first measurement, given all else is equal. The relative log odds of being versicolor versus virginica increases by 2.5 with a one-unit increase in X1, given all else is equal.

If the coefficients are converging toward infinity or negative infinity, the estimated coefficients can vary slightly depending on your operating system.

Check the statistical significance of the model coefficients.

stats.p

ans = 5×2

         0    0.0000
         0    0.0281
         0    0.0000
         0    0.0000
         0    0.0000

The small p-values indicate that all measures are significant on the relative risk of being a setosa versus a virginica (species 1 compared to species 3) and being a versicolor versus a virginica (species 2 compared to species 3).

Request the standard errors of the coefficient estimates.

stats.se

ans = 5×2

   12.4038    5.2719
    3.5783    1.1228
    3.1760    1.4789
    3.5403    1.2934
    7.1203    2.0967

Calculate the 95% confidence limits for the coefficients.

LL = stats.beta - 1.96.*stats.se;
UL = stats.beta + 1.96.*stats.se;

Display the confidence intervals for the coefficients of the model for the relative risk of being a setosa versus a virginica (the first column of coefficients in B).

[LL(:,1) UL(:,1)]

ans = 5×2
10^3 ×

    1.9478    1.9965
    0.6514    0.6655
   -0.5616   -0.5491
   -0.5114   -0.4975
   -2.7120   -2.6841

Find the confidence intervals for the coefficients of the model for the relative risk of being a versicolor versus a virginica (the second column of coefficients in B).

[LL(:,2) UL(:,2)]

ans = 5×2

   32.3049   52.9707
    0.2645    4.6660
    3.7823    9.5795
  -11.9644   -6.8944
  -22.3957  -14.1766
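To turn the fitted coefficients into predicted category probabilities, the companion function mnrval can be used. A brief sketch continuing this example:

```matlab
% Compute fitted category probabilities from the nominal model.
pihat = mnrval(B,meas);   % 150-by-3 matrix: setosa, versicolor, virginica
sum(pihat(1,:))           % each row of pihat sums to 1
```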

Multinomial Regression for Ordinal Responses

Fit a multinomial regression model for categorical responses with natural ordering among categories.

Load the sample data and define the predictor variables.

load carbig
X = [Acceleration Displacement Horsepower Weight];

The predictor variables are the acceleration, engine displacement, horsepower, and weight of the cars. The response variable is miles per gallon (mpg).

Create an ordinal response variable categorizing MPG into four levels from 9 to 48 mpg by labeling the response values in the range 9-19 as 1, 20-29 as 2, 30-39 as 3, and 40-48 as 4.

miles = ordinal(MPG,{'1','2','3','4'},[],[9,19,29,39,48]);

Fit an ordinal response model for the response variable miles.

[B,dev,stats] = mnrfit(X,miles,'model','ordinal');
B

B = 7×1

  -16.6895
  -11.7208
   -8.0606
    0.1048
    0.0103
    0.0645
    0.0017

The first three elements of B are the intercept terms for the models, and the last four elements of B are the coefficients of the covariates, assumed common across all categories. This model corresponds to parallel regression, which is also called the proportional odds model, where there is a different intercept but common slopes among categories. You can specify this using the 'interactions','off' name-value pair argument, which is the default for ordinal models.

[B(1:3)'; repmat(B(4:end),1,3)]

ans = 5×3

  -16.6895  -11.7208   -8.0606
    0.1048    0.1048    0.1048
    0.0103    0.0103    0.0103
    0.0645    0.0645    0.0645
    0.0017    0.0017    0.0017

The link function in the model is logit ('link','logit'), which is the default for an ordinal model. The coefficients express the relative risk or log odds of the mpg of a car being less than or equal to one value versus greater than that value.

The proportional odds model in this example is

ln(P(mpg ≤ 19)/P(mpg > 19)) = −16.6895 + 0.1048XA + 0.0103XD + 0.0645XH + 0.0017XW
ln(P(mpg ≤ 29)/P(mpg > 29)) = −11.7208 + 0.1048XA + 0.0103XD + 0.0645XH + 0.0017XW
ln(P(mpg ≤ 39)/P(mpg > 39)) = −8.0606 + 0.1048XA + 0.0103XD + 0.0645XH + 0.0017XW

For example, the coefficient estimate of 0.1048 indicates that a unit change in acceleration would impact the odds of the mpg of a car being less than or equal to 19 versus more than 19, or being less than or equal to 29 versus greater than 29, or being less than or equal to 39 versus greater than 39, by a factor of exp(0.1048), given all else is equal.

Assess the significance of the coefficients.

stats.p

ans = 7×1

    0.0000
    0.0000
    0.0000
    0.1899
    0.0350
    0.0000
    0.0118

The p-values of 0.0350, 0.0000, and 0.0118 for engine displacement, horsepower, and weight of a car, respectively, indicate that these factors are significant on the odds of the mpg of a car being less than or equal to a certain value versus being greater than that value.

Hierarchical Multinomial Regression Model

Fit a hierarchical multinomial regression model.

Load the sample data.

load('smoking.mat');

The data set smoking contains five variables: sex, age, weight, and systolic and diastolic blood pressure. Sex is a binary variable where 1 indicates female patients, and 0 indicates male patients.

Define the response variable.

Y = categorical(smoking.Smoker);

The data in Smoker has four categories:

• 0: Nonsmoker, 0 cigarettes a day
• 1: Smoker, 1–5 cigarettes a day
• 2: Smoker, 6–10 cigarettes a day
• 3: Smoker, 11 or more cigarettes a day

Define the predictor variables.

X = [smoking.Sex smoking.Age smoking.Weight...
    smoking.SystolicBP smoking.DiastolicBP];

Fit a hierarchical multinomial model.

[B,dev,stats] = mnrfit(X,Y,'model','hierarchical');
B

B = 6×3

   43.8148    5.9571   44.0712
    1.8709   -0.0230    0.0662
    0.0188    0.0625    0.1335
    0.0046   -0.0072   -0.0130
   -0.2170    0.0416   -0.0324
   -0.2273   -0.1449   -0.4824

The first column of B includes the intercept and the coefficient estimates for the model of the relative risk of being a nonsmoker versus a smoker. The second column includes the parameter estimates for modeling the log odds of smoking 1–5 cigarettes a day versus more than five cigarettes a day given that a person is a smoker. Finally, the third column includes the parameter estimates for modeling the log odds of a person smoking 6–10 cigarettes a day versus more than 10 cigarettes a day given that he or she smokes more than 5 cigarettes a day.

The coefficients differ across categories. You can specify this using the 'interactions','on' name-value pair argument, which is the default for hierarchical models. So, the model in this example is

ln(P(y = 0)/P(y > 0)) = 43.8148 + 1.8709XS + 0.0188XA + 0.0046XW − 0.2170XSBP − 0.2273XDBP
ln(P(1 ≤ y ≤ 5)/P(y > 5)) = 5.9571 − 0.0230XS + 0.0625XA − 0.0072XW + 0.0416XSBP − 0.1449XDBP
ln(P(6 ≤ y ≤ 10)/P(y > 10)) = 44.0712 + 0.0662XS + 0.1335XA − 0.0130XW − 0.0324XSBP − 0.4824XDBP

For example, the coefficient estimate of 1.8709 indicates that the likelihood of being a smoker versus a nonsmoker increases by exp(1.8709) = 6.49 times as the gender changes from female to male, given everything else held constant.

Assess the statistical significance of the terms.

stats.p

ans = 6×3

    0.0000    0.5363    0.2149
    0.3549    0.9912    0.9835
    0.6850    0.2676    0.2313
    0.9032    0.8523    0.8514
    0.0009    0.5187    0.8165
    0.0004    0.0483    0.0545

Sex, age, and weight do not appear significant at any level. The p-values of 0.0009 and 0.0004 indicate that both types of blood pressure are significant for the relative risk of a person being a smoker versus a nonsmoker. The p-value of 0.0483 shows that only diastolic blood pressure is significant for the odds of a person smoking 1–5 cigarettes a day versus more than 5 cigarettes a day. Similarly, the p-value of 0.0545 indicates that diastolic blood pressure is marginally significant for the odds of a person smoking 6–10 cigarettes a day versus more than 10 cigarettes a day.

Check if any nonsignificant factors are correlated with each other. Draw a scatter plot of age versus weight grouped by sex.

figure()
gscatter(smoking.Age,smoking.Weight,smoking.Sex)
legend('Male','Female')
xlabel('Age')
ylabel('Weight')


The range of weight of an individual seems to differ according to gender. Age does not seem to have any obvious correlation with sex or weight. Age is insignificant and weight seems to be correlated with sex, so you can eliminate both and reconstruct the model.

Eliminate age and weight from the model and fit a hierarchical model with sex, systolic blood pressure, and diastolic blood pressure as the predictor variables.

X = double([smoking.Sex smoking.SystolicBP...
    smoking.DiastolicBP]);
[B,dev,stats] = mnrfit(X,Y,'model','hierarchical');
B

B = 4×3

   44.8456    5.3230   25.0248
    1.6045    0.2330    0.4982
   -0.2161    0.0497    0.0179
   -0.2222   -0.1358   -0.3092

Here, a coefficient estimate of 1.6045 indicates that the likelihood of being a nonsmoker versus a smoker increases by exp(1.6045) = 4.97 times as sex changes from male to female. A unit increase in systolic blood pressure corresponds to an exp(–0.2161) = 0.8056 multiplicative decrease in the likelihood of being a nonsmoker versus a smoker. Similarly, a unit increase in diastolic blood pressure corresponds to an exp(–0.2222) = 0.8007 multiplicative decrease in the relative rate of being a nonsmoker versus being a smoker.

Assess the statistical significance of the terms.


stats.p

ans = 4×3

    0.0000    0.4715    0.2325
    0.0210    0.7488    0.6362
    0.0010    0.4107    0.8899
    0.0003    0.0483    0.0718

The p-values of 0.0210, 0.0010, and 0.0003 indicate that sex and both types of blood pressure are significant for the relative risk of a person being a nonsmoker versus a smoker, given the other terms in the model. Based on the p-value of 0.0483, diastolic blood pressure appears significant for the relative risk of a person smoking 1–5 cigarettes versus more than 5 cigarettes a day, given that this person is a smoker. Because none of the p-values in the third column are less than 0.05, you can say that none of the variables are statistically significant for the relative risk of a person smoking 6–10 cigarettes versus more than 10 cigarettes, given that this person smokes more than 5 cigarettes a day.
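The multiplicative interpretations in this example come from exponentiating coefficients. A minimal sketch, using the first-column estimates copied from the output above:

```matlab
% Exponentiating a coefficient gives the multiplicative change in the
% modeled relative risk for a one-unit increase in that predictor.
% These values are the first-column estimates from the reduced model above.
b  = [1.6045; -0.2161; -0.2222];   % sex, systolic BP, diastolic BP
rr = exp(b)
% exp(1.6045) is about 4.97; exp(-0.2161) about 0.81; exp(-0.2222) about 0.80
```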

Input Arguments

X — Observations on predictor variables
n-by-p matrix

Observations on predictor variables, specified as an n-by-p matrix. X contains n observations for p predictors.

Note: mnrfit automatically includes a constant term (intercept) in all models. Do not include a column of 1s in X.

Data Types: single | double

Y — Response values
n-by-k matrix | n-by-1 column vector

Response values, specified as a column vector or a matrix. Y can be one of the following:

• An n-by-k matrix, where Y(i,j) is the number of outcomes of the multinomial category j for the predictor combination given by X(i,:). In this case, multiple observations are made at each predictor combination.
• An n-by-1 column vector of scalar integers from 1 to k indicating the value of the response for each observation. In this case, all sample sizes are 1.
• An n-by-1 categorical array indicating the nominal or ordinal value of the response for each observation. In this case, all sample sizes are 1.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.


Example: 'Model','ordinal','Link','probit' specifies an ordinal model with a probit link function.

Model — Type of model to fit
'nominal' (default) | 'ordinal' | 'hierarchical'

Type of model to fit, specified as the comma-separated pair consisting of 'Model' and one of the following.

• 'nominal' — Default. There is no ordering among the response categories.
• 'ordinal' — There is a natural ordering among the response categories.
• 'hierarchical' — The choice of response category is sequential/nested.

Example: 'Model','ordinal'

Interactions — Indicator for interaction between multinomial categories and coefficients
'on' | 'off'

Indicator for an interaction between the multinomial categories and coefficients, specified as the comma-separated pair consisting of 'Interactions' and one of the following.

• 'on' — Default for nominal and hierarchical models. Fit a model with different coefficients across categories.
• 'off' — Default for ordinal models. Fit a model with a common set of coefficients for the predictor variables, across all multinomial categories. This is often described as parallel regression or the proportional odds model.

In all cases, the model has different intercepts across categories. The choice of 'Interactions' determines the dimensions of the output array B.

Example: 'Interactions','off'

Link — Link function
'logit' (default) | 'probit' | 'comploglog' | 'loglog'

Link function to use for ordinal and hierarchical models, specified as the comma-separated pair consisting of 'Link' and one of the following.

• 'logit' — Default. f(γ) = ln(γ/(1 − γ))
• 'probit' — f(γ) = Φ⁻¹(γ), where the error term is normally distributed with variance 1
• 'comploglog' — Complementary log-log, f(γ) = ln(−ln(1 − γ))
• 'loglog' — f(γ) = ln(−ln(γ))

The link function defines the relationship between response probabilities and the linear combination of predictors, Xβ. The link functions might be functions of cumulative or conditional probabilities based on whether the model is for an ordinal or a sequential/nested response. For example, for an ordinal model, γ represents the cumulative probability of being in categories 1 to j, and the model with a logit link function is as follows:

ln(γ/(1 − γ)) = ln((π_1 + π_2 + ⋯ + π_j)/(π_(j+1) + ⋯ + π_k)) = β_0j + β_1·X_1 + β_2·X_2 + ⋯ + β_p·X_p,

where k represents the last category. You cannot specify the 'Link' parameter for nominal models; these always use a multinomial logit link,

ln(π_j/π_r) = β_j0 + β_j1·X_1 + β_j2·X_2 + ⋯ + β_jp·X_p,   j = 1, …, k − 1,

where π stands for a categorical probability, and r corresponds to the reference category. mnrfit uses the last category as the reference category for nominal models.

Example: 'Link','loglog'

EstDisp — Indicator for estimating dispersion parameter
'off' (default) | 'on'

Indicator for estimating a dispersion parameter, specified as the comma-separated pair consisting of 'EstDisp' and one of the following.

• 'off' — Default. Use the theoretical dispersion value of 1.
• 'on' — Estimate a dispersion parameter for the multinomial distribution in computing standard errors.

Example: 'EstDisp','on'
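The link choices above can be sketched directly as anonymous functions. This is an illustration written from the formulas in the table, not the toolbox implementation:

```matlab
% The four link functions from the table above, as anonymous functions.
logitF      = @(g) log(g ./ (1 - g));
probitF     = @(g) norminv(g);        % inverse standard normal CDF
comploglogF = @(g) log(-log(1 - g));
loglogF     = @(g) log(-log(g));

g = 0.75;
logitF(g)        % log(3), about 1.0986
```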

Output Arguments

B — Coefficient estimates
vector | matrix

Coefficient estimates for a multinomial logistic regression of the responses in Y, returned as a vector or a matrix.

• If 'Interactions' is 'off', then B is a vector of length k – 1 + p. The first k – 1 rows of B correspond to the intercept terms, one for each of the first k – 1 multinomial categories, and the remaining p rows correspond to the predictor coefficients, which are common to all of the first k – 1 categories.
• If 'Interactions' is 'on', then B is a (p + 1)-by-(k – 1) matrix. Each column of B corresponds to the estimated intercept term and predictor coefficients, one for each of the first k – 1 multinomial categories. The estimates for the kth category are taken to be zero because mnrfit takes the last category as the reference category.

dev — Deviance of the fit
scalar value

Deviance of the fit, returned as a scalar value. It is twice the difference between the maximum achievable log likelihood and that attained under the fitted model. This corresponds to the sum of deviance residuals,

dev = 2 * Σ_(i=1..n) Σ_(j=1..k) y_ij * log(y_ij/(π_ij * m_i)) = Σ_(i=1..n) rd_i,

where rd_i are the deviance residuals. For deviance residuals, see stats.
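As an illustration of the deviance formula, the sketch below evaluates one observation's contribution with hypothetical counts; when the fitted probabilities equal the observed proportions (a saturated fit), the contribution is zero:

```matlab
% One observation's deviance contribution, following the formula above.
y     = [1 11 13];          % hypothetical observed counts, k = 3 categories
m     = sum(y);             % sample size for this observation
pihat = y / m;              % fitted probabilities; saturated fit here
t     = y .* log(y ./ (pihat * m));
t(y == 0) = 0;              % a zero count contributes zero by convention
dev   = 2 * sum(t)          % 0 for the saturated fit
```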


stats — Model statistics
structure

Model statistics, returned as a structure that contains the following fields.

• beta — The coefficient estimates. These are the same as B.
• dfe — Degrees of freedom for error. If 'Interactions' is 'off', then the degrees of freedom is n*(k – 1) – (k – 1 + p). If 'Interactions' is 'on', then the degrees of freedom is (n – p + 1)*(k – 1).
• sfit — Estimated dispersion parameter.
• s — Theoretical or estimated dispersion parameter. If 'EstDisp' is 'off', then s is the theoretical dispersion parameter, 1. If 'EstDisp' is 'on', then s is equal to the estimated dispersion parameter, sfit.
• estdisp — Indicator for a theoretical or estimated dispersion parameter.
• se — Standard errors of the coefficient estimates, B.
• coeffcorr — Estimated correlation matrix for B.
• covb — Estimated covariance matrix for B.
• t — t statistics for B.
• p — p-values for B.
• resid — Raw residuals, observed minus fitted values:

  r_ij = y_ij − π_ij * m_i,   i = 1, …, n,  j = 1, …, k,

  where π_ij is the categorical, cumulative, or conditional probability, and m_i is the corresponding sample size.
• residp — Pearson residuals, which are the raw residuals scaled by the estimated standard deviation:

  rp_ij = r_ij/σ_ij = (y_ij − π_ij * m_i)/sqrt(π_ij * (1 − π_ij) * m_i),   i = 1, …, n,  j = 1, …, k,

  where π_ij is the categorical, cumulative, or conditional probability, and m_i is the corresponding sample size.
• residd — Deviance residuals:

  rd_i = 2 * Σ_(j=1..k) y_ij * log(y_ij/(π_ij * m_i)),   i = 1, …, n,

  where π_ij is the categorical, cumulative, or conditional probability, and m_i is the corresponding sample size.
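The raw and Pearson residual formulas above can be sketched numerically. The counts and fitted probabilities here are hypothetical; in practice, read them from stats.resid and stats.residp:

```matlab
% Raw and Pearson residuals for one observation, per the formulas above.
y     = [1 11 13];                % hypothetical observed counts
m     = sum(y);                   % corresponding sample size
pihat = [0.08 0.40 0.52];         % hypothetical fitted probabilities
raw   = y - pihat .* m            % stats.resid: approximately [-1 1 0]
rp    = raw ./ sqrt(pihat .* (1 - pihat) .* m)   % stats.residp
```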


Algorithms

mnrfit treats NaNs in either X or Y as missing values, and ignores them.

References

[1] McCullagh, P., and J. A. Nelder. Generalized Linear Models. New York: Chapman & Hall, 1990.

[2] Long, J. S. Regression Models for Categorical and Limited Dependent Variables. Sage Publications, 1997.

[3] Dobson, A. J., and A. G. Barnett. An Introduction to Generalized Linear Models. Chapman and Hall/CRC, Taylor & Francis Group, 2008.

See Also

fitglm | glmfit | glmval | mnrval

Topics

"Multinomial Distribution" on page B-96
"Multinomial Models for Nominal Responses" on page 12-2
"Multinomial Models for Ordinal Responses" on page 12-4
"Hierarchical Multinomial Models" on page 12-7

Introduced in R2006b


mnrnd

Multinomial random numbers

Syntax

r = mnrnd(n,p)
R = mnrnd(n,p,m)
R = mnrnd(N,P)

Description

r = mnrnd(n,p) returns random values r from the multinomial distribution with parameters n and p. n is a positive integer specifying the number of trials (sample size) for each multinomial outcome. p is a 1-by-k vector of multinomial probabilities, where k is the number of multinomial bins or categories. p must sum to one. (If p does not sum to one, r consists entirely of NaN values.) r is a 1-by-k vector, containing counts for each of the k multinomial bins.

R = mnrnd(n,p,m) returns m random vectors from the multinomial distribution with parameters n and p. R is an m-by-k matrix, where k is the number of multinomial bins or categories. Each row of R corresponds to one multinomial outcome.

R = mnrnd(N,P) generates outcomes from different multinomial distributions. P is an m-by-k matrix, where k is the number of multinomial bins or categories, and each of the m rows contains a different set of multinomial probabilities. Each row of P must sum to one. (If any row of P does not sum to one, the corresponding row of R consists entirely of NaN values.) N is an m-by-1 vector of positive integers or a single positive integer (replicated by mnrnd to an m-by-1 vector). R is an m-by-k matrix. Each row of R is generated using the corresponding rows of N and P.
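Because each row of R is one draw of category counts over n trials, its elements always sum to n. A quick sketch (the seed value is arbitrary, chosen only for reproducibility):

```matlab
% Each multinomial draw distributes n trials across the k categories,
% so every row of R sums to n.
rng(1)                       % hypothetical seed, for reproducibility only
n = 500;
p = [0.2 0.3 0.5];
R = mnrnd(n,p,4);
sum(R,2)                     % every element equals n = 500
```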

Examples

Generate 2 random vectors with the same probabilities:

n = 1e3;
p = [0.2,0.3,0.5];
R = mnrnd(n,p,2)
R =
   215   282   503
   194   303   503

Generate 2 random vectors with different probabilities:

n = 1e3;
P = [0.2, 0.3, 0.5; ...
     0.3, 0.4, 0.3;];
R = mnrnd(n,P)
R =
   186   290   524
   290   389   321


See Also

mnpdf

Topics

"Multinomial Distribution" on page B-96

Introduced in R2006b


mnrval

Multinomial logistic regression values

Syntax

pihat = mnrval(B,X)
[pihat,dlow,dhi] = mnrval(B,X,stats)
[pihat,dlow,dhi] = mnrval(B,X,stats,Name,Value)
yhat = mnrval(B,X,ssize)
[yhat,dlow,dhi] = mnrval(B,X,ssize,stats)
[yhat,dlow,dhi] = mnrval(B,X,ssize,stats,Name,Value)

Description

pihat = mnrval(B,X) returns the predicted probabilities for the multinomial logistic regression model with predictors, X, and the coefficient estimates, B. pihat is an n-by-k matrix of predicted probabilities for each multinomial category. B is the vector or matrix that contains the coefficient estimates returned by mnrfit, and X is an n-by-p matrix that contains n observations for p predictors.

Note: mnrval automatically includes a constant term in all models. Do not enter a column of 1s in X.

[pihat,dlow,dhi] = mnrval(B,X,stats) also returns 95% error bounds on the predicted probabilities, pihat, using the statistics in the structure, stats, returned by mnrfit. The lower and upper confidence bounds for pihat are pihat minus dlow and pihat plus dhi, respectively. Confidence bounds are nonsimultaneous and only apply to the fitted curve, not to new observations.

[pihat,dlow,dhi] = mnrval(B,X,stats,Name,Value) returns the predicted probabilities and 95% error bounds on the predicted probabilities pihat, with additional options specified by one or more Name,Value pair arguments. For example, you can specify the model type, link function, and the type of probabilities to return.

yhat = mnrval(B,X,ssize) returns the predicted category counts for sample sizes, ssize.

[yhat,dlow,dhi] = mnrval(B,X,ssize,stats) also computes 95% error bounds on the predicted counts yhat, using the statistics in the structure, stats, returned by mnrfit. The lower and upper confidence bounds for yhat are yhat minus dlow and yhat plus dhi, respectively. Confidence bounds are nonsimultaneous and only apply to the fitted curve, not to new observations.

[yhat,dlow,dhi] = mnrval(B,X,ssize,stats,Name,Value) returns the predicted category counts and 95% error bounds on the predicted counts yhat, with additional options specified by one or more Name,Value pair arguments. For example, you can specify the model type, link function, and the type of predicted counts to return.

Examples

Estimate Category Probabilities for Nominal Responses

Fit a multinomial regression for nominal outcomes and estimate the category probabilities.

Load the sample data.

load fisheriris

The column vector, species, consists of iris flowers of three different species: setosa, versicolor, and virginica. The double matrix meas consists of four types of measurements on the flowers: the length and width of sepals and petals in centimeters, respectively.

Define the nominal response variable.

sp = nominal(species);
sp = double(sp);

Now in sp, 1, 2, and 3 indicate the species setosa, versicolor, and virginica, respectively.

Fit a nominal model to estimate the species using the flower measurements as the predictor variables.

[B,dev,stats] = mnrfit(meas,sp);

Estimate the probability of being a certain kind of species for an iris flower having the measurements (6.3, 2.8, 4.9, 1.7).

x = [6.3, 2.8, 4.9, 1.7];
pihat = mnrval(B,x);
pihat

pihat = 1×3

         0    0.3977    0.6023

The probability of an iris flower having the measurements (6.3, 2.8, 4.9, 1.7) being a setosa is 0, a versicolor is 0.3977, and a virginica is 0.6023.

Estimate Upper and Lower Error Bounds for Probability Estimates of Ordinal Responses

Fit a multinomial regression model for categorical responses with natural ordering among categories. Then estimate the upper and lower confidence bounds for the category probability estimates.

Load the sample data and define the predictor variables.

load('carbig.mat')
X = [Acceleration Displacement Horsepower Weight];


The predictor variables are the acceleration, engine displacement, horsepower, and weight of the cars. The response variable is miles per gallon (MPG).

Create an ordinal response variable categorizing MPG into four levels from 9 to 48 mpg.

miles = ordinal(MPG,{'1','2','3','4'},[],[9,19,29,39,48]);
miles = double(miles);

Now in miles, 1 indicates the cars with miles per gallon from 9 to 19, and 2 indicates the cars with miles per gallon from 20 to 29. Similarly, 3 and 4 indicate the cars with miles per gallon from 30 to 39 and 40 to 48, respectively.

Fit a multinomial regression model for the response variable miles. For an ordinal model, the default 'link' is logit and the default 'interactions' is 'off'.

[B,dev,stats] = mnrfit(X,miles,'model','ordinal');

Compute the probability estimates and the 95% error bounds for the probability confidence intervals for miles per gallon of a car with x = (12, 113, 110, 2670).

x = [12,113,110,2670];
[pihat,dlow,hi] = mnrval(B,x,stats,'model','ordinal');
pihat

pihat = 1×4

    0.0615    0.8426    0.0932    0.0027

Calculate the confidence bounds for the category probability estimates.

LL = pihat - dlow;
UL = pihat + hi;
[LL;UL]

ans = 2×4

    0.0073    0.7829    0.0283   -0.0003
    0.1157    0.9022    0.1580    0.0057

Estimate Category Counts and Error Bounds for Nominal Responses

Fit a multinomial regression for nominal outcomes and estimate the category counts.

Load the sample data.

load fisheriris

The column vector, species, consists of iris flowers of three different species: setosa, versicolor, and virginica. The double matrix meas consists of four types of measurements on the flowers: the length and width of sepals and petals in centimeters, respectively.

Define the nominal response variable.

sp = nominal(species);
sp = double(sp);

Now in sp, 1, 2, and 3 indicate the species setosa, versicolor, and virginica, respectively.

Fit a nominal model to estimate the species based on the flower measurements.

[B,dev,stats] = mnrfit(meas,sp);

Estimate the number in each species category for a sample of 18 iris flowers, all with the measurements (6.3, 2.8, 4.9, 1.7).

x = [6.3, 2.8, 4.9, 1.7];
yhat = mnrval(B,x,18)

yhat = 1×3

         0    7.1578   10.8422

Estimate the error bounds for the counts.

[yhat,dlow,hi] = mnrval(B,x,18,stats,'model','nominal');

Calculate the confidence bounds for the category count estimates.

LL = yhat - dlow;
UL = yhat + hi;
[LL;UL]

ans = 2×3

         0    3.3019    6.9863
         0   11.0137   14.6981

Plot the Count Estimates

Create sample data with one predictor variable and a categorical response variable with three categories.

x = [-3 -2 -1 0 1 2 3]';
Y = [1 11 13; 2 9 14; 6 14 5; 5 10 10;...
    5 14 6; 7 13 5; 8 11 6];
[Y x]

ans = 7×4

     1    11    13    -3
     2     9    14    -2
     6    14     5    -1
     5    10    10     0
     5    14     6     1
     7    13     5     2
     8    11     6     3


There are observations on seven different values of the predictor variable x. The response variable Y has three categories, and the data shows how many of the 25 individuals are in each category of Y for each observation of x. For example, when x is -3, 1 of the 25 individuals is observed in category 1, 11 are observed in category 2, and 13 are observed in category 3. Similarly, when x is 1, 5 of the individuals are observed in category 1, 14 are observed in category 2, and 6 are observed in category 3.

Plot the number in each category versus the x values, on a stacked bar graph.

bar(x,Y,'stacked');
ylim([0 25]);

Fit a nominal model for the individual response category probabilities, with separate slopes on the single predictor variable, x, for each category.

betaHatNom = mnrfit(x,Y,'model','nominal',...
    'interactions','on')

betaHatNom = 2×2

   -0.6028    0.3832
    0.4068    0.1948

The first row of betaHatNom contains the intercept terms for the first two response categories. The second row contains the slopes. mnrfit takes the third category as the reference category and hence assumes the coefficients for the third category are zero.

Compute the predicted probabilities for the three response categories.


xx = linspace(-4,4)';
piHatNom = mnrval(betaHatNom,xx,'model','nominal',...
    'interactions','on');

The probability of being in the third category is simply 1 - P(y = 1) - P(y = 2).

Plot the estimated cumulative number in each category on the bar graph.

line(xx,cumsum(25*piHatNom,2),'LineWidth',2);

The cumulative probability for the third category is always 1.

Now, fit a "parallel" ordinal model for the cumulative response category probabilities, with a common slope on the single predictor variable, x, across all categories:

betaHatOrd = mnrfit(x,Y,'model','ordinal',...
    'interactions','off')

betaHatOrd = 3×1

   -1.5001
    0.7266
    0.2642

The first two elements of betaHatOrd are the intercept terms for the first two response categories. The last element of betaHatOrd is the common slope.


Compute the predicted cumulative probabilities for the first two response categories. The cumulative probability for the third category is always 1.

piHatOrd = mnrval(betaHatOrd,xx,'type','cumulative',...
    'model','ordinal','interactions','off');

Plot the estimated cumulative number on the bar graph of the observed cumulative number.

figure()
bar(x,cumsum(Y,2),'grouped');
ylim([0 25]);
line(xx,25*piHatOrd,'LineWidth',2);

Input Arguments

B — Coefficient estimates
vector or matrix returned by mnrfit

Coefficient estimates for the multinomial logistic regression model, specified as a vector or matrix returned by mnrfit. It is a vector or matrix depending on the model and interactions.

Example: B = mnrfit(X,y); pihat = mnrval(B,X)

Data Types: single | double

X — Sample data
matrix


Sample data on predictors, specified as an n-by-p matrix. X contains n observations for p predictors.

Note: mnrval automatically includes a constant term in all models. Do not enter a column of 1s in X.

Example: pihat = mnrval(B,X)

Data Types: single | double

stats — Model statistics
structure returned by mnrfit

Model statistics, specified as a structure returned by mnrfit. You must use the stats input argument in mnrval to compute the lower and upper error bounds on the category probabilities and counts.

Example: [B,dev,stats] = mnrfit(X,y); [pihat,dlow,dhi] = mnrval(B,X,stats)

ssize — Sample sizes
column vector of positive integers

Sample sizes, used to return the number of items in response categories for each combination of the predictor variables, specified as an n-by-1 column vector of positive integers. For example, for a response variable having three categories, if an observation of the number of individuals in each category is y1, y2, and y3, respectively, then the sample size, m, for that observation is m = y1 + y2 + y3.

If the sample sizes for the n observations are in the vector sample, then you can enter the sample sizes as follows.

Example: yhat = mnrval(B,X,sample)

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'model','ordinal','link','probit','type','cumulative' specifies that mnrval returns the estimates for cumulative probabilities for an ordinal model with a probit link function.

model — Type of multinomial model
'nominal' (default) | 'ordinal' | 'hierarchical'

Type of multinomial model fit by mnrfit, specified as the comma-separated pair consisting of 'model' and one of the following.

• 'nominal' — Default. Specify when there is no ordering among the response categories.
• 'ordinal' — Specify when there is a natural ordering among the response categories.
• 'hierarchical' — Specify when the choice of response category is sequential.

Example: 'model','ordinal'


interactions — Indicator for interaction between multinomial categories and coefficients
'on' | 'off'

Indicator for an interaction between the multinomial categories and coefficients in the model fit by mnrfit, specified as the comma-separated pair consisting of 'interactions' and one of the following.

• 'on' — Default for nominal and hierarchical models. Specify to fit a model with different intercepts and coefficients across categories.
• 'off' — Default for ordinal models. Specify to fit a model with different intercepts, but a common set of coefficients for the predictor variables, across all multinomial categories. This is often described as parallel regression or the proportional odds model.

Example: 'interactions','off'

Data Types: logical

link — Link function
'logit' (default) | 'probit' | 'comploglog' | 'loglog'

Link function mnrfit uses for ordinal and hierarchical models, specified as the comma-separated pair consisting of 'link' and one of the following.

• 'logit' — Default. f(γ) = ln(γ/(1 − γ))
• 'probit' — f(γ) = Φ⁻¹(γ), where the error term is normally distributed with variance 1
• 'comploglog' — Complementary log-log, f(γ) = ln(−ln(1 − γ))
• 'loglog' — f(γ) = ln(−ln(γ))

The link function defines the relationship between response probabilities and the linear combination of predictors, Xβ. γ might be cumulative or conditional probabilities based on whether the model is for an ordinal or a sequential/nested response. You cannot specify the 'link' parameter for nominal models; these always use a multinomial logit link,

ln(π_j/π_r) = β_j0 + β_j1·X_1 + β_j2·X_2 + ⋯ + β_jp·X_p,   j = 1, …, k − 1,

where π stands for a categorical probability, r corresponds to the reference category, k is the total number of response categories, and p is the number of predictor variables. mnrfit uses the last category as the reference category for nominal models.

Example: 'link','loglog'

type — Type of probabilities or counts to estimate
'category' (default) | 'cumulative' | 'conditional'

Type of probabilities or counts to estimate, specified as the comma-separated pair consisting of 'type' and one of the following.


• 'category' — Default. Specify to return predictions and error bounds for the probabilities (or counts) of the k multinomial categories.
• 'cumulative' — Specify to return predictions and confidence bounds for the cumulative probabilities (or counts) of the first k – 1 multinomial categories, as an n-by-(k – 1) matrix. The predicted cumulative probability for the kth category is always 1.
• 'conditional' — Specify to return predictions and error bounds in terms of the first k – 1 conditional category probabilities (counts), that is, the probability (count) for category j, given an outcome in category j or higher. When 'type' is 'conditional', and you supply the sample size argument ssize, the predicted counts at each row of X are conditioned on the corresponding element of ssize, across all categories.

Example: 'type','cumulative'

confidence — Confidence level
0.95 (default) | scalar value in the range (0,1)

Confidence level for the error bounds, specified as the comma-separated pair consisting of 'confidence' and a scalar value in the range (0,1). For example, for 99% error bounds, you can specify the confidence level as follows.

Example: 'confidence',0.99

Data Types: single | double
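The three 'type' options report the same underlying distribution in different forms. A minimal sketch with hypothetical category probabilities for k = 3:

```matlab
% Relationship among 'category', 'cumulative', and 'conditional' forms.
picat  = [0.2 0.5 0.3];               % category probabilities, sum to 1
picum  = cumsum(picat)                % cumulative: [0.2 0.7 1.0]
picond = [picat(1), picat(2)/(1 - picat(1))]   % conditional, first k-1
% picond(2) = P(category 2 | category 2 or higher) = 0.5/0.8 = 0.625
```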

Output Arguments

pihat — Probability estimates
n-by-(k – 1) matrix

Probability estimates for each multinomial category, returned as an n-by-(k – 1) matrix, where n is the number of observations, and k is the number of response categories.

yhat — Count estimates
n-by-(k – 1) matrix

Count estimates for the number in each response category, returned as an n-by-(k – 1) matrix, where n is the number of observations, and k is the number of response categories.

dlow — Lower error bound
column vector

Lower error bound used to compute the lower confidence bound for pihat or yhat, returned as a column vector. The lower confidence bound for pihat is pihat minus dlow. Similarly, the lower confidence bound for yhat is yhat minus dlow. Confidence bounds are nonsimultaneous and only apply to the fitted curve, not to new observations.

dhi — Upper error bound
column vector


Upper error bound to compute the upper confidence bound for pihat or yhat, returned as a column vector. The upper confidence bound for pihat is pihat plus dhi. Similarly, the upper confidence bound for yhat is yhat plus dhi. Confidence bounds are nonsimultaneous and only apply to the fitted curve, not to new observations.

References

[1] McCullagh, P., and J. A. Nelder. Generalized Linear Models. New York: Chapman & Hall, 1990.

See Also

fitglm | glmfit | glmval | mnrfit

Topics

"Multinomial Models for Nominal Responses" on page 12-2
"Multinomial Models for Ordinal Responses" on page 12-4
"Hierarchical Multinomial Models" on page 12-7

Introduced in R2006b


moment

Central moment

Syntax

m = moment(X,order)
m = moment(X,order,'all')
m = moment(X,order,dim)
m = moment(X,order,vecdim)

Description

m = moment(X,order) returns the central moment of X for the order specified by order.

• If X is a vector, then moment(X,order) returns a scalar value that is the k-order central moment of the elements in X.
• If X is a matrix, then moment(X,order) returns a row vector containing the k-order central moment of each column in X.
• If X is a multidimensional array, then moment(X,order) operates along the first nonsingleton dimension of X.

m = moment(X,order,'all') returns the central moment of the specified order for all elements of X.

m = moment(X,order,dim) takes the central moment along the operating dimension dim of X.

m = moment(X,order,vecdim) returns the central moment over the dimensions specified in the vector vecdim. For example, if X is a 2-by-3-by-4 array, then moment(X,1,[1 2]) returns a 1-by-1-by-4 array. Each element of the output array is the first-order central moment of the elements on the corresponding page of X.
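The k-order central moment is the mean of the centered data raised to the k-th power; for order 2 it matches the biased (divide-by-n) variance. A minimal sketch (moment requires Statistics and Machine Learning Toolbox):

```matlab
% moment(x,k) equals mean((x - mean(x)).^k); for k = 2 this is the
% biased (divide-by-n) variance, var(x,1).
rng('default')
x  = randn(100,1);
m2 = moment(x,2);
assert(abs(m2 - mean((x - mean(x)).^2)) < 1e-12)
assert(abs(m2 - var(x,1)) < 1e-12)
```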

Examples

Find Central Moment for Specified Order

Set the random seed for reproducibility of the results.

rng('default')

Generate a matrix with 6 rows and 5 columns.

X = randn(6,5)

X = 6×5

    0.5377   -0.4336    0.7254    1.4090    0.4889
    1.8339    0.3426   -0.0631    1.4172    1.0347
   -2.2588    3.5784    0.7147    0.6715    0.7269
    0.8622    2.7694   -0.2050   -1.2075   -0.3034
    0.3188   -1.3499   -0.1241    0.7172    0.2939
   -1.3077    3.0349    1.4897    1.6302   -0.7873

Find the third-order central moment of X.

m = moment(X,3)

m = 1×5

   -1.1143   -0.9973    0.1234   -1.1023   -0.1045

m is a row vector containing the third-order central moment of each column in X.

Find Central Moment Along Given Dimension

Find the central moment along different dimensions for a multidimensional array.

Set the random seed for reproducibility of the results.

rng('default')

Create a 4-by-3-by-2 array of random numbers.

X = randn([4,3,2])

X =
X(:,:,1) =

    0.5377    0.3188    3.5784
    1.8339   -1.3077    2.7694
   -2.2588   -0.4336   -1.3499
    0.8622    0.3426    3.0349

X(:,:,2) =

    0.7254   -0.1241    0.6715
   -0.0631    1.4897   -1.2075
    0.7147    1.4090    0.7172
   -0.2050    1.4172    1.6302

Find the fourth-order central moment of X along the default dimension.

m1 = moment(X,4)

m1 = 
m1(:,:,1) =
   11.4427    0.3553   33.6733

m1(:,:,2) =
    0.0360    0.4902    2.3821

By default, moment operates along the first dimension of X whose size does not equal 1. In this case, this dimension is the first dimension of X. Therefore, m1 is a 1-by-3-by-2 array.

Find the fourth-order central moment of X along the second dimension.

m2 = moment(X,4,2)

m2 = 
m2(:,:,1) =
    7.3476
   13.8702
    0.4625
    2.7741

m2(:,:,2) =
    0.0341
    2.2389
    0.0171
    0.6766

m2 is a 4-by-1-by-2 array.

Find the fourth-order central moment of X along the third dimension.

m3 = moment(X,4,3)

m3 = 4×3
    0.0001    0.0024    4.4627
    0.8093    3.8273   15.6340
    4.8866    0.7205    1.1412
    0.0811    0.0833    0.2433

m3 is a 4-by-3 matrix.
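As a sketch, the dim results above agree with applying the definition directly: subtract the mean along the same dimension and average the kth powers of the deviations (the subtraction uses implicit expansion, available in R2016b and later):

```matlab
% Manual fourth-order central moment along dimension 3
m3check = mean((X - mean(X,3)).^4, 3);
% Difference from moment(X,4,3) is zero up to round-off
max(abs(m3check - moment(X,4,3)), [], 'all')
```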

Find Central Moment Along Vector of Dimensions

Find the central moment over multiple dimensions by using the 'all' and vecdim input arguments. Set the random seed for reproducibility of the results.

rng('default')

Create a 4-by-3-by-2 array of random numbers.

X = randn([4 3 2])

X = 
X(:,:,1) =
    0.5377    0.3188    3.5784
    1.8339   -1.3077    2.7694
   -2.2588   -0.4336   -1.3499
    0.8622    0.3426    3.0349

X(:,:,2) =
    0.7254   -0.1241    0.6715
   -0.0631    1.4897   -1.2075
    0.7147    1.4090    0.7172
   -0.2050    1.4172    1.6302

Find the third-order central moment of X.

mall = moment(X,3,'all')

mall = 0.2431

mall is the third-order central moment of the entire input data set X.

Find the third-order central moment of each page of X by specifying the first and second dimensions.

mpage = moment(X,3,[1 2])

mpage = 
mpage(:,:,1) =
    0.6002

mpage(:,:,2) =
   -0.3475

For example, mpage(1,1,2) is the third-order central moment of the elements in X(:,:,2).

Find the third-order central moment of the elements in each X(i,:,:) slice by specifying the second and third dimensions.

mrow = moment(X,3,[2 3])

mrow = 4×1
    2.7552
    0.0443
   -0.7585
    0.5340

For example, mrow(1) is the third-order central moment of the elements in X(1,:,:).


Input Arguments

X — Input data
vector | matrix | multidimensional array

Input data that represents a sample from a population, specified as a vector, matrix, or multidimensional array.
• If X is a vector, then moment(X,order) returns a scalar value that is the k-order central moment of the elements in X.
• If X is a matrix, then moment(X,order) returns a row vector containing the k-order central moment of each column in X.
• If X is a multidimensional array, then moment(X,order) operates along the first nonsingleton dimension of X.

To specify the operating dimension when X is a matrix or an array, use the dim input argument.

Data Types: single | double

order — Order of central moment
positive integer

Order of the central moment, specified as a positive integer.

Data Types: single | double

dim — Dimension
positive integer

Dimension along which to operate, specified as a positive integer. If you do not specify a value for dim, then the default is the first nonsingleton dimension of X.

Consider the third-order central moment of a matrix X:
• If dim is equal to 1, then moment(X,3,1) returns a row vector that contains the third-order central moment of each column in X.
• If dim is equal to 2, then moment(X,3,2) returns a column vector that contains the third-order central moment of each row in X.

If dim is greater than ndims(X) or if size(X,dim) is 1, then moment returns an array of zeros the same size as X.

Data Types: single | double

vecdim — Vector of dimensions
positive integer vector

Vector of dimensions, specified as a positive integer vector. Each element of vecdim represents a dimension of the input array X. The output m has length 1 in the specified operating dimensions. The other dimension lengths are the same for X and m. For example, if X is a 2-by-3-by-3 array, then moment(X,1,[1 2]) returns a 1-by-1-by-3 array. Each element of the output array is the first-order central moment of the elements on the corresponding page of X.


Data Types: single | double

Output Arguments m — Central moments scalar | vector | matrix | multidimensional array Central moments, returned as a scalar, vector, matrix, or multidimensional array.

Algorithms

The central moment of order k for a distribution is defined as

m_k = E[(x − μ)^k],

where μ is the mean of x, and E(t) represents the expected value of the quantity t. The moment function computes a sample version of this population value:

m_k = (1/n) ∑_{i=1}^{n} (x_i − x̄)^k.

Note that the first-order central moment is zero, and the second-order central moment is the variance computed using a divisor of n rather than n – 1, where n is the length of the vector x or the number of rows in the matrix X.
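A sketch that checks these two identities numerically for an arbitrary vector:

```matlab
x = randn(100,1);          % arbitrary sample
moment(x,1)                % first central moment: zero up to round-off
moment(x,2) - var(x,1)     % matches variance with divisor n: zero up to round-off
```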

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:
• The 'all' and vecdim input arguments are not supported.
• The dim input argument must be a compile-time constant.
• If you do not specify the dim input argument, the working (or operating) dimension can be different in the generated code. As a result, run-time errors can occur. For more details, see “Automatic dimension restriction” (MATLAB Coder).
• If order is nonintegral and X is real, use moment(complex(X),order).


For more information on code generation, see “Introduction to Code Generation” on page 31-2 and “General Code Generation Workflow” on page 31-5. GPU Arrays Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. Usage notes and limitations: • The 'all' and vecdim input arguments are not supported. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also kurtosis | mean | skewness | std | var Introduced before R2006a


multcompare Multiple comparison test

Syntax

c = multcompare(stats)
c = multcompare(stats,Name,Value)
[c,m] = multcompare( ___ )
[c,m,h] = multcompare( ___ )
[c,m,h,gnames] = multcompare( ___ )

Description

c = multcompare(stats) returns a matrix c of the pairwise comparison results from a multiple comparison test using the information contained in the stats structure. multcompare also displays an interactive graph of the estimates and comparison intervals. Each group mean is represented by a symbol, and the interval is represented by a line extending out from the symbol. Two group means are significantly different if their intervals are disjoint; they are not significantly different if their intervals overlap. If you use your mouse to select any group, then the graph will highlight all other groups that are significantly different, if any.

c = multcompare(stats,Name,Value) returns a matrix of pairwise comparison results, c, using additional options specified by one or more Name,Value pair arguments. For example, you can specify the confidence interval, or the type of critical value to use in the multiple comparison.

[c,m] = multcompare( ___ ) also returns a matrix, m, which contains estimated values of the means (or whatever statistics are being compared) for each group and the corresponding standard errors. You can use any of the previous syntaxes.

[c,m,h] = multcompare( ___ ) also returns a handle, h, to the comparison graph.

[c,m,h,gnames] = multcompare( ___ ) also returns a cell array, gnames, which contains the names of the groups.

Examples Multiple Comparison of Group Means Load the sample data. load carsmall

Perform a one-way analysis of variance (ANOVA) to see if there is any difference between the mileage of the cars by origin. [p,t,stats] = anova1(MPG,Origin,'off');

Perform a multiple comparison of the group means. [c,m,h,nms] = multcompare(stats);


multcompare displays the estimates with comparison intervals around them. You can click the graphs of each country to compare its mean to those of other countries.

Now display the mean estimates and the standard errors with the corresponding group names.

[nms num2cell(m)]

ans=6×3 cell array
    {'USA'    }    {[21.1328]}    {[0.8814]}
    {'Japan'  }    {[31.8000]}    {[1.8206]}
    {'Germany'}    {[28.4444]}    {[2.3504]}
    {'France' }    {[23.6667]}    {[4.0711]}
    {'Sweden' }    {[22.5000]}    {[4.9860]}
    {'Italy'  }    {[     28]}    {[7.0513]}
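Because each row of c is [group1 group2 lower estimate upper p-value], you can also filter the comparisons programmatically; a small sketch using the c and nms returned above:

```matlab
% Pairs of origins whose group means differ at the 5% level
sig = c(c(:,6) < 0.05, :);
[nms(sig(:,1)) nms(sig(:,2))]   % names of the significantly different pairs
```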

Multiple Comparisons for Two-Way ANOVA

Load the sample data.

load popcorn
popcorn

popcorn = 6×3
    5.5000    4.5000    3.5000
    5.5000    4.5000    4.0000
    6.0000    4.0000    3.0000
    6.5000    5.0000    4.0000
    7.0000    5.5000    5.0000
    7.0000    5.0000    4.5000

The data is from a study of popcorn brands and popper types (Hogg 1987). The columns of the matrix popcorn are brands (Gourmet, National, and Generic). The rows are popper types (oil and air). In the study, researchers popped a batch of each brand three times with each popper. The values are the yield in cups of popped popcorn.

Perform a two-way ANOVA. Also compute the statistics that you need to perform a multiple comparison test on the main effects.

[~,~,stats] = anova2(popcorn,3,'off')

stats = struct with fields:
      source: 'anova2'
     sigmasq: 0.1389
    colmeans: [6.2500 4.7500 4]
        coln: 6
    rowmeans: [4.5000 5.5000]
        rown: 9
       inter: 1
        pval: 0.7462
          df: 12

The stats structure includes • The mean squared error (sigmasq) • The estimates of the mean yield for each popcorn brand (colmeans) • The number of observations for each popcorn brand (coln) • The estimate of the mean yield for each popper type (rowmeans) • The number of observations for each popper type (rown) • The number of interactions (inter) • The p-value that shows the significance level of the interaction term (pval) • The error degrees of freedom (df). Perform a multiple comparison test to see if the popcorn yield differs between pairs of popcorn brands (columns). c = multcompare(stats) Note: Your model includes an interaction term. A test of main effects can be difficult to interpret when the model includes interactions.


c = 3×6
    1.0000    2.0000    0.9260    1.5000    2.0740    0.0000
    1.0000    3.0000    1.6760    2.2500    2.8240    0.0000
    2.0000    3.0000    0.1760    0.7500    1.3240    0.0116

The first two columns of c show the groups that are compared. The fourth column shows the difference between the estimated group means. The third and fifth columns show the lower and upper limits for 95% confidence intervals for the true mean difference. The sixth column contains the p-value for a hypothesis test that the corresponding mean difference is equal to zero. All p-values (0, 0, and 0.0116) are very small, which indicates that the popcorn yield differs across all three brands.

The figure shows the multiple comparison of the means. By default, the group 1 mean is highlighted and the comparison interval is in blue. Because the comparison intervals for the other two groups do not intersect with the intervals for the group 1 mean, they are highlighted in red. This lack of intersection indicates that both means are different from the group 1 mean. Select other group means to confirm that all group means are significantly different from each other.

Perform a multiple comparison test to see if the popcorn yield differs between the two popper types (rows).

c = multcompare(stats,'Estimate','row')

Note: Your model includes an interaction term. A test of main effects can be difficult to interpret when the model includes interactions.


c = 1×6
    1.0000    2.0000   -1.3828   -1.0000   -0.6172    0.0001

The small p-value of 0.0001 indicates that the popcorn yield differs between the two popper types (air and oil). The figure shows the same results. The disjoint comparison intervals indicate that the group means are significantly different from each other.

Multiple Comparisons for Three-Way ANOVA Load the sample data. y = [52.7 57.5 45.9 44.5 53.0 57.0 45.9 44.0]'; g1 = [1 2 1 2 1 2 1 2]; g2 = {'hi';'hi';'lo';'lo';'hi';'hi';'lo';'lo'}; g3 = {'may';'may';'may';'may';'june';'june';'june';'june'};

y is the response vector and g1, g2, and g3 are the grouping variables (factors). Each factor has two levels, and every observation in y is identified by a combination of factor levels. For example, observation y(1) is associated with level 1 of factor g1, level 'hi' of factor g2, and level 'may' of factor g3. Similarly, observation y(6) is associated with level 2 of factor g1, level 'hi' of factor g2, and level 'june' of factor g3.


Test if the response is the same for all factor levels. Also compute the statistics required for multiple comparison tests. [~,~,stats] = anovan(y,{g1 g2 g3},'model','interaction',... 'varnames',{'g1','g2','g3'});

The p-value of 0.2578 indicates that the mean responses for levels 'may' and 'june' of factor g3 are not significantly different. The p-value of 0.0347 indicates that the mean responses for levels 1 and 2 of factor g1 are significantly different. Similarly, the p-value of 0.0048 indicates that the mean responses for levels 'hi' and 'lo' of factor g2 are significantly different. Perform multiple comparison tests to find out which groups of the factors g1 and g2 are significantly different. results = multcompare(stats,'Dimension',[1 2])


results = 6×6
    1.0000    2.0000   -6.8604   -4.4000   -1.9396    0.0280
    1.0000    3.0000    4.4896    6.9500    9.4104    0.0177
    1.0000    4.0000    6.1396    8.6000   11.0604    0.0143
    2.0000    3.0000    8.8896   11.3500   13.8104    0.0108
    2.0000    4.0000   10.5396   13.0000   15.4604    0.0095
    3.0000    4.0000   -0.8104    1.6500    4.1104    0.0745

multcompare compares the combinations of groups (levels) of the two grouping variables, g1 and g2. In the results matrix, the number 1 corresponds to the combination of level 1 of g1 and level hi of g2, and the number 2 corresponds to the combination of level 2 of g1 and level hi of g2. Similarly, the number 3 corresponds to the combination of level 1 of g1 and level lo of g2, and the number 4 corresponds to the combination of level 2 of g1 and level lo of g2. The last column of the matrix contains the p-values.

For example, the first row of the matrix tests whether the combination of level 1 of g1 and level hi of g2 has the same mean response as the combination of level 2 of g1 and level hi of g2. The p-value corresponding to this test is 0.0280, which indicates that the mean responses are significantly different. You can also see this result in the figure. The blue bar shows the comparison interval for the mean response for the combination of level 1 of g1 and level hi of g2. The red bars are the comparison intervals for the mean response for other group combinations. None of the red bars overlap with the blue bar, which means the mean response for the combination of level 1 of g1 and level hi of g2 is significantly different from the mean response for other group combinations.


You can test the other groups by clicking on the corresponding comparison interval for the group. The bar you click on turns to blue. The bars for the groups that are significantly different are red. The bars for the groups that are not significantly different are gray. For example, if you click on the comparison interval for the combination of level 1 of g1 and level lo of g2, the comparison interval for the combination of level 2 of g1 and level lo of g2 overlaps, and is therefore gray. Conversely, the other comparison intervals are red, indicating significant difference.

Input Arguments

stats — Test data
structure

Test data, specified as a structure. You can create a structure using one of the following functions:
• anova1 — One-way analysis of variance.
• anova2 — Two-way analysis of variance.
• anovan — N-way analysis of variance.
• aoctool — Interactive analysis of covariance tool.
• friedman — Friedman’s test.
• kruskalwallis — Kruskal-Wallis test.

multcompare does not support multiple comparisons using anovan output for a model that includes random or nested effects. The calculations for a random effects model produce a warning that all effects are treated as fixed. Nested models are not accepted.

Data Types: struct

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Alpha',0.01,'CType','bonferroni','Display','off' computes the Bonferroni critical values, conducts the hypothesis tests at the 1% significance level, and omits the interactive display.

Alpha — Significance level
0.05 (default) | scalar value in the range (0,1)

Significance level of the multiple comparison test, specified as the comma-separated pair consisting of 'Alpha' and a scalar value in the range (0,1). The value specified for 'Alpha' determines the 100 × (1 – α) confidence levels of the intervals returned in the matrix c and in the figure.

Example: 'Alpha',0.01
Data Types: single | double

CType — Type of critical value
'tukey-kramer' (default) | 'hsd' | 'lsd' | 'bonferroni' | 'dunn-sidak' | 'scheffe'

Type of critical value to use for the multiple comparison, specified as the comma-separated pair consisting of 'CType' and one of the following.


• 'tukey-kramer' or 'hsd' — Tukey's honest significant difference criterion on page 9-22
• 'bonferroni' — Bonferroni method on page 9-23
• 'dunn-sidak' — Dunn and Sidák’s approach on page 9-23
• 'lsd' — Fisher's least significant difference procedure on page 9-23
• 'scheffe' — Scheffé's S procedure on page 9-24

Example: 'CType','bonferroni'

Display — Display toggle
'on' (default) | 'off'

Display toggle, specified as the comma-separated pair consisting of 'Display' and either 'on' or 'off'. If you specify 'on', then multcompare displays a graph of the estimates and their comparison intervals. If you specify 'off', then multcompare omits the graph.

Example: 'Display','off'

Dimension — Dimension over which to calculate marginal means
1 (default) | positive integer value | vector of positive integer values

A vector specifying the dimension or dimensions over which to calculate the population marginal means, specified as a positive integer value or a vector of such values. Use the 'Dimension' name-value pair only if you create the input structure stats using the function anovan.

For example, if you specify 'Dimension' as 1, then multcompare compares the means for each value of the first grouping variable, adjusted by removing effects of the other grouping variables as if the design were balanced. If you specify 'Dimension' as [1,3], then multcompare computes the population marginal means for each combination of the first and third grouping variables, removing effects of the second grouping variable. If you fit a singular model, some cell means may not be estimable, and any population marginal means that depend on those cell means have the value NaN.

Population marginal means are described by Milliken and Johnson (1992) and by Searle, Speed, and Milliken (1980). The idea behind population marginal means is to remove any effect of an unbalanced design by fixing the values of the factors specified by 'Dimension', and averaging out the effects of other factors as if each factor combination occurred the same number of times. The definition of population marginal means does not depend on the number of observations at each factor combination. For designed experiments where the number of observations at each factor combination has no meaning, population marginal means can be easier to interpret than simple means ignoring other factors. For surveys and other studies where the number of observations at each combination does have meaning, population marginal means may be harder to interpret.

Example: 'Dimension',[1,3]
Data Types: single | double

Estimate — Estimates to be compared
'column' (default) | 'row' | 'slope' | 'intercept' | 'pmm'

Estimates to be compared, specified as the comma-separated pair consisting of 'Estimate' and an allowable value. The allowable values for 'Estimate' depend on the function used to generate the input structure stats, according to the following table.


• anova1 — None. This name-value pair is ignored, and multcompare always compares the group means.
• anova2 — Either 'column' to compare column means, or 'row' to compare row means.
• anovan — None. This name-value pair is ignored, and multcompare always compares the population marginal means as specified by the 'Dimension' name-value pair argument.
• aoctool — Either 'slope', 'intercept', or 'pmm' to compare slopes, intercepts, or population marginal means, respectively. If the analysis of covariance model did not include separate slopes, then 'slope' is not allowed. If it did not include separate intercepts, then no comparisons are possible.
• friedman — None. This name-value pair is ignored, and multcompare always compares the average column ranks.
• kruskalwallis — None. This name-value pair is ignored, and multcompare always compares the average group ranks.

Example: 'Estimate','row'

Output Arguments

c — Matrix of multiple comparison results
matrix of scalar values

Matrix of multiple comparison results, returned as a p-by-6 matrix of scalar values, where p is the number of pairs of groups. Each row of the matrix contains the result of one paired comparison test. Columns 1 and 2 contain the indices of the two samples being compared. Column 3 contains the lower confidence interval, column 4 contains the estimate, and column 5 contains the upper confidence interval. Column 6 contains the p-value for the hypothesis test that the corresponding mean difference is not equal to 0.

For example, suppose one row contains the following entries.

2.0000    5.0000    1.9442    8.2206   14.4971    0.0432

These numbers indicate that the mean of group 2 minus the mean of group 5 is estimated to be 8.2206, and a 95% confidence interval for the true difference of the means is [1.9442, 14.4971]. The p-value for the corresponding hypothesis test that the difference of the means of groups 2 and 5 is significantly different from zero is 0.0432. In this example the confidence interval does not contain 0, so the difference is significant at the 5% significance level. If the confidence interval did contain 0, the difference would not be significant. The p-value of 0.0432 also indicates that the difference of the means of groups 2 and 5 is significantly different from 0.

m — Matrix of estimates
matrix of scalar values


Matrix of the estimates, returned as a matrix of scalar values. The first column of m contains the estimated values of the means (or whatever statistics are being compared) for each group, and the second column contains their standard errors.

h — Handle to the figure
handle

Handle to the figure containing the interactive graph, returned as a handle. The title of this graph contains instructions for interacting with the graph, and the x-axis label contains information about which means are significantly different from the selected mean. If you plan to use this graph for presentation, you may want to omit the title and the x-axis label. You can remove them using interactive features of the graph window, or you can use the following commands.

title('')
xlabel('')

gnames — Group names
cell array of character vectors

Group names, returned as a cell array of character vectors. Each row of gnames contains the name of a group.

More About

Multiple Comparison Tests

Analysis of variance compares the means of several groups to test the hypothesis that they are all equal, against the general alternative that they are not all equal. Sometimes this alternative may be too general. You may need information about which pairs of means are significantly different, and which are not. A multiple comparison test can provide this information.

When you perform a simple t-test of one group mean against another, you specify a significance level that determines the cutoff value of the t-statistic. For example, you can specify the value alpha = 0.05 to ensure that when there is no real difference, you will incorrectly find a significant difference no more than 5% of the time. When there are many group means, there are also many pairs to compare. If you applied an ordinary t-test in this situation, the alpha value would apply to each comparison, so the chance of incorrectly finding a significant difference would increase with the number of comparisons. Multiple comparison procedures are designed to provide an upper bound on the probability that any comparison will be incorrectly found significant.
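The inflation described above is easy to quantify: with m independent comparisons each run at level alpha, the chance of at least one false positive is 1 − (1 − alpha)^m. A quick sketch:

```matlab
alpha = 0.05;
m = (1:10)';
familywise = 1 - (1 - alpha).^m;   % P(at least one false positive)
[m familywise]                     % climbs to about 0.40 by m = 10
```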

References [1] Hochberg, Y., and A. C. Tamhane. Multiple Comparison Procedures. Hoboken, NJ: John Wiley & Sons, 1987. [2] Milliken, G. A., and D. E. Johnson. Analysis of Messy Data, Volume I: Designed Experiments. Boca Raton, FL: Chapman & Hall/CRC Press, 1992. [3] Searle, S. R., F. M. Speed, and G. A. Milliken. “Population marginal means in the linear model: an alternative to least-squares means.” American Statistician. 1980, pp. 216–221.

See Also anova1 | anova2 | anovan | aoctool | friedman | kruskalwallis


Introduced before R2006a


multcompare Class: RepeatedMeasuresModel Multiple comparison of estimated marginal means

Syntax

tbl = multcompare(rm,var)
tbl = multcompare(rm,var,Name,Value)

Description tbl = multcompare(rm,var) returns multiple comparisons of the estimated marginal means based on the variable var in the repeated measures model rm. tbl = multcompare(rm,var,Name,Value) returns multiple comparisons of the estimated marginal means with additional options specified by one or more Name,Value pair arguments. For example, you can specify the comparison type or which variable to group by.

Input Arguments

rm — Repeated measures model
RepeatedMeasuresModel object

Repeated measures model, specified as a RepeatedMeasuresModel object. For properties and methods of this object, see RepeatedMeasuresModel.

var — Variables for which to compute marginal means
character vector | string scalar

Variables for which to compute the marginal means, specified as a character vector or string scalar representing the name of a between- or within-subjects factor in rm. If var is a between-subjects factor, it must be categorical.

Data Types: char | string

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Alpha — Significance level
0.05 (default) | scalar value in the range of 0 through 1

Significance level of the confidence intervals for population marginal means, specified as the comma-separated pair consisting of 'alpha' and a scalar value in the range of 0 through 1. The confidence level is 100*(1–alpha)%.


Example: 'alpha',0.01
Data Types: double | single

By — Factor to perform comparisons by
character vector | string scalar

Factor to do the comparisons by, specified as the comma-separated pair consisting of 'By' and a character vector or string scalar. The comparison between levels of var occurs separately for each value of the factor you specify. If you have more than one between-subjects factor, say A, B, and C, and you want to do the comparisons of A levels separately for each level of C, then specify A as the var argument and specify C using the 'By' argument as follows.

Example: 'By',C
Data Types: char | string

ComparisonType — Type of critical value to use
'tukey-kramer' (default) | 'dunn-sidak' | 'bonferroni' | 'scheffe' | 'lsd'

Type of critical value to use, specified as the comma-separated pair consisting of 'ComparisonType' and one of the following.

• 'tukey-kramer' — Default. Also called Tukey’s Honest Significant Difference procedure. It is based on the Studentized range distribution. According to the unproven Tukey-Kramer conjecture, it is also accurate for problems where the quantities being compared are correlated, as in analysis of covariance with unbalanced covariate values.

• 'dunn-sidak' — Use critical values from the t distribution, after an adjustment for multiple comparisons that was proposed by Dunn and proved accurate by Sidák. The test declares two marginal means significantly different when

t = (ȳi − ȳj) / sqrt(MSE (1/ni + 1/nj)) > t(1 − η/2, v),

where η = 1 − (1 − α)^(2/(ng(ng − 1))) and ng is the number of groups (marginal means). This procedure is similar to, but less conservative than, the Bonferroni procedure.

• 'bonferroni' — Use critical values from the t distribution, after a Bonferroni adjustment to compensate for multiple comparisons. The critical value is t(α/(ng(ng − 1)), v), where ng is the number of groups (marginal means), and v is the error degrees of freedom. This procedure is conservative, but usually less so than the Scheffé procedure.

• 'scheffe' — Use critical values from Scheffé's S procedure, derived from the F distribution. The critical value is sqrt((ng − 1) F(α, ng − 1, v)), where ng is the number of groups (marginal means), and v is the error degrees of freedom. This procedure provides a simultaneous confidence level for comparisons of all linear combinations of the means, and it is conservative for comparisons of simple differences of pairs.

• 'lsd' — Least significant difference. This option uses plain t-tests. The critical value is t(α/2, v), where v is the error degrees of freedom. It provides no protection against the multiple comparison problem.

Example: 'ComparisonType','dunn-sidak'
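As a sketch, the t- and F-based critical values in the table above can be computed with tinv and finv, assuming example values ng = 3 groups and v = 12 error degrees of freedom (the 'tukey-kramer' value needs the Studentized range distribution and is omitted):

```matlab
ng = 3;  v = 12;  alpha = 0.05;      % example values (assumed)
nc = ng*(ng-1)/2;                    % number of pairwise comparisons
tLSD   = tinv(1 - alpha/2, v);                    % 'lsd'
tBonf  = tinv(1 - alpha/(2*nc), v);               % 'bonferroni'
eta    = 1 - (1 - alpha)^(1/nc);                  % Dunn-Sidak adjusted level
tSidak = tinv(1 - eta/2, v);                      % 'dunn-sidak'
fScheffe = sqrt((ng-1)*finv(1 - alpha, ng-1, v)); % 'scheffe'
```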

Output Arguments

tbl — Results of multiple comparison
table

Results of multiple comparisons of estimated marginal means, returned as a table. tbl has the following columns.

• Difference — Estimated difference between the corresponding two marginal means
• StdErr — Standard error of the estimated difference between the corresponding two marginal means
• pValue — p-value for a test that the difference between the corresponding two marginal means is 0
• Lower — Lower limit of simultaneous 95% confidence intervals for the true difference
• Upper — Upper limit of simultaneous 95% confidence intervals for the true difference

Examples Multiple Comparison of Estimated Marginal Means Load the sample data. load fisheriris

The column vector species consists of iris flowers of three different species: setosa, versicolor, and virginica. The double matrix meas consists of four types of measurements on the flowers: the length and width of sepals and petals in centimeters, respectively. Store the data in a table array. t = table(species,meas(:,1),meas(:,2),meas(:,3),meas(:,4),... 'VariableNames',{'species','meas1','meas2','meas3','meas4'}); Meas = dataset([1 2 3 4]','VarNames',{'Measurements'});

Fit a repeated measures model, where the measurements are the responses and the species is the predictor variable. rm = fitrm(t,'meas1-meas4~species','WithinDesign',Meas);

Perform a multiple comparison of the estimated marginal means of species.

tbl = multcompare(rm,'species')

tbl=6×7 table
      species_1          species_2       Difference     StdErr       pValue        Lower       Upper  
    ______________     ______________    __________    ________    __________    ________    ________

    {'setosa'    }     {'versicolor'}      -1.0375     0.060539    9.5606e-10     -1.1794    -0.89562
    {'setosa'    }     {'virginica' }      -1.7495     0.060539    9.5606e-10     -1.8914     -1.6076
    {'versicolor'}     {'setosa'    }       1.0375     0.060539    9.5606e-10     0.89562      1.1794
    {'versicolor'}     {'virginica' }       -0.712     0.060539    9.5606e-10    -0.85388    -0.57012
    {'virginica' }     {'setosa'    }       1.7495     0.060539    9.5606e-10      1.6076      1.8914
    {'virginica' }     {'versicolor'}        0.712     0.060539    9.5606e-10     0.57012     0.85388

The small p-values (in the pValue field) indicate that the estimated marginal means for the three species significantly differ from each other.

Perform Multiple Comparisons with Specified Options Load the sample data. load repeatedmeas


The table between includes the between-subject variables age, IQ, group, gender, and eight repeated measures y1 through y8 as responses. The table within includes the within-subject variables w1 and w2. This is simulated data. Fit a repeated measures model, where the repeated measures y1 through y8 are the responses, and age, IQ, group, gender, and the group-gender interaction are the predictor variables. Also specify the within-subject design matrix. R = fitrm(between,'y1-y8 ~ Group*Gender + Age + IQ','WithinDesign',within);

Perform a multiple comparison of the estimated marginal means based on the variable Group.

T = multcompare(R,'Group')

T=6×7 table
    Group_1    Group_2    Difference    StdErr     pValue        Lower      Upper 
    _______    _______    __________    ______    _________    _______    _______

       A          B           4.9875    5.6271      0.65436    -9.1482     19.123
       A          C           23.094    5.9261    0.0021493     8.2074     37.981
       B          A          -4.9875    5.6271      0.65436    -19.123     9.1482
       B          C           18.107    5.8223     0.013588     3.4805     32.732
       C          A          -23.094    5.9261    0.0021493    -37.981    -8.2074
       C          B          -18.107    5.8223     0.013588    -32.732    -3.4805

The small p-value of 0.0021493 indicates that there is a significant difference between the marginal means of groups A and C. The p-value of 0.65436 indicates that the difference between the marginal means for groups A and B is not significantly different from 0.

multcompare uses the Tukey-Kramer test statistic by default. Change the comparison type to the Scheffe procedure.

T = multcompare(R,'Group','ComparisonType','Scheffe')

T=6×7 table
    Group_1    Group_2    Difference    StdErr     pValue        Lower      Upper 
    _______    _______    __________    ______    _________    _______    _______

       A          B           4.9875    5.6271      0.67981    -9.7795     19.755
       A          C           23.094    5.9261    0.0031072     7.5426     38.646
       B          A          -4.9875    5.6271      0.67981    -19.755     9.7795
       B          C           18.107    5.8223     0.018169     2.8273     33.386
       C          A          -23.094    5.9261    0.0031072    -38.646    -7.5426
       C          B          -18.107    5.8223     0.018169    -33.386    -2.8273

The Scheffe test produces larger p-values, but similar conclusions.

Perform multiple comparisons of estimated marginal means based on the variable Group for each gender separately.

T = multcompare(R,'Group','By','Gender')

T=12×8 table
    Gender    Group_1    Group_2    Difference    StdErr     pValue        Lower         Upper   
    ______    _______    _______    __________    ______    ________    _________    __________

    Female       A          B           4.1883    8.0177     0.86128      -15.953        24.329
    Female       A          C           24.565    8.2083    0.017697       3.9449        45.184
    Female       B          A          -4.1883    8.0177     0.86128      -24.329        15.953
    Female       B          C           20.376    8.1101    0.049957    0.0033459        40.749
    Female       C          A          -24.565    8.2083    0.017697      -45.184       -3.9449
    Female       C          B          -20.376    8.1101    0.049957      -40.749    -0.0033459
    Male         A          B           5.7868    7.9498     0.74977      -14.183        25.757
    Male         A          C           21.624    8.1829    0.038022       1.0676        42.179
    Male         B          A          -5.7868    7.9498     0.74977      -25.757        14.183
    Male         B          C           15.837    8.0511     0.14414      -4.3881        36.062
    Male         C          A          -21.624    8.1829    0.038022      -42.179       -1.0676
    Male         C          B          -15.837    8.0511     0.14414      -36.062        4.3881

The results indicate that the difference between the marginal means for groups A and B is not significantly different from 0 for either gender (corresponding p-values are 0.86128 for females and 0.74977 for males). The difference between marginal means for groups A and C is significant for both genders (corresponding p-values are 0.017697 for females and 0.038022 for males). While the difference between marginal means for groups B and C is significantly different from 0 for females (p-value is 0.049957), it is not significantly different from 0 for males (p-value is 0.14414).

References

[1] Milliken, G. A., and D. E. Johnson. Analysis of Messy Data, Volume I: Designed Experiments. New York, NY: Chapman & Hall, 1992.

See Also fitrm | margmean | plotprofile


multivarichart Multivari chart for grouped data

Syntax

multivarichart(y,GROUP)
multivarichart(Y)
multivarichart(...,param1,val1,param2,val2,...)
[charthandle,AXESH] = multivarichart(...)

Description

multivarichart(y,GROUP) displays the multivari chart for the vector y grouped by entries in GROUP, which can be a cell array or a matrix. If GROUP is a cell array, then each cell in GROUP must contain a grouping variable that is a categorical vector, numeric vector, character matrix, string array, or single-column cell array of character vectors. If GROUP is a numeric matrix, then its columns represent different grouping variables. Each grouping variable must have the same number of elements as y. The number of grouping variables must be 2, 3, or 4.

Each subplot of the plot matrix contains a multivari chart for the first and second grouping variables. The x-axis in each subplot indicates values of the first grouping variable. The legend at the bottom of the figure window indicates values of the second grouping variable. The subplot at position (i,j) is the multivari chart for the subset of y at the ith level of the third grouping variable and the jth level of the fourth grouping variable. If the third or fourth grouping variable is absent, it is considered to have only one level.

multivarichart(Y) displays the multivari chart for a matrix Y. The data in different columns represent changes in one factor. The data in different rows represent changes in another factor.

multivarichart(...,param1,val1,param2,val2,...) specifies one or more of the following name/value pairs:

• 'varnames' — Grouping variable names in a character matrix, a string array, or a cell array of character vectors, one per grouping variable. Default names are 'X1', 'X2', ... .

• 'plotorder' — 'sorted' or a vector containing a permutation of the integers from 1 to the number of grouping variables. If 'plotorder' is 'sorted', the grouping variables are rearranged in descending order according to the number of levels in each variable. If 'plotorder' is a vector, it indicates the order in which each grouping variable should be plotted. For example, [2,3,1,4] indicates that the second grouping variable should be used as the x-axis of each subplot, the third grouping variable should be used as the legend, the first grouping variable should be used as the columns of the plot, and the fourth grouping variable should be used as the rows of the plot.

[charthandle,AXESH] = multivarichart(...) returns a handle charthandle to the figure window and a matrix AXESH of handles to the subplot axes.
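The 'plotorder' mapping described above can be sketched as follows; the response and grouping data here are synthetic, invented for illustration:

```matlab
% Synthetic response with four grouping variables
rng default
y = randn(500,1);
group = {ceil(2*rand(500,1)),ceil(3*rand(500,1)), ...
         ceil(2*rand(500,1)),ceil(3*rand(500,1))};
% [2,3,1,4]: variable 2 on the x-axis, variable 3 as the legend,
% variable 1 as subplot columns, and variable 4 as subplot rows
multivarichart(y,group,'varnames',{'A','B','C','D'}, ...
    'plotorder',[2 3 1 4])
```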


Examples

Multivari Chart for Grouped Data

Display a multivari chart for data with two grouping variables.

rng default; % For reproducibility
y = randn(100,1); % Randomly generate response
group = [ceil(3*rand(100,1)) ceil(2*rand(100,1))];
multivarichart(y,group)

Display a multivari chart for data with four grouping variables.

y = randn(1000,1); % Randomly generate response
group = {ceil(2*rand(1000,1)),ceil(3*rand(1000,1)), ...
    ceil(2*rand(1000,1)),ceil(3*rand(1000,1))};
multivarichart(y,group)


See Also
interactionplot | maineffectsplot

Topics
“Grouping Variables” on page 2-45

Introduced in R2006b


mvksdensity Kernel smoothing function estimate for multivariate data

Syntax

f = mvksdensity(x,pts,'Bandwidth',bw)
f = mvksdensity(x,pts,'Bandwidth',bw,Name,Value)

Description

f = mvksdensity(x,pts,'Bandwidth',bw) computes a probability density estimate of the sample data in the n-by-d matrix x, evaluated at the points in pts, using the required name-value pair argument bw for the bandwidth value. The estimation is based on a product Gaussian kernel function. For univariate or bivariate data, use ksdensity instead.

f = mvksdensity(x,pts,'Bandwidth',bw,Name,Value) returns any of the previous output arguments, using additional options specified by one or more Name,Value pair arguments. For example, you can define the function type that mvksdensity evaluates, such as probability density, cumulative probability, or survivor function. You can also assign weights to the input values.

Examples

Estimate Multivariate Kernel Density

Load the Hald cement data.

load hald

The data measures the heat of hardening for 13 different cement compositions. The predictor matrix ingredients contains the percent composition for each of four cement ingredients. The response matrix heat contains the heat of hardening (in cal/g) after 180 days.

Estimate the kernel density for the first three observations in ingredients.

xi = ingredients(1:3,:);
f = mvksdensity(ingredients,xi,'Bandwidth',0.8);

Estimate Multivariate Kernel Density Using Grids

Load the Hald cement data.

load hald

The data measures the heat of hardening for 13 different cement compositions. The predictor matrix ingredients contains the percent composition for each of four cement ingredients. The response matrix heat contains the heat of hardening (in cal/g) after 180 days.


Create an array of points at which to estimate the density. First, define the range and spacing for each variable, using a similar number of points in each dimension.

gridx1 = 0:2:22;
gridx2 = 20:5:80;
gridx3 = 0:2:24;
gridx4 = 5:5:65;

Next, use ndgrid to generate a full grid of points using the defined range and spacing.

[x1,x2,x3,x4] = ndgrid(gridx1,gridx2,gridx3,gridx4);

Finally, transform and concatenate to create an array that contains the points at which to estimate the density. This array has one column for each variable.

x1 = x1(:,:)';
x2 = x2(:,:)';
x3 = x3(:,:)';
x4 = x4(:,:)';
xi = [x1(:) x2(:) x3(:) x4(:)];

Estimate the density.

f = mvksdensity(ingredients,xi,...
    'Bandwidth',[4.0579 10.7345 4.4185 11.5466],...
    'Kernel','normpdf');

View the size of xi and f to confirm that mvksdensity calculates the density at each point in xi.

size_xi = size(xi)
size_xi = 1×2

       26364           4

size_f = size(f)
size_f = 1×2

       26364           1

Input Arguments

x — Sample data
numeric matrix

Sample data for which mvksdensity returns the probability density estimate, specified as an n-by-d matrix of numeric values. n is the number of data points (rows) in x, and d is the number of dimensions (columns).

Data Types: single | double

pts — Points at which to evaluate f
matrix


Points at which to evaluate the probability density estimate f, specified as a matrix with the same number of columns as x. The returned estimate f and pts have the same number of rows.

Data Types: single | double

bw — Value for the bandwidth of the kernel smoothing window
scalar value | d-element vector

Value for the bandwidth of the kernel-smoothing window, specified as a scalar value or d-element vector. d is the number of dimensions (columns) in the sample data x. If bw is a scalar value, it applies to all dimensions.

If you specify 'BoundaryCorrection' as 'log' (default) and 'Support' as either 'positive' or a two-row matrix, mvksdensity converts bounded data to be unbounded by using log transformation. The value of bw is on the scale of the transformed values.

Silverman's rule of thumb for the bandwidth is

    bi = σi * (4/((d + 2)*n))^(1/(d + 4)),    i = 1, 2, ..., d,

where d is the number of dimensions, n is the number of observations, and σi is the standard deviation of the ith variate [4].

Example: 'Bandwidth',0.8
Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Kernel','triangle','Function','cdf' specifies that mvksdensity estimates the cdf of the sample data using the triangle kernel function.

BoundaryCorrection — Boundary correction method
'log' (default) | 'reflection'

Boundary correction method, specified as the comma-separated pair consisting of 'BoundaryCorrection' and either 'log' or 'reflection'.
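As a sketch, Silverman's rule above can be evaluated directly and the result passed as the 'Bandwidth' value; using std to estimate each σi is an assumption made here for illustration:

```matlab
load hald                      % sample data used in the examples above
[n,d] = size(ingredients);
% Silverman's rule of thumb: one bandwidth per dimension
bw = std(ingredients) * (4/((d + 2)*n))^(1/(d + 4));
f = mvksdensity(ingredients,ingredients(1:3,:),'Bandwidth',bw);
```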


• 'log' — mvksdensity converts bounded data to be unbounded by using one of the following transformations. Then, it transforms back to the original bounded scale after density estimation.

  • If you specify 'Support','positive', then mvksdensity applies log(xj) for each dimension, where xj is the jth column of the input argument x.

  • If you specify 'Support' as a two-row matrix consisting of the lower and upper limits for each dimension, then mvksdensity applies log((xj − Lj)/(Uj − xj)) for each dimension, where Lj and Uj are the lower and upper limits of the jth dimension, respectively.

  The value of bw is on the scale of the transformed values.

• 'reflection' — mvksdensity augments bounded data by adding reflected data near the boundaries, then it returns estimates corresponding to the original support. For details, see “Reflection Method” on page 32-3498.

mvksdensity applies boundary correction only when you specify 'Support' as a value other than 'unbounded'.

Example: 'BoundaryCorrection','reflection'

Function — Function to estimate
'pdf' (default) | 'cdf' | 'survivor'

Function to estimate, specified as the comma-separated pair consisting of 'Function' and one of the following.

• 'pdf' — Probability density function
• 'cdf' — Cumulative distribution function
• 'survivor' — Survivor function

Example: 'Function','cdf'

Kernel — Type of kernel smoother
'normal' (default) | 'box' | 'triangle' | 'epanechnikov' | function handle | character vector | string scalar

Type of kernel smoother, specified as the comma-separated pair consisting of 'Kernel' and one of the following.

• 'normal' — Normal (Gaussian) kernel
• 'box' — Box kernel
• 'triangle' — Triangular kernel
• 'epanechnikov' — Epanechnikov kernel

You can also specify a kernel function that is a custom or built-in function. Specify the function as a function handle (for example, @myfunction or @normpdf) or as a character vector or string scalar (for example, 'myfunction' or 'normpdf'). The software calls the specified function with one argument that is an array of distances between data values and locations where the density is evaluated, normalized by the bandwidth in that dimension. The function must return an array of the same size containing the corresponding values of the kernel function. mvksdensity applies the same kernel to each dimension.

Example: 'Kernel','box'

Support — Support for the density
'unbounded' (default) | 'positive' | 2-by-d matrix

Support for the density, specified as the comma-separated pair consisting of 'Support' and one of the following.

• 'unbounded' — Allow the density to extend over the whole real line.
• 'positive' — Restrict the density to positive values.
• 2-by-d matrix — Specify the finite lower and upper bounds for the support of the density. The first row contains the lower limits and the second row contains the upper limits. Each column contains the limits for one dimension of x.

'Support' can also be a combination of positive, unbounded, and bounded variables specified as [0 -Inf L; Inf Inf U].

Example: 'Support','positive'
Data Types: single | double | char | string

Weights — Weights for sample data
vector

Weights for sample data, specified as the comma-separated pair consisting of 'Weights' and a vector of length size(x,1), where x is the sample data.

Example: 'Weights',xw
Data Types: single | double
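A minimal sketch of the 'Weights' option; the weight vector here is invented for illustration:

```matlab
load hald
n = size(ingredients,1);
w = ones(n,1);                 % start from uniform weights
w(1) = 5;                      % upweight the first observation
w = w/sum(w);                  % normalize for interpretability
f = mvksdensity(ingredients,ingredients(1:3,:), ...
    'Bandwidth',0.8,'Weights',w);
```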

Output Arguments

f — Estimated function values
vector

Estimated function values, returned as a vector. f and pts have the same number of rows.

More About

Multivariate Kernel Distribution

A multivariate kernel distribution is a nonparametric representation of the probability density function (pdf) of a random vector. You can use a kernel distribution when a parametric distribution cannot properly describe the data, or when you want to avoid making assumptions about the distribution of the data.

A multivariate kernel distribution is defined by a smoothing function and a bandwidth matrix, which control the smoothness of the resulting density curve.

The multivariate kernel density estimator is the estimated pdf of a random vector. Let x = (x1, x2, …, xd)' be a d-dimensional random vector with a density function f, and let yi = (yi1, yi2, …, yid)' be a random sample drawn from f for i = 1, 2, …, n, where n is the number of random samples. For any real vector x, the multivariate kernel density estimator is given by

    f̂_H(x) = (1/n) Σ_{i=1}^{n} K_H(x − yi),

where K_H(x) = |H|^(−1/2) K(H^(−1/2) x), K(·) is the kernel smoothing function, and H is the d-by-d bandwidth matrix.

mvksdensity uses a diagonal bandwidth matrix and a product kernel. That is, H^(1/2) is a square diagonal matrix with the elements of the vector (h1, h2, …, hd) on the main diagonal, and K(x) takes the product form K(x) = k(x1)k(x2)⋯k(xd), where k(·) is a one-dimensional kernel smoothing function. Then, the multivariate kernel density estimator becomes

    f̂_H(x) = (1/(n h1 h2 ⋯ hd)) Σ_{i=1}^{n} Π_{j=1}^{d} k((xj − yij)/hj).
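The product-kernel formula above can be checked directly against mvksdensity; this sketch assumes the default normal kernel and reuses the bandwidths from the grid example:

```matlab
load hald
x  = ingredients;                       % n-by-d sample
xi = ingredients(1,:);                  % one evaluation point
[n,d] = size(x);
h  = [4.0579 10.7345 4.4185 11.5466];   % per-dimension bandwidths
% Average over samples of the product over dimensions of
% k((xj - yij)/hj), scaled by prod(h)
z = (xi - x)./h;                        % n-by-d normalized distances
fManual = mean(prod(normpdf(z),2))/prod(h);
f = mvksdensity(x,xi,'Bandwidth',h);    % agrees with fManual
```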

The kernel estimator for the cumulative distribution function (cdf), for any real vector x, is given by

    F̂_H(x) = ∫_{−∞}^{x1} ∫_{−∞}^{x2} ⋯ ∫_{−∞}^{xd} f̂_H(t) dtd ⋯ dt2 dt1
            = (1/n) Σ_{i=1}^{n} Π_{j=1}^{d} G((xj − yij)/hj),

where G(xj) = ∫_{−∞}^{xj} k(tj) dtj.

Reflection Method

The reflection method is a boundary correction method that accurately finds kernel density estimators when a random variable has bounded support. If you specify 'BoundaryCorrection','reflection', mvksdensity uses the reflection method. If you additionally specify 'Support' as a two-row matrix consisting of the lower and upper limits for each dimension, then mvksdensity finds the kernel estimator as follows.

• If 'Function' is 'pdf', then the kernel density estimator is

    f̂_H(x) = (1/(n h1 h2 ⋯ hd)) Σ_{i=1}^{n} Π_{j=1}^{d} [ k((xj − yij)/hj) + k((xj − y⁻ij)/hj) + k((xj − y⁺ij)/hj) ]    for Lj ≤ xj ≤ Uj,

where y⁻ij = 2Lj − yij, y⁺ij = 2Uj − yij, and yij is the jth element of the ith sample data corresponding to x(i,j) of the input argument x. Lj and Uj are the lower and upper limits of the jth dimension, respectively.

• If 'Function' is 'cdf', then the kernel estimator for the cdf is

    F̂_H(x) = (1/n) Σ_{i=1}^{n} Π_{j=1}^{d} [ G((xj − yij)/hj) + G((xj − y⁻ij)/hj) + G((xj − y⁺ij)/hj) − G((Lj − yij)/hj) − G((Lj − y⁻ij)/hj) − G((Lj − y⁺ij)/hj) ]    for Lj ≤ xj ≤ Uj.

• To obtain a kernel estimator for a survivor function (when 'Function' is 'survivor'), mvksdensity uses both f̂_H(x) and F̂_H(x).

If you additionally specify 'Support' as 'positive' or a matrix including [0 inf], then mvksdensity finds the kernel density estimator by replacing [Lj Uj] with [0 inf] in the above equations.
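A minimal sketch of reflection-based boundary correction on positive data; the sample and bandwidth here are invented for illustration:

```matlab
rng default
x = exprnd(1,500,3);                  % positive-valued trivariate sample
pts = [0.5 0.5 0.5; 1 1 1];
% Reflect kernel mass that would otherwise leak below zero
f = mvksdensity(x,pts,'Bandwidth',0.4, ...
    'Support','positive','BoundaryCorrection','reflection');
```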

References

[1] Bowman, A. W., and A. Azzalini. Applied Smoothing Techniques for Data Analysis. New York: Oxford University Press Inc., 1997.

[2] Hill, P. D. “Kernel estimation of a distribution function.” Communications in Statistics – Theory and Methods. Vol. 14, Issue 3, 1985, pp. 605–620.

[3] Jones, M. C. “Simple boundary correction for kernel density estimation.” Statistics and Computing. Vol. 3, Issue 3, 1993, pp. 135–146.

[4] Silverman, B. W. Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC, 1986.

[5] Scott, D. W. Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons, 2015.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

• Names in name-value pair arguments, including 'Bandwidth', must be compile-time constants.
• Values in the following name-value pair arguments must also be compile-time constants: 'BoundaryCorrection', 'Function', and 'Kernel'. For example, to use the 'Function','cdf' name-value pair argument in the generated code, include {coder.Constant('Function'),coder.Constant('cdf')} in the -args value of codegen.
• The value of the 'Kernel' name-value pair argument cannot be a custom function handle. To specify a custom kernel function, use a character vector or string scalar.
• For the value of the 'Support' name-value pair argument, the compile-time data type must match the runtime data type.

For more information on code generation, see “Introduction to Code Generation” on page 31-2 and “General Code Generation Workflow” on page 31-5.


See Also
ksdensity

Topics
“Working with Probability Distributions” on page 5-2
“Nonparametric and Empirical Probability Distributions” on page 5-29
“Supported Distributions” on page 5-13

Introduced in R2016a


mvncdf Multivariate normal cumulative distribution function

Syntax

p = mvncdf(X)
p = mvncdf(X,mu,sigma)
p = mvncdf(xl,xu,mu,sigma)
p = mvncdf( ___ ,options)
[p,err] = mvncdf( ___ )

Description

p = mvncdf(X) returns the cumulative distribution function (cdf) of the multivariate normal distribution with zero mean and identity covariance matrix, evaluated at each row of X. For more information, see “Multivariate Normal Distribution” on page 32-3507.

p = mvncdf(X,mu,sigma) returns the cdf of the multivariate normal distribution with mean mu and covariance sigma, evaluated at each row of X. Specify [] for mu to use its default value of zero when you want to specify only sigma.

p = mvncdf(xl,xu,mu,sigma) returns the multivariate normal cdf evaluated over the multidimensional rectangle with lower and upper limits defined by xl and xu, respectively.

p = mvncdf( ___ ,options) specifies control parameters for the numerical integration used to compute p, using any of the input argument combinations in the previous syntaxes. Create the options argument using the statset function with any combination of the parameters 'TolFun', 'MaxFunEvals', and 'Display'.

[p,err] = mvncdf( ___ ) additionally returns an estimate of the error in p. For more information, see “Algorithms” on page 32-3507.

Examples

Standard Multivariate Normal Distribution cdf

Evaluate the cdf of a standard four-dimensional multivariate normal distribution at points with increasing coordinates in every dimension.

Create a matrix X of five four-dimensional points with increasing coordinates.

firstDim = (-2:2)';
X = repmat(firstDim,1,4)

X = 5×4

    -2    -2    -2    -2
    -1    -1    -1    -1
     0     0     0     0
     1     1     1     1
     2     2     2     2

Evaluate the cdf at the points in X.

p = mvncdf(X)

p = 5×1

    0.0000
    0.0006
    0.0625
    0.5011
    0.9121

The cdf values increase because the coordinates of the points are increasing in every dimension.

Bivariate Normal Distribution cdf

Compute and plot the cdf of a bivariate normal distribution.

Define the mean vector mu and the covariance matrix sigma.

mu = [1 -1];
sigma = [.9 .4; .4 .3];

Create a grid of 625 evenly spaced points in two-dimensional space.

[X1,X2] = meshgrid(linspace(-1,3,25)',linspace(-3,1,25)');
X = [X1(:) X2(:)];

Evaluate the cdf of the normal distribution at the grid points.

p = mvncdf(X,mu,sigma);

Plot the cdf values.

Z = reshape(p,25,25);
surf(X1,X2,Z)


Probability over Rectangular Region

Compute the probability over the unit square of a bivariate normal distribution, and create a contour plot of the results.

Define the bivariate normal distribution parameters mu and sigma.

mu = [0 0];
sigma = [0.25 0.3; 0.3 1];

Compute the probability over the unit square.

p = mvncdf([0 0],[1 1],mu,sigma)

p = 0.2097

To visualize the result, first create a grid of evenly spaced points in two-dimensional space.

x1 = -3:.2:3;
x2 = -3:.2:3;
[X1,X2] = meshgrid(x1,x2);
X = [X1(:) X2(:)];

Then, evaluate the pdf of the normal distribution at the grid points.

y = mvnpdf(X,mu,sigma);
y = reshape(y,length(x2),length(x1));

Finally, create a contour plot of the multivariate normal distribution that includes the unit square.

contour(x1,x2,y,[0.0001 0.001 0.01 0.05 0.15 0.25 0.35])
xlabel('x')
ylabel('y')
line([0 0 1 1 0],[1 0 0 1 1],'Linestyle','--','Color','k')

Computing a multivariate cumulative probability requires significantly more work than computing a univariate probability. By default, the mvncdf function computes values to less than full machine precision, and returns an estimate of the error as an optional second output. View the error estimate in this case.

[p,err] = mvncdf([0 0],[1 1],mu,sigma)

p = 0.2097
err = 1.0000e-08

Error Estimates in cdf Calculation

Evaluate the cdf of a multivariate normal distribution at random points, and display the error estimates associated with the cdf calculation.

Generate four random points from a five-dimensional multivariate normal distribution with mean vector mu and covariance matrix sigma.

mu = [0.5 -0.3 0.2 0.1 -0.4];
sigma = 0.5*eye(5);
rng('default') % For reproducibility
X = mvnrnd(mu,sigma,4);

Find the cdf values p at the points in X and the associated error estimates err. Display a summary of the numerical calculations.

[p,err] = mvncdf(X,mu,sigma,statset('Display','final'))

Successfully satisfied error tolerance of 0.0001 in 8650 function evaluations.
Successfully satisfied error tolerance of 0.0001 in 8650 function evaluations.
Successfully satisfied error tolerance of 0.0001 in 8650 function evaluations.
Successfully satisfied error tolerance of 0.0001 in 8650 function evaluations.

p = 4×1

    0.1520
    0.0407
    0.0002
    0.1970

err = 4×1
10^-16 ×

         0
    0.0496
    0.0012
    0.5949

Input Arguments

X — Evaluation points
numeric matrix

Evaluation points, specified as an n-by-d numeric matrix, where n is a positive scalar integer and d is the dimension of a single multivariate normal distribution. The rows of X correspond to observations (or points), and the columns correspond to variables (or coordinates).

Data Types: single | double

mu — Mean vector of multivariate normal distribution
vector of zeros (default) | numeric vector | numeric scalar

Mean vector of a multivariate normal distribution, specified as a 1-by-d numeric vector or a numeric scalar, where d is the dimension of the multivariate normal distribution. If mu is a scalar, then mvncdf replicates the scalar to match the size of X.

Data Types: single | double

sigma — Covariance matrix of multivariate normal distribution
identity matrix (default) | symmetric, positive definite matrix | numeric vector of diagonal entries

Covariance matrix of a multivariate normal distribution, specified as a d-by-d symmetric, positive definite matrix, where d is the dimension of the multivariate normal distribution. If the covariance matrix is diagonal, containing variances along the diagonal and zero covariances off it, then you can also specify sigma as a 1-by-d vector containing just the diagonal entries.

Data Types: single | double

xl — Rectangle lower limit
numeric vector

Rectangle lower limit, specified as a 1-by-d numeric vector.

Data Types: single | double

xu — Rectangle upper limit
numeric vector

Rectangle upper limit, specified as a 1-by-d numeric vector.

Data Types: single | double

options — Numerical integration options
structure

Numerical integration options, specified as a structure. Create the options argument by calling the statset function with any combination of the following parameters:

• 'TolFun' — Maximum absolute error tolerance. The default value is 1e-8 when d < 4, and 1e-4 when d ≥ 4.
• 'MaxFunEvals' — Maximum number of integrand evaluations allowed when d ≥ 4. The default value is 1e7. The function ignores 'MaxFunEvals' when d < 4.
• 'Display' — Level of display output. The choices are 'off' (the default), 'iter', and 'final'. The function ignores 'Display' when d < 4.

Example: statset('TolFun',1e-7,'Display','final')
Data Types: struct

Output Arguments

p — cdf values
numeric vector | numeric scalar

cdf values, returned as either an n-by-1 numeric vector, where n is the number of rows in X, or a numeric scalar representing the probability over the rectangular region specified by xl and xu.

err — Absolute error tolerance
positive numeric scalar

Absolute error tolerance, returned as a positive numeric scalar. For bivariate and trivariate distributions, the default absolute error tolerance is 1e-8. For four or more dimensions, the default absolute error tolerance is 1e-4. For more information, see “Algorithms” on page 32-3507.


More About

Multivariate Normal Distribution

The multivariate normal distribution is a generalization of the univariate normal distribution to two or more variables. It has two parameters, a mean vector μ and a covariance matrix Σ, that are analogous to the mean and variance parameters of a univariate normal distribution. The diagonal elements of Σ contain the variances for each variable, and the off-diagonal elements of Σ contain the covariances between variables.

The probability density function (pdf) of the d-dimensional multivariate normal distribution is

    y = f(x,μ,Σ) = (1/√(|Σ|(2π)^d)) exp(−(1/2)(x−μ)Σ⁻¹(x−μ)'),

where x and μ are 1-by-d vectors and Σ is a d-by-d symmetric, positive definite matrix. Only mvnrnd allows positive semi-definite Σ matrices, which can be singular. The pdf cannot have the same form when Σ is singular.

The multivariate normal cumulative distribution function (cdf) evaluated at x is the probability that a random vector v, distributed as multivariate normal, lies within the semi-infinite rectangle with upper limits defined by x:

    Pr{v(1) ≤ x(1), v(2) ≤ x(2), ..., v(d) ≤ x(d)}.

Although the multivariate normal cdf does not have a closed form, mvncdf can compute cdf values numerically.
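The pdf formula above can be checked numerically against mvnpdf; the parameter values here are invented for illustration:

```matlab
mu = [0 0];
Sigma = [1 0.5; 0.5 2];
x = [0.3 -0.4];
d = numel(mu);
% (x-mu)/Sigma computes (x-mu)*inv(Sigma) without forming the inverse
yManual = exp(-0.5*((x-mu)/Sigma)*(x-mu)') / sqrt(det(Sigma)*(2*pi)^d);
y = mvnpdf(x,mu,Sigma);               % agrees with yManual
```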

Tips

• In the one-dimensional case, sigma is the variance, not the standard deviation. For example, mvncdf(1,0,4) is the same as normcdf(1,0,2), where 4 is the variance and 2 is the standard deviation.
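A quick numerical check of the tip above:

```matlab
% In one dimension, sigma is the variance: mvncdf(x,mu,v) matches
% normcdf(x,mu,sqrt(v))
p1 = mvncdf(1,0,4);     % variance 4
p2 = normcdf(1,0,2);    % standard deviation 2
% p1 and p2 agree to within the numerical integration tolerance
```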

Algorithms

For bivariate and trivariate distributions, mvncdf uses adaptive quadrature on a transformation of the t density, based on methods developed by Drezner and Wesolowsky [1] [2] and by Genz [3]. For four or more dimensions, mvncdf uses a quasi-Monte Carlo integration algorithm based on methods developed by Genz and Bretz [4] [5].

References

[1] Drezner, Z. “Computation of the Trivariate Normal Integral.” Mathematics of Computation. Vol. 63, 1994, pp. 289–294.

[2] Drezner, Z., and G. O. Wesolowsky. “On the Computation of the Bivariate Normal Integral.” Journal of Statistical Computation and Simulation. Vol. 35, 1989, pp. 101–107.

[3] Genz, A. “Numerical Computation of Rectangular Bivariate and Trivariate Normal and t Probabilities.” Statistics and Computing. Vol. 14, No. 3, 2004, pp. 251–260.

[4] Genz, A., and F. Bretz. “Numerical Computation of Multivariate t Probabilities with Application to Power Calculation of Multiple Contrasts.” Journal of Statistical Computation and Simulation. Vol. 63, 1999, pp. 361–378.

[5] Genz, A., and F. Bretz. “Comparison of Methods for the Computation of Multivariate t Probabilities.” Journal of Computational and Graphical Statistics. Vol. 11, No. 4, 2002, pp. 950–971.

[6] Kotz, S., N. Balakrishnan, and N. L. Johnson. Continuous Multivariate Distributions: Volume 1: Models and Applications. 2nd ed. New York: John Wiley & Sons, Inc., 2000.

Extended Capabilities

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also
mvnpdf | mvnrnd

Topics
“Multivariate Normal Distribution” on page B-98

Introduced in R2006a


mvnpdf Multivariate normal probability density function

Syntax

y = mvnpdf(X)
y = mvnpdf(X,mu)
y = mvnpdf(X,mu,sigma)

Description

y = mvnpdf(X) returns an n-by-1 vector y containing the probability density function (pdf) of the d-dimensional multivariate normal distribution with zero mean and identity covariance matrix, evaluated at each row of the n-by-d matrix X. For more information, see “Multivariate Normal Distribution” on page 32-3515.

y = mvnpdf(X,mu) returns pdf values of points in X, where mu determines the mean of each associated multivariate normal distribution.

y = mvnpdf(X,mu,sigma) returns pdf values of points in X, where sigma determines the covariance of each associated multivariate normal distribution. Specify [] for mu to use its default value of zero when you want to specify only sigma.

Examples

Standard Multivariate Normal pdf

Evaluate the pdf of a standard five-dimensional normal distribution at a set of random points.

Randomly sample eight points from the standard five-dimensional normal distribution.

mu = zeros(1,5);
sigma = eye(5);
rng('default') % For reproducibility
X = mvnrnd(mu,sigma,8)

X = 8×5

    0.5377    3.5784   -0.1241    0.4889   -1.0689
    1.8339    2.7694    1.4897    1.0347   -0.8095
   -2.2588   -1.3499    1.4090    0.7269   -2.9443
    0.8622    3.0349    1.4172   -0.3034    1.4384
    0.3188    0.7254    0.6715    0.2939    0.3252
   -1.3077   -0.0631   -1.2075   -0.7873   -0.7549
   -0.4336    0.7147    0.7172    0.8884    1.3703
    0.3426   -0.2050    1.6302   -1.1471   -1.7115

Evaluate the pdf of the distribution at the points in X. 32-3509

32

Functions

y = mvnpdf(X) y = 8×1 0.0000 0.0000 0.0000 0.0000 0.0054 0.0011 0.0015 0.0003

Find the point in X with the greatest pdf value.

[maxpdf,idx] = max(y)

maxpdf = 0.0054
idx = 5

maxPoint = X(idx,:)

maxPoint = 1×5

    0.3188    0.7254    0.6715    0.2939    0.3252

The fifth point in X has a greater pdf value than any of the other randomly selected points.

Multivariate Normal pdfs Evaluated at Different Points

Create six three-dimensional normal distributions, each with a distinct mean. Evaluate the pdf of each distribution at a different random point.

Specify the means mu and covariances sigma of the distributions. Each distribution has the same covariance matrix, the identity matrix.

firstDim = (1:6)';
mu = repmat(firstDim,1,3)

mu = 6×3

     1     1     1
     2     2     2
     3     3     3
     4     4     4
     5     5     5
     6     6     6

sigma = eye(3)

sigma = 3×3

     1     0     0
     0     1     0
     0     0     1

Randomly sample once from each of the six distributions.

rng('default')  % For reproducibility
X = mvnrnd(mu,sigma)

X = 6×3

    1.5377    0.5664    1.7254
    3.8339    2.3426    1.9369
    0.7412    6.5784    3.7147
    4.8622    6.7694    3.7950
    5.3188    3.6501    4.8759
    4.6923    9.0349    7.4897

Evaluate the pdfs of the distributions at the points in X. The pdf of the first distribution is evaluated at the point X(1,:), the pdf of the second distribution is evaluated at the point X(2,:), and so on.

y = mvnpdf(X,mu)

y = 6×1

    0.0384
    0.0111
    0.0000
    0.0009
    0.0241
    0.0001

Multivariate Normal pdf

Evaluate the pdf of a two-dimensional normal distribution at a set of given points.

Specify the mean mu and covariance sigma of the distribution.

mu = [1 -1];
sigma = [0.9 0.4; 0.4 0.3];

Randomly sample from the distribution 100 times. Specify X as the matrix of sampled points.

rng('default')  % For reproducibility
X = mvnrnd(mu,sigma,100);

Evaluate the pdf of the distribution at the points in X.

y = mvnpdf(X,mu,sigma);

Plot the probability density values.

scatter3(X(:,1),X(:,2),y)
xlabel('X1')
ylabel('X2')
zlabel('Probability Density')

Multivariate Normal pdfs Evaluated at Same Point

Create ten different five-dimensional normal distributions, and compare the values of their pdfs at a specified point.

Set the dimensions n and d equal to 10 and 5, respectively.

n = 10;
d = 5;

Specify the means mu and the covariances sigma of the multivariate normal distributions. Let all the distributions have the same mean vector, but vary the covariance matrices.

mu = ones(1,d)

mu = 1×5

     1     1     1     1     1

mat = eye(d);
nMat = repmat(mat,1,1,n);
var = reshape(1:n,1,1,n);
sigma = nMat.*var;

Display the first two covariance matrices in sigma.

sigma(:,:,1:2)

ans =
ans(:,:,1) =

     1     0     0     0     0
     0     1     0     0     0
     0     0     1     0     0
     0     0     0     1     0
     0     0     0     0     1

ans(:,:,2) =

     2     0     0     0     0
     0     2     0     0     0
     0     0     2     0     0
     0     0     0     2     0
     0     0     0     0     2

Set x to be a random point in five-dimensional space.

rng('default')  % For reproducibility
x = normrnd(0,1,1,5)

x = 1×5

    0.5377    1.8339   -2.2588    0.8622    0.3188

Evaluate the pdf at x for each of the ten distributions.

y = mvnpdf(x,mu,sigma)

y = 10×1
10^-4 ×

    0.2490
    0.8867
    0.8755
    0.7035
    0.5438
    0.4211
    0.3305
    0.2635
    0.2134
    0.1753

Plot the results.

scatter(1:n,y,'filled')
xlabel('Distribution Index')
ylabel('Probability Density at x')


Input Arguments

X — Evaluation points
numeric vector | numeric matrix

Evaluation points, specified as a 1-by-d numeric vector or an n-by-d numeric matrix, where n is a positive scalar integer and d is the dimension of a single multivariate normal distribution. The rows of X correspond to observations (or points), and the columns correspond to variables (or coordinates).

If X is a vector, then mvnpdf replicates it to match the leading dimension of mu or the trailing dimension of sigma.

Data Types: single | double

mu — Means of multivariate normal distributions
vector of zeros (default) | numeric vector | numeric matrix

Means of multivariate normal distributions, specified as a 1-by-d numeric vector or an n-by-d numeric matrix.

• If mu is a vector, then mvnpdf replicates the vector to match the trailing dimension of sigma.
• If mu is a matrix, then each row of mu is the mean vector of a single multivariate normal distribution.

Data Types: single | double
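The replication rule for a vector X can be illustrated with a minimal sketch (the point and means below are arbitrary values chosen for illustration):

```matlab
% Sketch (arbitrary values): a single 1-by-d point X is replicated to
% match the n rows of mu, giving one pdf value per distribution.
x = [0 0];                % one 2-D evaluation point
mu = [0 0; 1 1; 2 2];     % three different mean vectors (n = 3)
y = mvnpdf(x,mu)          % 3-by-1 vector: pdf of x under each distribution
```

Because x is a vector and mu has three rows, y contains three values, one per distribution, without any manual replication of x.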


sigma — Covariances of multivariate normal distributions
identity matrix (default) | symmetric, positive definite matrix | numeric array

Covariances of multivariate normal distributions, specified as a d-by-d symmetric, positive definite matrix or a d-by-d-by-n numeric array.

• If sigma is a matrix, then mvnpdf replicates the matrix to match the number of rows in mu.
• If sigma is an array, then each page of sigma, sigma(:,:,i), is the covariance matrix of a single multivariate normal distribution and, therefore, is a symmetric, positive definite matrix.

If the covariance matrices are diagonal, containing variances along the diagonal and zero covariances off it, then you can also specify sigma as a 1-by-d vector or a 1-by-d-by-n array containing just the diagonal entries.

Data Types: single | double
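The diagonal shorthand can be sketched as follows; the point and variances are arbitrary values, and the two calls specify the same diagonal covariance:

```matlab
% Sketch (arbitrary values): for diagonal covariances, passing just the
% variances as a 1-by-d vector is equivalent to passing diag(v).
x = [0.5 -0.3 1.2];
v = [1 2 3];                           % diagonal variances only
y1 = mvnpdf(x,zeros(1,3),v);           % 1-by-d vector shorthand
y2 = mvnpdf(x,zeros(1,3),diag(v));     % full d-by-d diagonal matrix
% y1 and y2 are equal
```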

Output Arguments

y — pdf values
numeric vector

pdf values, returned as an n-by-1 numeric vector, where n is one of the following:

• Number of rows in X if X is a matrix
• Number of times X is replicated if X is a vector

If X is a matrix, mu is a matrix, and sigma is an array, then mvnpdf computes y(i) using X(i,:), mu(i,:), and sigma(:,:,i).

Data Types: double

More About

Multivariate Normal Distribution

The multivariate normal distribution is a generalization of the univariate normal distribution to two or more variables. It has two parameters, a mean vector μ and a covariance matrix Σ, that are analogous to the mean and variance parameters of a univariate normal distribution. The diagonal elements of Σ contain the variances for each variable, and the off-diagonal elements of Σ contain the covariances between variables.

The probability density function (pdf) of the d-dimensional multivariate normal distribution is

    y = f(x,μ,Σ) = (1/√(|Σ|(2π)^d)) exp(−(1/2)(x−μ)Σ⁻¹(x−μ)′)

where x and μ are 1-by-d vectors and Σ is a d-by-d symmetric, positive definite matrix. Only mvnrnd allows positive semi-definite Σ matrices, which can be singular. The pdf cannot have the same form when Σ is singular.

The multivariate normal cumulative distribution function (cdf) evaluated at x is the probability that a random vector v, distributed as multivariate normal, lies within the semi-infinite rectangle with upper limits defined by x:

    Pr{v(1) ≤ x(1), v(2) ≤ x(2), ..., v(d) ≤ x(d)}.

Although the multivariate normal cdf does not have a closed form, mvncdf can compute cdf values numerically.
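The pdf formula can be checked directly against mvnpdf. The following is a minimal sketch with an arbitrary point and covariance matrix:

```matlab
% Sketch (arbitrary values): evaluate the multivariate normal pdf formula
% by hand and compare with mvnpdf.
d = 3;
x = [0.2 -0.5 1.0];
mu = zeros(1,d);
Sigma = [2 0.5 0; 0.5 1 0; 0 0 1.5];      % symmetric, positive definite
q = (x-mu)/Sigma*(x-mu)';                 % quadratic form (x-mu)*inv(Sigma)*(x-mu)'
yManual = exp(-q/2)/sqrt(det(Sigma)*(2*pi)^d);
yBuiltin = mvnpdf(x,mu,Sigma);            % agrees with yManual to rounding
```

Using the matrix right-division `(x-mu)/Sigma` avoids forming `inv(Sigma)` explicitly, which is both faster and numerically safer.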

Tips

• In the one-dimensional case, sigma is the variance, not the standard deviation. For example, mvnpdf(1,0,4) is the same as normpdf(1,0,2), where 4 is the variance and 2 is the standard deviation.
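The tip can be verified in two lines:

```matlab
% Sketch: in 1-D, mvnpdf expects the variance, normpdf the standard deviation.
y1 = mvnpdf(1,0,4)     % variance 4
y2 = normpdf(1,0,2)    % standard deviation 2 = sqrt(4); y1 equals y2
```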

References

[1] Kotz, S., N. Balakrishnan, and N. L. Johnson. Continuous Multivariate Distributions: Volume 1: Models and Applications. 2nd ed. New York: John Wiley & Sons, Inc., 2000.

Extended Capabilities

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™. This function fully supports GPU arrays. For more information, see “Run MATLAB Functions on a GPU” (Parallel Computing Toolbox).

See Also
mvncdf | mvnrnd | normpdf

Topics
“Multivariate Normal Distribution” on page B-98

Introduced before R2006a


mvregress
Multivariate linear regression

Syntax

beta = mvregress(X,Y)
beta = mvregress(X,Y,Name,Value)
[beta,Sigma] = mvregress( ___ )
[beta,Sigma,E,CovB,logL] = mvregress( ___ )

Description

beta = mvregress(X,Y) returns the estimated coefficients for a multivariate normal regression on page 32-3528 of the d-dimensional responses in Y on the design matrices in X.

beta = mvregress(X,Y,Name,Value) returns the estimated coefficients using additional options specified by one or more name-value pair arguments. For example, you can specify the estimation algorithm, initial estimate values, or maximum number of iterations for the regression.

[beta,Sigma] = mvregress( ___ ) also returns the estimated d-by-d variance-covariance matrix of Y, using any of the input arguments from the previous syntaxes.

[beta,Sigma,E,CovB,logL] = mvregress( ___ ) also returns a matrix of residuals E, the estimated variance-covariance matrix of the regression coefficients CovB, and the value of the log-likelihood objective function after the last iteration, logL.

Examples

Multivariate Regression Model for Panel Data with Different Intercepts

Fit a multivariate regression model to panel data, assuming different intercepts and common slopes.

Load the sample data.

load('flu')

The dataset array flu contains national CDC flu estimates, and nine separate regional estimates based on Google® query data.

Extract the response and predictor data.

Y = double(flu(:,2:end-1));
[n,d] = size(Y);
x = flu.WtdILI;

The responses in Y are the nine regional flu estimates. Observations exist for every week over a one-year period, so n = 52. The dimension of the responses corresponds to the regions, so d = 9. The predictors in x are the weekly national flu estimates.

Plot the flu data, grouped by region.


figure;
regions = flu.Properties.VarNames(2:end-1);
plot(x,Y,'x')
legend(regions,'Location','NorthWest')

Fit the multivariate regression model

    y_ij = α_j + β·x_ij + ε_ij,

where i = 1, …, n and j = 1, …, d, with between-region concurrent correlation Cov(ε_ij, ε_ij′) = σ_jj′.

There are K = 10 regression coefficients to estimate: nine intercept terms and a common slope. The input argument X should be an n-element cell array of d-by-K design matrices.

X = cell(n,1);
for i = 1:n
    X{i} = [eye(d) repmat(x(i),d,1)];
end
[beta,Sigma] = mvregress(X,Y);

beta contains estimates of the K-dimensional coefficient vector (α_1, α_2, …, α_9, β)′. Sigma contains estimates of the d-by-d variance-covariance matrix (σ_jj′), j, j′ = 1, …, d, for the between-region concurrent correlations.

Plot the fitted regression model.

B = [beta(1:d)'; repmat(beta(end),1,d)];
xx = linspace(.5,3.5)';
fits = [ones(size(xx)),xx]*B;


figure;
h = plot(x,Y,'x',xx,fits,'-');
for i = 1:d
    set(h(d+i),'color',get(h(i),'color'));
end
legend(regions,'Location','NorthWest');

The plot shows that each regression line has a different intercept but the same slope. Upon visual inspection, some regression lines appear to fit the data better than others.

Multivariate Regression for Panel Data with Different Slopes

Fit a multivariate regression model to panel data using least squares, assuming different intercepts and slopes.

Load the sample data.

load('flu');

The dataset array flu contains national CDC flu estimates, and nine separate regional estimates based on Google® queries.

Extract the response and predictor data.
