Statistical Pattern Recognition

Statistical Pattern Recognition

Paperback

By (author) Andrew R. Webb, By (author) Keith Derek Copsey, By (author) Gavin Cawley

$59.19
List price $70.29
You save $11.10 15% off

Free delivery worldwide
Available
Dispatched in 3 business days
When will my order arrive?

Additional formats available

Format
Hardback $99.99
  • Publisher: Wiley-Blackwell (an imprint of John Wiley & Sons Ltd)
  • Format: Paperback | 666 pages
  • Dimensions: 166mm x 242mm x 38mm | 1,120g
  • Publication date: 20 December 2011
  • Publication City/Country: Chicester
  • ISBN 10: 0470682280
  • ISBN 13: 9780470682289
  • Edition: 3, Revised
  • Edition statement: 3rd Revised edition
  • Illustrations note: black & white illustrations, black & white tables, figures
  • Sales rank: 915,566

Product description

Statistical pattern recognition relates to the use of statistical techniques for analysing data measurements in order to extract information and make justified decisions. It is a very active area of study and research, which has seen many advances in recent years. Applications such as data mining, web searching, multimedia data retrieval, face recognition, and cursive handwriting recognition, all require robust and efficient pattern recognition techniques. This third edition provides an introduction to statistical pattern theory and techniques, with material drawn from a wide range of fields, including the areas of engineering, statistics, computer science and the social sciences. The book has been updated to cover new methods and applications, and includes a wide range of techniques such as Bayesian methods, neural networks, support vector machines, feature selection and feature reduction techniques.Technical descriptions and motivations are provided, and the techniques are illustrated using real examples. Statistical Pattern Recognition, 3rd Edition:* Provides a self-contained introduction to statistical pattern recognition.* Includes new material presenting the analysis of complex networks.* Introduces readers to methods for Bayesian density estimation.* Presents descriptions of new applications in biometrics, security, finance and condition monitoring.* Provides descriptions and guidance for implementing techniques, which will be invaluable to software engineers and developers seeking to develop real applications* Describes mathematically the range of statistical pattern recognition techniques.* Presents a variety of exercises including more extensive computer projects. The in-depth technical descriptions make the book suitable for senior undergraduate and graduate students in statistics, computer science and engineering. Statistical Pattern Recognition is also an excellent reference source for technical professionals. Chapters have been arranged to facilitate implementation of the techniques by software engineers and developers in non-statistical engineering fields. www.wiley.com/go/statistical-pattern-recognition

Other people who viewed this bought:

Showing items 1 to 10 of 10

Other books in this category

Showing items 1 to 11 of 11
Categories:

Review quote

"In the end I must add that this book is so appealing that I often found myself lost in the reading, pausing the overview of the manuscript in order to look more into some presented subject, and not being able to continue until I had finished seeing all about it." ( Zentralblatt MATH , 1 December 2012)

Back cover copy

Statistical pattern recognition relates to the use of statistical techniques for analysing data measurements in order to extract information and make justified decisions. It is a very active area of study and research, which has seen many advances in recent years. Applications such as data mining, web searching, multimedia data retrieval, face recognition, and cursive handwriting recognition, all require robust and efficient pattern recognition techniques.This third edition provides an introduction to statistical pattern theory and techniques, with material drawn from a wide range of fields, including the areas of engineering, statistics, computer science and the social sciences. The book has been updated to cover new methods and applications, and includes a wide range of techniques such as Bayesian methods, neural networks, support vector machines, feature selection and feature reduction techniques.Technical descriptions and motivations are provided, and the techniques are illustrated using real examples."Statistical Pattern Recognition," 3rd Edition: Provides a self-contained introduction to statistical pattern recognition.Includes new material presenting the analysis of complex networks.Introduces readers to methods for Bayesian density estimation.Presents descriptions of new applications in biometrics, security, finance and condition monitoring.Provides descriptions and guidance for implementing techniques, which will be invaluable to software engineers and developers seeking to develop real applicationsDescribes mathematically the range of statistical pattern recognition techniques.Presents a variety of exercises including more extensive computer projects.The in-depth technical descriptions make the book suitable for senior undergraduate and graduate students in statistics, computer science and engineering. "Statistical Pattern Recognition" is also an excellent reference source for technical professionals. Chapters have been arranged to facilitate implementation of the techniques by software engineers and developers in non-statistical engineering fields.www.wiley.com/go/statistical_pattern_recognition

Table of contents

Preface xix Notation xxiii 1 Introduction to Statistical Pattern Recognition 1 1.1 Statistical Pattern Recognition 1 1.1.1 Introduction 1 1.1.2 The Basic Model 2 1.2 Stages in a Pattern Recognition Problem 4 1.3 Issues 6 1.4 Approaches to Statistical Pattern Recognition 7 1.5 Elementary Decision Theory 8 1.5.1 Bayes' Decision Rule for Minimum Error 8 1.5.2 Bayes' Decision Rule for Minimum Error - Reject Option 12 1.5.3 Bayes' Decision Rule for Minimum Risk 13 1.5.4 Bayes' Decision Rule for Minimum Risk - Reject Option 15 1.5.5 Neyman-Pearson Decision Rule 15 1.5.6 Minimax Criterion 18 1.5.7 Discussion 19 1.6 Discriminant Functions 20 1.6.1 Introduction 20 1.6.2 Linear Discriminant Functions 21 1.6.3 Piecewise Linear Discriminant Functions 23 1.6.4 Generalised Linear Discriminant Function 24 1.6.5 Summary 26 1.7 Multiple Regression 27 1.8 Outline of Book 29 1.9 Notes and References 29 Exercises 31 2 Density Estimation - Parametric 33 2.1 Introduction 33 2.2 Estimating the Parameters of the Distributions 34 2.2.1 Estimative Approach 34 2.2.2 Predictive Approach 35 2.3 The Gaussian Classifier 35 2.3.1 Specification 35 2.3.2 Derivation of the Gaussian Classifier Plug-In Estimates 37 2.3.3 Example Application Study 39 2.4 Dealing with Singularities in the Gaussian Classifier 40 2.4.1 Introduction 40 2.4.2 Naive Bayes 40 2.4.3 Projection onto a Subspace 41 2.4.4 Linear Discriminant Function 41 2.4.5 Regularised Discriminant Analysis 42 2.4.6 Example Application Study 44 2.4.7 Further Developments 45 2.4.8 Summary 46 2.5 Finite Mixture Models 46 2.5.1 Introduction 46 2.5.2 Mixture Models for Discrimination 48 2.5.3 Parameter Estimation for Normal Mixture Models 49 2.5.4 Normal Mixture Model Covariance Matrix Constraints 51 2.5.5 How Many Components? 52 2.5.6 Maximum Likelihood Estimation via EM 55 2.5.7 Example Application Study 60 2.5.8 Further Developments 62 2.5.9 Summary 63 2.6 Application Studies 63 2.7 Summary and Discussion 66 2.8 Recommendations 66 2.9 Notes and References 67 Exercises 67 3 Density Estimation - Bayesian 70 3.1 Introduction 70 3.1.1 Basics 72 3.1.2 Recursive Calculation 72 3.1.3 Proportionality 73 3.2 Analytic Solutions 73 3.2.1 Conjugate Priors 73 3.2.2 Estimating the Mean of a Normal Distribution with Known Variance 75 3.2.3 Estimating the Mean and the Covariance Matrix of a Multivariate Normal Distribution 79 3.2.4 Unknown Prior Class Probabilities 85 3.2.5 Summary 87 3.3 Bayesian Sampling Schemes 87 3.3.1 Introduction 87 3.3.2 Summarisation 87 3.3.3 Sampling Version of the Bayesian Classifier 89 3.3.4 Rejection Sampling 89 3.3.5 Ratio of Uniforms 90 3.3.6 Importance Sampling 92 3.4 Markov Chain Monte Carlo Methods 95 3.4.1 Introduction 95 3.4.2 The Gibbs Sampler 95 3.4.3 Metropolis-Hastings Algorithm 103 3.4.4 Data Augmentation 107 3.4.5 Reversible Jump Markov Chain Monte Carlo 108 3.4.6 Slice Sampling 109 3.4.7 MCMC Example - Estimation of Noisy Sinusoids 111 3.4.8 Summary 115 3.4.9 Notes and References 116 3.5 Bayesian Approaches to Discrimination 116 3.5.1 Labelled Training Data 116 3.5.2 Unlabelled Training Data 117 3.6 Sequential Monte Carlo Samplers 119 3.6.1 Introduction 119 3.6.2 Basic Methodology 121 3.6.3 Summary 125 3.7 Variational Bayes 126 3.7.1 Introduction 126 3.7.2 Description 126 3.7.3 Factorised Variational Approximation 129 3.7.4 Simple Example 131 3.7.5 Use of the Procedure for Model Selection 135 3.7.6 Further Developments and Applications 136 3.7.7 Summary 137 3.8 Approximate Bayesian Computation 137 3.8.1 Introduction 137 3.8.2 ABC Rejection Sampling 138 3.8.3 ABC MCMC Sampling 140 3.8.4 ABC Population Monte Carlo Sampling 141 3.8.5 Model Selection 142 3.8.6 Summary 143 3.9 Example Application Study 144 3.10 Application Studies 145 3.11 Summary and Discussion 146 3.12 Recommendations 147 3.13 Notes and References 147 Exercises 148 4 Density Estimation - Nonparametric 150 4.1 Introduction 150 4.1.1 Basic Properties of Density Estimators 150 4.2 k -Nearest-Neighbour Method 152 4.2.1 k -Nearest-Neighbour Classifier 152 4.2.2 Derivation 154 4.2.3 Choice of Distance Metric 157 4.2.4 Properties of the Nearest-Neighbour Rule 159 4.2.5 Linear Approximating and Eliminating Search Algorithm 159 4.2.6 Branch and Bound Search Algorithms: kd-Trees 163 4.2.7 Branch and Bound Search Algorithms: Ball-Trees 170 4.2.8 Editing Techniques 174 4.2.9 Example Application Study 177 4.2.10 Further Developments 178 4.2.11 Summary 179 4.3 Histogram Method 180 4.3.1 Data Adaptive Histograms 181 4.3.2 Independence Assumption (Naive Bayes) 181 4.3.3 Lancaster Models 182 4.3.4 Maximum Weight Dependence Trees 183 4.3.5 Bayesian Networks 186 4.3.6 Example Application Study - Naive Bayes Text Classification 190 4.3.7 Summary 193 4.4 Kernel Methods 194 4.4.1 Biasedness 197 4.4.2 Multivariate Extension 198 4.4.3 Choice of Smoothing Parameter 199 4.4.4 Choice of Kernel 201 4.4.5 Example Application Study 202 4.4.6 Further Developments 203 4.4.7 Summary 203 4.5 Expansion by Basis Functions 204 4.6 Copulas 207 4.6.1 Introduction 207 4.6.2 Mathematical Basis 207 4.6.3 Copula Functions 208 4.6.4 Estimating Copula Probability Density Functions 209 4.6.5 Simple Example 211 4.6.6 Summary 212 4.7 Application Studies 213 4.7.1 Comparative Studies 216 4.8 Summary and Discussion 216 4.9 Recommendations 217 4.10 Notes and References 217 Exercises 218 5 Linear Discriminant Analysis 221 5.1 Introduction 221 5.2 Two-Class Algorithms 222 5.2.1 General Ideas 222 5.2.2 Perceptron Criterion 223 5.2.3 Fisher's Criterion 227 5.2.4 Least Mean-Squared-Error Procedures 228 5.2.5 Further Developments 235 5.2.6 Summary 235 5.3 Multiclass Algorithms 236 5.3.1 General Ideas 236 5.3.2 Error-Correction Procedure 237 5.3.3 Fisher's Criterion - Linear Discriminant Analysis 238 5.3.4 Least Mean-Squared-Error Procedures 241 5.3.5 Regularisation 246 5.3.6 Example Application Study 246 5.3.7 Further Developments 247 5.3.8 Summary 248 5.4 Support Vector Machines 249 5.4.1 Introduction 249 5.4.2 Linearly Separable Two-Class Data 249 5.4.3 Linearly Nonseparable Two-Class Data 253 5.4.4 Multiclass SVMs 256 5.4.5 SVMs for Regression 257 5.4.6 Implementation 259 5.4.7 Example Application Study 262 5.4.8 Summary 263 5.5 Logistic Discrimination 263 5.5.1 Two-Class Case 263 5.5.2 Maximum Likelihood Estimation 264 5.5.3 Multiclass Logistic Discrimination 266 5.5.4 Example Application Study 267 5.5.5 Further Developments 267 5.5.6 Summary 268 5.6 Application Studies 268 5.7 Summary and Discussion 268 5.8 Recommendations 269 5.9 Notes and References 270 Exercises 270 6 Nonlinear Discriminant Analysis - Kernel and Projection Methods 274 6.1 Introduction 274 6.2 Radial Basis Functions 276 6.2.1 Introduction 276 6.2.2 Specifying the Model 278 6.2.3 Specifying the Functional Form 278 6.2.4 The Positions of the Centres 279 6.2.5 Smoothing Parameters 281 6.2.6 Calculation of the Weights 282 6.2.7 Model Order Selection 284 6.2.8 Simple RBF 285 6.2.9 Motivation 286 6.2.10 RBF Properties 288 6.2.11 Example Application Study 288 6.2.12 Further Developments 289 6.2.13 Summary 290 6.3 Nonlinear Support Vector Machines 291 6.3.1 Introduction 291 6.3.2 Binary Classification 291 6.3.3 Types of Kernel 292 6.3.4 Model Selection 293 6.3.5 Multiclass SVMs 294 6.3.6 Probability Estimates 294 6.3.7 Nonlinear Regression 296 6.3.8 Example Application Study 296 6.3.9 Further Developments 297 6.3.10 Summary 298 6.4 The Multilayer Perceptron 298 6.4.1 Introduction 298 6.4.2 Specifying the MLP Structure 299 6.4.3 Determining the MLP Weights 300 6.4.4 Modelling Capacity of the MLP 307 6.4.5 Logistic Classification 307 6.4.6 Example Application Study 310 6.4.7 Bayesian MLP Networks 311 6.4.8 Projection Pursuit 313 6.4.9 Summary 313 6.5 Application Studies 314 6.6 Summary and Discussion 316 6.7 Recommendations 317 6.8 Notes and References 318 Exercises 318 7 Rule and Decision Tree Induction 322 7.1 Introduction 322 7.2 Decision Trees 323 7.2.1 Introduction 323 7.2.2 Decision Tree Construction 326 7.2.3 Selection of the Splitting Rule 327 7.2.4 Terminating the Splitting Procedure 330 7.2.5 Assigning Class Labels to Terminal Nodes 332 7.2.6 Decision Tree Pruning - Worked Example 332 7.2.7 Decision Tree Construction Methods 337 7.2.8 Other Issues 339 7.2.9 Example Application Study 340 7.2.10 Further Developments 341 7.2.11 Summary 342 7.3 Rule Induction 342 7.3.1 Introduction 342 7.3.2 Generating Rules from a Decision Tree 345 7.3.3 Rule Induction Using a Sequential Covering Algorithm 345 7.3.4 Example Application Study 350 7.3.5 Further Developments 351 7.3.6 Summary 351 7.4 Multivariate Adaptive Regression Splines 351 7.4.1 Introduction 351 7.4.2 Recursive Partitioning Model 351 7.4.3 Example Application Study 355 7.4.4 Further Developments 355 7.4.5 Summary 356 7.5 Application Studies 356 7.6 Summary and Discussion 358 7.7 Recommendations 358 7.8 Notes and References 359 Exercises 359 8 Ensemble Methods 361 8.1 Introduction 361 8.2 Characterising a Classifier Combination Scheme 362 8.2.1 Feature Space 363 8.2.2 Level 366 8.2.3 Degree of Training 368 8.2.4 Form of Component Classifiers 368 8.2.5 Structure 369 8.2.6 Optimisation 369 8.3 Data Fusion 370 8.3.1 Architectures 370 8.3.2 Bayesian Approaches 371 8.3.3 Neyman-Pearson Formulation 373 8.3.4 Trainable Rules 374 8.3.5 Fixed Rules 375 8.4 Classifier Combination Methods 376 8.4.1 Product Rule 376 8.4.2 Sum Rule 377 8.4.3 Min, Max and Median Combiners 378 8.4.4 Majority Vote 379 8.4.5 Borda Count 379 8.4.6 Combiners Trained on Class Predictions 380 8.4.7 Stacked Generalisation 382 8.4.8 Mixture of Experts 382 8.4.9 Bagging 385 8.4.10 Boosting 387 8.4.11 Random Forests 389 8.4.12 Model Averaging 390 8.4.13 Summary of Methods 396 8.4.14 Example Application Study 398 8.4.15 Further Developments 399 8.5 Application Studies 399 8.6 Summary and Discussion 400 8.7 Recommendations 401 8.8 Notes and References 401 Exercises 402 9 Performance Assessment 404 9.1 Introduction 404 9.2 Performance Assessment 405 9.2.1 Performance Measures 405 9.2.2 Discriminability 406 9.2.3 Reliability 413 9.2.4 ROC Curves for Performance Assessment 415 9.2.5 Population and Sensor Drift 419 9.2.6 Example Application Study 421 9.2.7 Further Developments 422 9.2.8 Summary 423 9.3 Comparing Classifier Performance 424 9.3.1 Which Technique is Best? 424 9.3.2 Statistical Tests 425 9.3.3 Comparing Rules When Misclassification Costs are Uncertain 426 9.3.4 Example Application Study 428 9.3.5 Further Developments 429 9.3.6 Summary 429 9.4 Application Studies 429 9.5 Summary and Discussion 430 9.6 Recommendations 430 9.7 Notes and References 430 Exercises 431 10 Feature Selection and Extraction 433 10.1 Introduction 433 10.2 Feature Selection 435 10.2.1 Introduction 435 10.2.2 Characterisation of Feature Selection Approaches 439 10.2.3 Evaluation Measures 440 10.2.4 Search Algorithms for Feature Subset Selection 449 10.2.5 Complete Search - Branch and Bound 450 10.2.6 Sequential Search 454 10.2.7 Random Search 458 10.2.8 Markov Blanket 459 10.2.9 Stability of Feature Selection 460 10.2.10 Example Application Study 462 10.2.11 Further Developments 462 10.2.12 Summary 463 10.3 Linear Feature Extraction 463 10.3.1 Principal Components Analysis 464 10.3.2 Karhunen-Lo'eve Transformation 475 10.3.3 Example Application Study 481 10.3.4 Further Developments 482 10.3.5 Summary 483 10.4 Multidimensional Scaling 484 10.4.1 Classical Scaling 484 10.4.2 Metric MDS 486 10.4.3 Ordinal Scaling 487 10.4.4 Algorithms 490 10.4.5 MDS for Feature Extraction 491 10.4.6 Example Application Study 492 10.4.7 Further Developments 493 10.4.8 Summary 493 10.5 Application Studies 493 10.6 Summary and Discussion 495 10.7 Recommendations 495 10.8 Notes and References 496 Exercises 497 11 Clustering 501 11.1 Introduction 501 11.2 Hierarchical Methods 502 11.2.1 Single-Link Method 503 11.2.2 Complete-Link Method 506 11.2.3 Sum-of-Squares Method 507 11.2.4 General Agglomerative Algorithm 508 11.2.5 Properties of a Hierarchical Classification 508 11.2.6 Example Application Study 509 11.2.7 Summary 509 11.3 Quick Partitions 510 11.4 Mixture Models 511 11.4.1 Model Description 511 11.4.2 Example Application Study 512 11.5 Sum-of-Squares Methods 513 11.5.1 Clustering Criteria 514 11.5.2 Clustering Algorithms 515 11.5.3 Vector Quantisation 520 11.5.4 Example Application Study 530 11.5.5 Further Developments 530 11.5.6 Summary 531 11.6 Spectral Clustering 531 11.6.1 Elementary Graph Theory 531 11.6.2 Similarity Matrices 534 11.6.3 Application to Clustering 534 11.6.4 Spectral Clustering Algorithm 535 11.6.5 Forms of Graph Laplacian 535 11.6.6 Example Application Study 536 11.6.7 Further Developments 538 11.6.8 Summary 538 11.7 Cluster Validity 538 11.7.1 Introduction 538 11.7.2 Statistical Tests 539 11.7.3 Absence of Class Structure 540 11.7.4 Validity of Individual Clusters 541 11.7.5 Hierarchical Clustering 542 11.7.6 Validation of Individual Clusterings 542 11.7.7 Partitions 543 11.7.8 Relative Criteria 543 11.7.9 Choosing the Number of Clusters 545 11.8 Application Studies 546 11.9 Summary and Discussion 549 11.10 Recommendations 551 11.11 Notes and References 552 Exercises 553 12 Complex Networks 555 12.1 Introduction 555 12.1.1 Characteristics 557 12.1.2 Properties 557 12.1.3 Questions to Address 559 12.1.4 Descriptive Features 560 12.1.5 Outline 560 12.2 Mathematics of Networks 561 12.2.1 Graph Matrices 561 12.2.2 Connectivity 562 12.2.3 Distance Measures 562 12.2.4 Weighted Networks 563 12.2.5 Centrality Measures 563 12.2.6 Random Graphs 564 12.3 Community Detection 565 12.3.1 Clustering Methods 565 12.3.2 Girvan-Newman Algorithm 568 12.3.3 Modularity Approaches 570 12.3.4 Local Modularity 571 12.3.5 Clique Percolation 573 12.3.6 Example Application Study 574 12.3.7 Further Developments 575 12.3.8 Summary 575 12.4 Link Prediction 575 12.4.1 Approaches to Link Prediction 576 12.4.2 Example Application Study 578 12.4.3 Further Developments 578 12.5 Application Studies 579 12.6 Summary and Discussion 579 12.7 Recommendations 580 12.8 Notes and References 580 Exercises 580 13 Additional Topics 581 13.1 Model Selection 581 13.1.1 Separate Training and Test Sets 582 13.1.2 Cross-Validation 582 13.1.3 The Bayesian Viewpoint 583 13.1.4 Akaike's Information Criterion 583 13.1.5 Minimum Description Length 584 13.2 Missing Data 585 13.3 Outlier Detection and Robust Procedures 586 13.4 Mixed Continuous and Discrete Variables 587 13.5 Structural Risk Minimisation and the Vapnik-Chervonenkis Dimension 588 13.5.1 Bounds on the Expected Risk 588 13.5.2 The VC Dimension 589 References 591 Index 637