Data Mining Using SAS Applications

By (author) 

List price: US$99.95

Description

Most books on data mining focus on principles and furnish few instructions on how to carry out a data mining project. "Data Mining Using SAS Applications" not only introduces the key concepts but also enables readers to understand and successfully apply data mining methods using powerful yet user-friendly SAS macro-call files. These methods stress the use of visualization to thoroughly study the structure of data and check the validity of statistical models fitted to data.

Learn how to convert PC databases to SAS data. Discover sampling techniques to create training and validation samples. Understand frequency data analysis for categorical data. Explore supervised and unsupervised learning. Master exploratory graphical techniques. Acquire model validation techniques in regression and classification.

The text furnishes 13 easy-to-use SAS data mining macros designed to work with the standard SAS modules. No additional modules or previous experience in SAS programming is required. The author shows how to perform complete predictive modeling, including data exploration, model fitting, assumption checks, validation, and scoring new data, on SAS datasets in less than ten minutes!

Product details

  • Hardback | 367 pages
  • 162 x 234 x 26mm | 698.54g
  • Taylor & Francis Ltd
  • Chapman & Hall/CRC
  • United States
  • English
  • 2003
  • 101 black & white illustrations, 137 black & white tables
  • 1584883456
  • 9781584883456

Table of contents

DATA MINING - A GENTLE INTRODUCTION

  • Data Mining: Why Now?
  • Benefits of Data Mining
  • Data Mining: Users
  • Data Mining Tools
  • Data Mining Steps
  • Problems in Data Mining Process
  • SAS Software: The Leader in Data Mining
  • User-Friendly SAS Macros for Data Mining

PREPARING DATA FOR DATA MINING

  • Data Requirements in Data Mining
  • Ideal Structures of Data for Data Mining
  • Understanding the Measurement Scale of Variables
  • Entire Database vs. Representative Sample
  • Sampling for Data Mining
  • SAS Applications Used in Data Preparation

EXPLORATORY DATA ANALYSIS

  • Exploring Continuous Variable
  • Data Exploration: Categorical Variable
  • SAS Macro Applications Used in Data Exploration

UNSUPERVISED LEARNING METHODS

  • Applications of Unsupervised Learning Methods
  • Principal Component Analysis (PCA)
  • Exploratory Factor Analysis (EFA)
  • Disjoint Cluster Analysis (DCA)
  • Bi-Plot Display of PCA, EFA, and DCA Results
  • PCA And EFA Using SAS Macro FACTOR
  • Disjoint Cluster Analysis Using SAS Macro DISJCLUS

SUPERVISED LEARNING METHODS: PREDICTION

  • Applications of Supervised Predictive Methods
  • Multiple Linear Regression Modeling
  • Binary Linear Regression Modeling
  • Multiple Linear Regression Using SAS Macro REGDIAG
  • Lift Chart Using SAS Macro LIFT
  • Scoring New Regression Data Using the SAS Macro RSCORE
  • Logistic Regression Using SAS Macro LOGISTIC
  • Scoring New Logistic Regression Data Using the SAS Macro LSCORE
  • Case Study 1: Modeling Multiple Linear Regression
  • Case Study 2: Modeling Multiple Linear Regression with Categorical Variables
  • Case Study 3: Modeling Binary Logistic Regression

SUPERVISED LEARNING METHODS: CLASSIFICATION

  • Discriminant Analysis
  • Stepwise Discriminant Analysis
  • Canonical Discriminant Analysis (CDA)
  • Discriminant Function Analysis (DFA)
  • Applications of Discriminant Analysis
  • Classification Tree Based on CHAID
  • Applications of CHAID
  • Discriminant Analysis Using SAS Macro DISCRIM
  • Decision Tree Using SAS Macro 'CHAID'
  • Case Study 1: CDA and Parametric DFA
  • Case Study 2: Non-Parametric DFA
  • Case Study 3: Classification Tree Using CHAID

EMERGING TECHNOLOGIES IN DATA MINING

  • Data Warehousing
  • Artificial Neural Network Methods
  • Market Basket Analysis
  • SAS Software: The Leader in Data Mining

APPENDIX: INSTRUCTION FOR USING THE SAS MACROS

INDEX

Each chapter also contains an introduction, a summary, references, list of figures, and suggested further reading.

Review quote

"The macros integrate nicely with SAS's output delivery system. [T]his is a book that could serve as an easy-to-read introduction to some classical statistical techniques that are used in data mining, and, with the associated macros, provide an opportunity to see those techniques in action." - Journal of the American Statistical Association, June 2004, Vol. 99, No. 466

Read how Christopher Ross of the US Bureau of Land Management uses the SAS macros featured in this book:

Report: Use of SAS macros in the analysis of population dynamics and changes in Curlleaf Mountain Mahogany in adjacent Sierran and Great Basin mountain ranges in the western United States.

Mountain Mahogany is a very long-lived, broadleaf evergreen tree in the Rose family. Because of its importance to big game habitat, its disappearance in parts of its range over the past 50 years has been of great concern to land managers and sportsmen.

I converted very large data sets (over 1,000,000 observations) derived from Geographic Information System analyses to SAS data sets using the EXCELSAS macro. I used the UNIVAR SAS macro to conduct data exploration and identify problem observations and distributions for correction. Using the macro REGDIAG, I examined the relation between changes in mahogany distribution over time (response) and topographic slope, aspect, and elevation and cross products and quadratic interactions of these (predictors). The logistic model was refined through examination of the variety of goodness-of-fit criteria and measures of association offered by the LOGISTIC SAS macro. The results showed strong correlations of tree distribution with geographic factors, and a trend in changes over time. Use of custom odds ratios allowed prediction of changes in probability of finding trees at different combinations of variable values. I then appended a hypothetical data set with a missing response variable to obtain predicted probabilities for mahogany at all combinations of slope, elevation, and aspect. These results have been used to prioritize areas for habitat restoration.

I used the LOGISTIC macro (with field data) to demonstrate that bird damage by sapsuckers was strongly related to distance from the nearest riparian area, but not to distance to conifer food sources or nest habitat. Another logistic regression analysis confirmed that bird damage was confined to specific age classes in the population. Using the FREQ SAS macro, I also compared population age class parameters between the two mountain ranges to demonstrate that the desert range has a significantly different (bimodal) age class distribution from the normally distributed Sierra range population.

Use of these data mining SAS macros facilitated reliable conversion, examination, and analysis of the data, and selection of the best statistical models despite the great size of the data sets. The results of this research have been used extensively by land management agencies and private landowners to maximize the effectiveness of habitat restoration efforts in these important game areas.

- Christopher Ross, PhD, Reclamation Scientist/Natural Resource Specialist, Bureau of Land Management, U.S. Department of Interior, Reno, Nevada 89520-0006