Data Warehousing with Informix : Best Practices
A Comprehensive Look at an Exploding Technology. Data Warehousing: From Planning Through Performance, by Top Experts in the Field.

For a large and growing list of industries, data warehouses are rapidly becoming a 'must-have' strategic business tool. Yet data warehouses are made up of a complex mix of products and technologies, making it tough to keep up with all the latest strategies, theories, and approaches to design and implementation. Data Warehousing with Informix: Best Practices was created to provide a comprehensive overview of the latest information, with a special focus on Informix-based systems. This single volume presents the most innovative ideas on data warehousing from the most experienced professionals.

The book is organized around four main themes:
- Warehouse Design and Implementation
- Hardware, Storage, and Backup
- The Decision-Support Community
- Future Trends

Whether you are a data warehousing novice or a veteran, you will find a wealth of information and ideas in these essays from best-selling authors and well-known authorities in the field. The contributors represent a variety of viewpoints, from theorists to nuts-and-bolts programmers to project and organizational managers.
- Format: Paperback | 352 pages
- Dimensions: 175.26 x 231.14 x 25.4mm | 544.31g
- Publication date: 10 Mar 1998
- Publisher: Pearson Education (US)
- Imprint: Prentice Hall
- Publication city/country: Upper Saddle River, United States
Table of contents
Part I. The Decision Support Community.
1. The DSS Community: Tourists, Explorers, and Farmers by William H. Inmon. Introduction. The DSS Community. Tourists. Explorers. Farmers. Development Life Cycles. Database Design. Different Data Warehouses. Infrastructure. Tools. The Cycle of DSS Users. The Organization Chart. Summary. About the Author. About Pine Cone Systems.
2. Managing the Data Warehouse: The Advent of the Data Warehouse Administrator by William H. Inmon. Managing the Data Warehouse Environment. Operational and Informational Differences. The Role of the DWA. Managing Volumes of Data. Keeping Costs Down. Creating and Managing the Technical Infrastructure. Fostering an Organizational Environment. Creating a Proper Architecture. Summary. About the Author. About Pine Cone Systems.
Part II. Design and Implementation for Decision Support.
3. Using a Rational Approach to Build Your Data Warehouse by Dale Mietla and Marvin Miller. Introduction. Key Components. Data Warehouse Products. Data Warehouse Services. Data Warehousing Process. What Is a Process? Project Startup. Scope and Purpose Confirmation. Knowledge Analysis. Knowledge Requirements Definition. Data Source and Quality Analysis. Knowledge Use Analysis. Solution Design. Solution Architecture Design. Testing Strategy and Design. Procedure Design. Data Warehouse Environment Installation. Solution Implementation. Warehouse Solution Development. Warehouse Loading and Testing. Data Warehouse Solution Deployment. Project Wrap-Up. Warehouse Solution Review. Iterative Application of the Process. Concurrent Engineering Shortens Cycle Time. Keys to Project Success. Conclusion. About the Authors. About NewTHINK, Inc. About Digital Equipment Corporation.
4. Starting the Data Warehouse from a Data Model by Larry Heinrich. What Are Template Data Models? Why Use Template Data Models? The Cost of Entry. What Types of Industries Are Supported? Everything Begins with the Enterprise Data Model. Business Areas Are the Building Blocks for the Data Warehouse. Using Template Data Models to Build the Data Warehouse and Data Marts. Comparing the Costs. Guidelines for Using Template Data Models. The Right Tools Make the Job Easier. Summary. About the Author. About Allied Data Resource Management.
5. Integrating Data to Populate the Data Warehouse by Patricia Klauer and Vidette Poe. Introduction. Understanding the Business Purpose of the Data Warehouse. Defining Data versus Information. The Data Integration Process. Data Architecture. Data Architecture Tip. Metadata Data Integration Phases. Data Sourcing. Data Consolidation Steps. Analyze Source Data Documentation. Source Data Documentation Tip. Flatten Out the Data Into Logical Records. Logical Record Tip. Perform Domain Analysis. Representative Data Tip. Determine the Primary Keys. Data Analyst Skill Set Tip. Identify Foreign Keys. Synonyming Tip. Data Analysis Needed for Data Consolidation. Identifying Overlapping Data: Subject Area Analysis. Identifying Overlapping Data: Synonyms, Homonyms, and Analogs. Analyzing Data to Integrate It into an Existing Data Warehouse. Data Analysis Tip. Understanding Business Rules and Nuances of Meaning. Business Rule Tip. Data-Driven Analysis. Data-Driven Analysis Tip. Data Conversion. Map Source File Attributes to the Data Warehouse's Physical Data Structure. Map Source Attributes' Allowable Values to Target Values. Specify Default Values. Conversion Specifications. Data Population. Write Conversion Programs. Test the Conversion. Determine Exception Processing. Collect Statistics. Conduct Quality Assurance. Perform Stress Test. Summary. About the Authors. About Manage Data, Inc. About In ER-G Solutions, Ltd.
6. Designing an OLAP Data Mart on Relational Databases by Jonathan Kraft. Introduction. Classic Entity-Relationship Modeling and Decision Support. Proprietary Multidimensional Databases. Dimensional Modeling. Dimension Elements. Dimension Attributes. The Star Schema. Denormalization of Dimensions. Advantages of Star Schema in Dimensional Modeling. Aggregation. Aggregating the Multidimensional Data Warehouse. How Much to Aggregate? Choosing the Right Aggregates. Sample Aggregation Sizing. Simulation Procedures. Simulation Results. Incremental versus Full Aggregation. Extending Dimensional Modeling. Normalizing the Dimensions. The Snowflake Schema. Disadvantages of Normalization. Partial Normalization. Conclusion. About the Author.
7. A Data Mining Tutorial by Alice Landy. Introduction. Scalable Data Mining and Knowledge Discovery. Scalability in Data Mining. Parallel Processing. Out of Core Processing. Working with Databases. Using All the Data. Knowledge Discovery and Deployment. Using Data Mining Software. Define the Business Problem. Costs and Benefits. The Time/Perfection Tradeoff. Locate the Data. Organize the Data. Dimensions. Prepare the Data. Cleaning Data. Organizing Data. Create a Historical Data Set. Create a Model. Data Mining Models. Multiple Learning Tools. Build Models. Pruning a Tree and Testing the Subtrees. Evaluating a Subtree. Predicting the Response Variable. Optimizing Trees. Neural Networks. Creating a Neural Network. Training the Network. Network Weights. Training Algorithms. Optimizing Neural Networks. Match Models. Optimizing Match Models. Use the Model. Analyze the Results. Margin and ROI. Update Your Model. Measuring Success. About the Author. About Thinking Machines Corporation.
8. Sampling: The Latest Breakthrough in Decision-Support Technology by Jonathan Kraft. Introduction: Increasing the Demand for Information. Business Trend Discovery. Drilling to Detail. Sampling: A Usability and Scalability Breakthrough. Aggregation and Its Uses. Configuration 1: Exhaustive Aggregation. Configuration 2: Sparse Aggregation. Sampling: A Scalability and Maintenance Windfall. Accuracy and Confidence: A Complex Paradigm Made Simple. Trusting Samples. Conclusion. About the Author.
Part III. Hardware, Storage, and Backup Issues in a Data Warehouse.
9. Optimal Architecture for Enterprise-Class Data Warehousing by Steve Deck. The Evolution to High-End Data Warehouses. Data Warehouse Overview. Definition. Planning and Implementation. Support and Management. Clustered SMP versus MPP in Data Warehousing. Massively Parallel Processors. Clustered Symmetric Multiprocessors. Informix and HP in Enterprise Data Warehousing. Complementary Architectures. Informix Dynamic Server with the Extended Parallel Option and Advanced Decision Support Option. DSA: Parallel Origin. HP 9000 Enterprise Parallel Server. HP-UX Operating System/PA-RISC Architecture. Hewlett-Packard Enterprise Parallel Server. Informix Dynamic Server with the Extended Parallel Option and Advanced Decision Support Option, and the HP EPS. Clustered SMP Environment. Scalable RDBMS. Open Systems Hardware. Real-World Performance. Data Partitioning. Control Partitioning. Query Execution. Intraserver Communication. High Availability. Dual-Ported Disks and Other Disk Solutions. Early Benchmark Results. Conclusion. About the Author.
10. Data Warehousing and the Value of 64-Bit Computing by Marvin Miller. Introduction. Business Work Flow. Work Flow for Operational Data Systems (OLTP). Work Flow for OLTP and DSS. Data Warehousing's Hidden Relationship. Examples. Market-Basket Analysis. Buying Patterns. The Relation of Samples to Prescriptions. The Value of Data Relationships. Data Warehouse Examples. Linking the ODS and DSS. Accessing the Operational Data Store. Which Data Is Needed? A Different Data Design for Ad Hoc Queries. Future Needs. Explosive Growth. Technical Trends. The Value of 64-Bit Computing. Large Data Set Capacity. Application Performance. Increased Performance As Complexity Grows. Conclusion. About the Author. About Digital Equipment Corporation.
11. The Use of Storage Subsystems in Data Warehousing by Rob Nicholson and Nancy Ann Coon. Introduction. Components of a Data Warehouse. Hardware and Software Overview. Data Placement. The Elements of Response Time. Bandwidth and Response Time. Disk-Access Patterns in Data Warehousing Applications. Database Load. Complex Query. Data Mining. Available Storage Technologies. Parallel SCSI. Weaknesses in Parallel SCSI. Serial Storage Architecture (SSA). SSA Nodes and Links. How Data Travels Over an SSA Link. Frame Multiplexing. Spatial Reuse. Redundant Paths. Fairness. Cut-Through Routing. Future SSA Developments. Fiber Channel and Fiber Channel Arbitrated Loop. Basics of FC-AL. RAID. Centralized Storage Controller or Independent Disks? Cache. Read Cache. Write Cache. Data Warehousing at MCI: A Case Study. Summary. About the Authors. About IBM Corporation.
12. The Backup and Recovery of Very Large Databases by Daniel A. Wood. Introduction. What Are Database Backup and Restore? Database Backup. Database Restore. Informix Backup and Restore: ON-Bar. High Performance. High Availability. Reliability. Manageability. Flexibility. ON-Bar Architecture. The onbar Program. The X/Open Backup Services API. The Storage Manager. ON-Bar Catalog Tables. Emergency Boot File. How ON-Bar Works. ON-Bar: Technical Overview. ON-Bar Key Features. On-Line Backup and Restore. Partition-Level Backup and Restore. Incremental Backup. Point-In-Time Recovery. Invoking ON-Bar. Command-Line Interface. Informix Enterprise Command Center. Third-Party Storage Manager GUI. Storage Managers. Informix Storage Manager. Third-Party Storage Managers. IBM ADSTAR Distributed Storage Manager. Hewlett-Packard (HP) OpenView OmniBack II. Legato NetWorker. EMC Data Manager. Conclusion. About the Author.
13. A Manager's Guide to Informix Database Protection by John Maxwell. Introduction. Requirements for Database Backup and Restore. Solution Architecture. A Cooperative Solution. Data Flow. Consistent Reliable Data Protection. High-Performance Protection. Automated, Unattended Operations. Simple, Enterprisewide Backup Administration. ON-Bar/NetWorker Operation Examples. Ad Hoc Database Backup Example. Code Examples Using ON-Bar Commands. Scheduled Database Backup Example. Database Restore Examples. Physical Restore Example. Logical Restore Example. Combined Restore Example. Point-in-Time Restore Example. Summary. About the Author. About Legato Systems.
14. Determining Available dbspaces by Gary D. Cherneski. Introduction. Script Description. Note. Script That Determines Available dbspaces. Sample Output. Conclusion. About the Author.
Part IV. The Future of Data Warehousing.
15. A Platform for the Universal Warehouse by Malcolm Colton. Introduction. Basic DBMS Aspects of Data Warehousing. Data Structures. Algorithms. Visualization. Data and System Management. Integrating Operational and Legacy Data. Data Warehousing in Practice. Scrubbing. Growth. Data Liquidity. Data Marts. DSA Support for Data Warehousing. Multithreading. Asynchronous I/O and Shared Data Cache. Parallel Operations. Informix Dynamic Server with the Universal Data Option for Data Warehousing. Complex Datatypes. Casting. Extending the Client. Complex Functions. User-Defined Aggregates. Appropriate Indexing. Integrating the Legacy. DataBlade Module Solution Components. Informix Dynamic Server with the Universal Data Option in Data Warehousing. The Universal Server as a Data Repository. Performance. DataBlade Modules for Data Warehousing. Documents: Adobe, Excalibur, PLS, and Verity. Maps: Andyne, ESRI, Informix Geodetic, and MapInfo. Statistics: Fame, StatSci, and TimeSeries. Data Scrubbing: Ecologic and Electronic Digital Documents. Data Mining: Angoss and Neovista. Informix Dynamic Server with the Universal Data Option as Middleware. Data Security. Administration. Web-Enabled Warehouse. Visualization: Brio, Business Objects, Cognos, and Formida. Summary. About the Author.
16. Building Complex Decision-Support Models Using a Universal Warehouse by Jacques Joy. Introduction. Database Background. What Is a Database Module? Features of Informix Dynamic Server with the Universal Data Option. Universal Data Warehouse: The Problem. First Try. The Integrated Data Server. Object-Oriented Design and Implementation. Using Strong Typing. Type-Based Integrity Check. What Makes the Integrated Data Server Universal? Conclusion. About Sabre Technology Solutions. About the Author.
Index.